Plotting

Volcano Plots

kompot.plot.volcano_de(adata: AnnData, lfc_key: str = None, score_key: str = None, condition1: str | None = None, condition2: str | None = None, n_top_genes: int | None = None, highlight_genes: List[str] | Dict[str, str] | List[Dict[str, Any]] | None = None, color: str | None = None, background_cmap: str | Colormap = None, color_discrete_map: Dict[str, str] | None = None, vmin: float | str | None = None, vmax: float | str | None = None, vcenter: float | None = None, gene_labels: bool | int | List[str] | Dict[str, str] = 10, figsize: Tuple[float, float] = (10, 8), title: str | None = None, xlabel: str | None = 'Log Fold Change', ylabel: str | None = None, n_x_ticks: int = 3, n_y_ticks: int = 3, color_up: str = '#d73027', color_down: str = '#4575b4', color_background: str = '#c0c0c0', alpha_background: float = 1.0, point_size: float = 5, font_size: float = 9, text_offset: Tuple[float, float] = (2, 2), text_kwargs: Dict[str, Any] | None = None, grid: bool = True, grid_kwargs: Dict[str, Any] | None = None, ax: Axes | None = None, legend_loc: str = 'best', legend_fontsize: float | None = None, legend_title_fontsize: float | None = None, show_legend: bool = True, sort_key: str | None = None, return_fig: bool = False, save: str | None = None, run_id: int = -1, legend_ncol: int | None = None, group: str | None = None, y_axis_type: str = 'mahalanobis', significance_threshold: float | Dict[str, float] | None = None, update_de_classification: bool = False, direction_column: str | None = None, show_thresholds: bool = True, **kwargs) → Figure | None

Create a volcano plot from Kompot differential expression results.

Parameters:

adata (AnnData) – AnnData object containing differential expression results in .var
lfc_key (str, optional) – Key in adata.var for log fold change values. If None, will try to infer from kompot_de_ keys.
score_key (str, optional) – Key in adata.var for significance scores. Default is "kompot_de_mahalanobis"
condition1 (str, optional) – Name of condition 1 (negative log fold change)
condition2 (str, optional) – Name of condition 2 (positive log fold change)
n_top_genes (int, optional) – If specified, highlight this number of top genes by score instead of using DE classification. Cannot be used together with significance_threshold. If not specified (None), will use DE classification from is_de column when available. Ignored if highlight_genes is provided.
highlight_genes (list of str, dict of {str: str}, or list of dict, optional) – Genes to highlight. Can be a list of gene names, a dict mapping gene names to colors, or a list of dicts with keys 'genes' (required), 'name' (optional), and 'color' (optional). If provided, overrides n_top_genes.
color (str, optional) – Key in adata.var to use for coloring background genes. Can be continuous or categorical.
background_cmap (str or Colormap, optional) – Colormap to use for background coloring. The default for continuous values is ‘Spectral_r’.
color_discrete_map (dict, optional) – Mapping of category values to colors for categorical color. If not provided, colors will be selected from the colormap.
vmin (float or str, optional) – Minimum value for colormap normalization. If a string starting with ‘p’ followed by a number, uses that percentile (e.g., ‘p5’ for 5th percentile).
vmax (float or str, optional) – Maximum value for colormap normalization. If a string starting with ‘p’ followed by a number, uses that percentile (e.g., ‘p95’ for 95th percentile).
vcenter (float, optional) – Center value for diverging colormaps. If provided with vmin/vmax, ensures proper ordering.
gene_labels (bool, int, list of str, or dict, optional) – Controls which genes get labeled with their names:
- True: label all highlighted genes
- False: label no genes
- int: label top N genes by score (default: 10)
- list of str: label specific genes by name
- dict: label genes with custom labels (gene_name -> custom_label)
figsize (tuple, optional) – Figure size as (width, height) in inches
title (str, optional) – Plot title. If None and conditions provided, uses “{condition2} vs {condition1}”
xlabel (str, optional) – Label for x-axis
ylabel (str, optional) – Label for y-axis
n_x_ticks (int, optional) – Number of ticks to display on the x-axis (default: 3)
n_y_ticks (int, optional) – Number of ticks to display on the y-axis (default: 3)
color_up (str, optional) – Color for up-regulated genes
color_down (str, optional) – Color for down-regulated genes
color_background (str, optional) – Color for background genes when not using color
alpha_background (float, optional) – Alpha value for background genes (default: 1.0)
point_size (float, optional) – Size of points for background genes
font_size (float, optional) – Font size for gene labels
text_offset (tuple, optional) – Offset (x, y) in points for gene labels from their points
text_kwargs (dict, optional) – Additional parameters for text labels
grid (bool, optional) – Whether to show grid lines
grid_kwargs (dict, optional) – Additional parameters for grid
ax (matplotlib.axes.Axes, optional) – Axes to plot on. If None, creates new figure
legend_loc (str, optional) – Location for the legend (‘best’, ‘upper right’, ‘lower left’, etc., or ‘none’ to hide)
legend_fontsize (float, optional) – Font size for the legend text. If None, uses matplotlib defaults.
legend_title_fontsize (float, optional) – Font size for the legend title. If None, uses matplotlib defaults.
show_legend (bool, optional) – Whether to show the legend (default: True)
legend_ncol (int, optional) – Number of columns in the legend. If None, automatically determined.
sort_key (str, optional) – Key to sort genes by. If None, sorts by score_key
return_fig (bool, optional) – If True, returns the figure and axes
save (str, optional) – Path to save figure. If None, figure is not saved
run_id (int, optional) – Specific run ID to use for fetching field names from run history. Negative indices count from the end (-1 is the latest run). If None, uses the latest run information.
group (str, optional) – If provided, use data for a specific group/subset analyzed with the ‘groups’ parameter in compute_differential_expression. Will use the values from adata.varm instead of adata.var for Mahalanobis distances, and mean fold changes.
y_axis_type (str, optional) – Type of values to use for the y-axis: “mahalanobis” (default), “local_fdr”, “tail_fdr”, “ptp”, or a custom column name from adata.var. FDR values are -log10 transformed for display; the “ptp” column is already stored as -log10(PTP) (the neg_log10_ptp field) and is plotted directly. In both cases higher on the axis means more significant.
significance_threshold (float or dict, optional) – Significance threshold for the y-axis values. A float sets a single threshold shown as a horizontal line. A dict maps y-axis types to thresholds (e.g., {"local_fdr": 0.05, "ptp": 0.01}); genes must pass all thresholds, and no threshold line is drawn. For "mahalanobis" this is a minimum distance; for "local_fdr", "tail_fdr", and "ptp" it is a maximum value.
update_de_classification (bool, optional) – Whether to update the differential expression classification column based on the new significance threshold. Applicable for FDR and ptp y_axis_types (default: False).
direction_column (str, optional) – Name of the differential expression boolean column to update if update_de_classification=True. If None, tries to infer from the score_key.
show_thresholds (bool, optional) – Whether to show threshold lines on the plot (default: True).
**kwargs – Additional parameters passed to plt.scatter
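The percentile strings accepted by vmin and vmax (for example ‘p5’ or ‘p95’) can be illustrated with a small standalone sketch. The helper names resolve_limit and percentile are hypothetical and are not part of kompot; the linear interpolation used here is an assumption, and the library may compute percentiles differently.

```python
def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def resolve_limit(limit, values):
    """Hypothetical helper: turn a vmin/vmax-style argument into a number.

    None stays None, a number is used as given, and a string such as 'p5'
    is read as the 5th percentile of `values`.
    """
    if limit is None:
        return None
    if isinstance(limit, str):
        if not limit.startswith("p"):
            raise ValueError(f"expected a string like 'p5', got {limit!r}")
        return percentile(values, float(limit[1:]))
    return float(limit)


scores = [0.0, 1.0, 2.0, 3.0, 4.0]
vmin = resolve_limit("p25", scores)  # 25th percentile of the scores
vmax = resolve_limit("p75", scores)  # 75th percentile of the scores
```

Percentile limits of this kind keep a few extreme values from compressing the rest of the color scale.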

Return type:

If return_fig is True, returns (fig, ax)
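The threshold semantics described for significance_threshold can be sketched in plain Python. This is an illustration only, not kompot's implementation: the function names are hypothetical, the values are assumed to be raw (untransformed) probabilities and distances, and whether the comparison is strict or inclusive is an assumption.

```python
import math

# y_axis_type values for which the threshold is a maximum
# (smaller values are more significant), per the parameter description.
MAX_TYPES = {"local_fdr", "tail_fdr", "ptp"}


def passes_thresholds(values, thresholds):
    """Return True if one gene passes every threshold in the dict.

    values: maps y-axis type -> raw value for the gene (assumed untransformed).
    thresholds: maps y-axis type -> cutoff.
    """
    for key, cutoff in thresholds.items():
        value = values[key]
        if key in MAX_TYPES:
            if value > cutoff:  # FDR/PTP above the maximum: not significant
                return False
        elif value < cutoff:  # e.g. Mahalanobis distance below the minimum
            return False
    return True


def fdr_to_axis(fdr):
    """-log10 transform used to display FDR values (requires fdr > 0)."""
    return -math.log10(fdr)


gene = {"mahalanobis": 3.2, "local_fdr": 0.01, "ptp": 0.2}
single = passes_thresholds(gene, {"local_fdr": 0.05})
combined = passes_thresholds(gene, {"local_fdr": 0.05, "ptp": 0.01})
```

Here the gene passes the single FDR cutoff but fails the combined dict, because every entry in the dict has to be satisfied.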

kompot.plot.volcano_da(adata: AnnData, lfc_key: str | None = None, ptp_key: str | None = None, group_key: str | None = None, log_transform_ptp: bool = True, lfc_threshold: float | None = None, ptp_threshold: float | None = None, color: str | List[str] | None = None, alpha_background: float = 1.0, highlight_subset: ndarray | List[bool] | None = None, highlight_color: str = '#d73027', figsize: Tuple[float, float] = (10, 8), title: str | None = 'Differential Abundance Volcano Plot', xlabel: str | None = 'Log Fold Change', ylabel: str | None = '-log10(ptp)', n_x_ticks: int = 3, n_y_ticks: int = 3, legend_loc: str = 'best', legend_fontsize: float | None = None, legend_title_fontsize: float | None = None, show_legend: bool = True, grid: bool = True, grid_kwargs: Dict[str, Any] | None = None, ax: Axes | None = None, palette: str | List[str] | Dict[str, str] | None = None, save: str | None = None, return_fig: bool = False, run_id: int = -1, legend_ncol: int | None = None, update_direction: bool = False, direction_column: str | None = None, show_thresholds: bool = True, show_colorbar: bool = True, cmap: str | Colormap | None = None, vcenter: float | None = None, vmin: float | None = None, vmax: float | None = None, **kwargs) → Figure | None

Create a volcano plot for differential abundance results.

This function visualizes cells in a 2D volcano plot with log fold change on the x-axis and significance, expressed as -log10 of the Posterior Tail Probability (PTP), on the y-axis. Cells can be colored by any column in adata.obs.

Parameters:

adata (AnnData) – AnnData object containing differential abundance results
lfc_key (str, optional) – Key in adata.obs for log fold change values. If None, will try to infer from kompot_da_ keys.
ptp_key (str, optional) – Key in adata.obs for PTPs (Posterior Tail Probabilities). The PTP is a significance measure similar to a p-value. If None, will try to infer from kompot_da_ keys.
group_key (str, optional) – Key in adata.obs to group cells by (for coloring)
log_transform_ptp (bool, optional) – Whether to -log10 transform PTPs (Posterior Tail Probabilities) for the y-axis
lfc_threshold (float, optional) – Log fold change threshold for significance (for drawing threshold lines)
ptp_threshold (float, optional) – PTP (Posterior Tail Probability) threshold for significance (for drawing threshold lines)
color (str or list of str, optional) – Keys in adata.obs for coloring cells. Requires scanpy.
alpha_background (float, optional) – Alpha value for background cells (below threshold). Default is 1.0 (no transparency)
highlight_subset (array or list, optional) – Boolean mask to highlight specific cells
highlight_color (str, optional) – Color for highlighted cells
figsize (tuple, optional) – Figure size as (width, height) in inches
title (str, optional) – Plot title
xlabel (str, optional) – Label for x-axis
ylabel (str, optional) – Label for y-axis
n_x_ticks (int, optional) – Number of ticks to display on the x-axis (default: 3)
n_y_ticks (int, optional) – Number of ticks to display on the y-axis (default: 3)
legend_loc (str, optional) – Location for the legend (‘best’, ‘upper right’, ‘lower left’, etc., or ‘none’ to hide)
legend_fontsize (float, optional) – Font size for the legend text. If None, uses matplotlib defaults.
legend_title_fontsize (float, optional) – Font size for the legend title. If None, uses matplotlib defaults.
show_legend (bool, optional) – Whether to show the legend (default: True)
grid (bool, optional) – Whether to show grid lines
grid_kwargs (dict, optional) – Additional parameters for grid
ax (matplotlib.axes.Axes, optional) – Axes to plot on. If None, creates new figure
palette (str, list, or dict, optional) – Color palette to use for categorical coloring
legend_ncol (int, optional) – Number of columns in the legend. If None, automatically determined based on the number of categories.
save (str, optional) – Path to save figure. If None, figure is not saved
show (bool, optional) – Whether to show the plot
return_fig (bool, optional) – If True, returns the figure and axes
run_id (int, optional) – Specific run ID to use for fetching field names from run history. Negative indices count from the end (-1 is the latest run). If None, uses the latest run information.
update_direction (bool, optional) – Whether to update the direction column based on the provided thresholds before plotting (default: False)
direction_column (str, optional) – Direction column to update if update_direction=True. If None, infers from run_id.
show_thresholds (bool, optional) – Whether to display horizontal and vertical threshold lines (default: True). Set to False to hide threshold lines.
show_colorbar (bool, optional) – Whether to display colorbar for numeric color columns (default: True). Set to False to hide colorbar.
condition1 (str, optional) – Name of condition 1 (denominator in fold change)
condition2 (str, optional) – Name of condition 2 (numerator in fold change)
**kwargs – Additional parameters passed to plt.scatter

Return type:

If return_fig is True, returns (fig, ax)
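How a single cell maps onto the volcano axes, and how lfc_threshold and ptp_threshold partition the plot, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the example threshold values, the strict comparisons, and the labels "up", "down", and "neutral" are all hypothetical and may not match what kompot writes to its direction column.

```python
import math


def volcano_point(lfc, ptp, lfc_threshold=1.0, ptp_threshold=0.05):
    """Return (x, y, label) for one cell.

    x is the log fold change, y is -log10(ptp) (requires ptp > 0), and the
    label says on which side of the threshold lines the cell falls.
    """
    y = -math.log10(ptp)
    significant = abs(lfc) > lfc_threshold and ptp < ptp_threshold
    if not significant:
        label = "neutral"
    elif lfc > 0:
        label = "up"
    else:
        label = "down"
    return lfc, y, label


# The horizontal threshold line sits at -log10(ptp_threshold) on the y-axis,
# and the vertical lines at -lfc_threshold and +lfc_threshold on the x-axis.
y_line = -math.log10(0.05)

x, y, label = volcano_point(2.0, 0.001)
```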

kompot.plot.multi_volcano_da(adata: AnnData, groupby: str, lfc_key: str | None = None, ptp_key: str | None = None, log_transform_ptp: bool = True, lfc_threshold: float | None = None, ptp_threshold: float | None = None, color: str | List[str] | None = None, alpha_background: float = 1.0, highlight_subset: ndarray | List[bool] | None = None, highlight_color: str = '#d73027', figsize: Tuple[float, float] | None = None, title: str | None = 'Differential Abundance Volcano Plot', xlabel: str | None = None, ylabel: str | None = '-log10(PTP (Posterior Tail Probability))', n_x_ticks: int = 3, n_y_ticks: int = 0, legend_loc: str = 'bottom', legend_fontsize: float | None = None, legend_title_fontsize: float | None = None, show_legend: bool | None = None, grid: bool = True, grid_kwargs: Dict[str, Any] | None = None, palette: str | List[str] | Dict[str, str] | None = None, show_thresholds: bool = False, plot_width_factor: float = 10.0, share_y: bool = True, layout_config: Dict[str, float] | None = None, background_plot: Literal['kde', 'violin'] | None = None, background_alpha: float = 0.5, background_color: str = '#E6E6E6', background_edgecolor: str = '#808080', background_height_factor: float = 0.6, background_kwargs: Dict[str, Any] | None = None, save: str | None = None, return_fig: bool = False, run_id: int = -1, update_direction: bool = False, direction_column: str | None = None, cmap: str | Colormap | None = None, vcenter: float | None = None, vmin: float | None = None, vmax: float | None = None, **kwargs) → Figure | None

Create multiple volcano plots for differential abundance results, one per group.

This function creates a panel of volcano plots, one for each unique value in the groupby column. Each plot is wider than tall (by default 10x wider than tall) and is aligned with other plots. Only the bottom plot shows x-axis labels and ticks, only the middle plot shows the y-axis label, and y-axis ticks are hidden for all plots. Group labels are placed to the right of each plot, aligned with the plot edge. Each plot has a box outline by default, and points are drawn with full opacity (no transparency). If the color and groupby columns are identical, the legend is hidden. Vertical lines (both threshold and center line at 0) are hidden by default but can be enabled with show_thresholds=True.

Parameters:

adata (AnnData) – AnnData object containing differential abundance results
groupby (str) – Column in adata.obs to group cells by (for separating into multiple plots)
lfc_key (str, optional) – Key in adata.obs for log fold change values. If None, will try to infer from kompot_da_ keys.
ptp_key (str, optional) – Key in adata.obs for PTPs (Posterior Tail Probabilities). The PTP is a significance measure similar to a p-value. If None, will try to infer from kompot_da_ keys.
log_transform_ptp (bool, optional) – Whether to -log10 transform PTPs (Posterior Tail Probabilities) for the y-axis
lfc_threshold (float, optional) – Log fold change threshold for significance (for drawing threshold lines)
ptp_threshold (float, optional) – PTP (Posterior Tail Probability) threshold for significance (for drawing threshold lines)
color (str or list of str, optional) – Keys in adata.obs for coloring cells. Requires scanpy. If identical to groupby, the legend will be hidden.
alpha_background (float, optional) – Alpha value for background cells (below threshold). Default is 1.0 (no transparency)
highlight_subset (array or list, optional) – Boolean mask to highlight specific cells
highlight_color (str, optional) – Color for highlighted cells
figsize (tuple, optional) – Figure size as (width, height) in inches. If None, it will be calculated automatically based on the number of groups and layout parameters.
title (str, optional) – Plot title
xlabel (str, optional) – Label for x-axis (only shown on bottom plot). If None, it will be automatically generated based on condition names extracted from lfc_key if available.
ylabel (str, optional) – Label for y-axis (only shown on middle plot)
n_x_ticks (int, optional) – Number of ticks to display on the x-axis (default: 3)
n_y_ticks (int, optional) – Number of ticks to display on the y-axis (default: 0, no y-ticks)
legend_loc (str, optional) – Location for the legend (‘bottom’, ‘right’, ‘best’, ‘upper right’, etc.)
legend_fontsize (float, optional) – Font size for the legend text
legend_title_fontsize (float, optional) – Font size for the legend title
show_legend (bool, optional) – Whether to show the legend. If None (default), legend will be shown except when color column is identical to groupby column. If explicitly set to True or False, this setting will override the automatic behavior.
grid (bool, optional) – Whether to show grid lines
grid_kwargs (dict, optional) – Additional parameters for grid
palette (str, list, or dict, optional) – Color palette to use for categorical coloring
show_thresholds (bool, optional) – Whether to display threshold lines on the plots (default: False)
show_colorbar (bool, optional) – Whether to display colorbars in individual volcano plots (default: False in multi_volcano_da)
plot_width_factor (float, optional) – Width factor for each volcano plot. Higher values make plots wider relative to their height. Default is 10.0 (plots are 10x wider than tall). This is maintained regardless of the number of groups.
share_y (bool, optional) – Whether to use the same y-axis limits for all plots (default: True)
layout_config (dict, optional) – Configuration for controlling plot layout spacing. Keys include:
- ‘unit_size’: Base unit size in inches (default: 0.15)
- ‘title_height’: Height for title area in units (default: 2)
- ‘legend_bottom_margin’: Distance from bottom of figure to legend/colorbar in units (default: 3)
- ‘legend_plot_gap’: Gap between last plot and legend/colorbar in units (default: 3)
- ‘legend_height’: Minimum height for legend/colorbar area in units (default: 3)
- ‘plot_height’: Height for each plot in units (default: 4)
- ‘plot_width’: Width for each plot in units (default: plot_width_factor * plot_height)
- ‘label_width’: Width for group labels in units (default: 4)
- ‘top_margin’: Top margin in units (default: 1)
- ‘plot_spacing’: Spacing between plots in units (default: 0.2)
- ‘y_label_width’: Width for y-axis label in units (default: 2)
- ‘y_label_offset’: Offset of y-axis label from plots in units (default: 0.5)
background_plot (str, optional) – Type of background density plot to display. Options are “kde” or “violin”. If None (default), no background density plot is shown.
background_alpha (float, optional) – Alpha (transparency) value for the background density plot (default: 0.5)
background_color (str, optional) – Color for the background density plot (default: “#E6E6E6”, light gray)
background_edgecolor (str, optional) – Color for the outline of the background density plot (default: “#808080”, medium gray)
background_height_factor (float, optional) – Controls the height of the background plot as a fraction of the y-axis range (default: 0.6). Higher values make the KDE/violin taller, lower values make it shorter.
background_kwargs (dict, optional) – Additional parameters for the background density plot. For KDE: "bw_method", "show_2d_kde", "contour_levels", "contour_cmap", "contour_alpha". For violin: "showmeans", "showmedians", "showextrema".
save (str, optional) – Path to save figure. If None, figure is not saved
show (bool, optional) – Whether to show the plot
return_fig (bool, optional) – If True, returns the figure and axes
run_id (int, optional) – Specific run ID to use for fetching field names from run history
update_direction (bool, optional) – Whether to update the direction column based on the provided thresholds before plotting (default: False). This is only applied once to the full dataset, not to individual group subsets.
direction_column (str, optional) – Direction column to update if update_direction=True. If None, infers from run_id.
cmap (str or matplotlib.cm.Colormap, optional) – Colormap to use for numeric color values. If not provided, automatically selects ‘RdBu_r’ with vcenter=0 for columns containing ‘log_fold_change’ or ‘lfc’, otherwise defaults to “Spectral_r”.
vcenter (float, optional) – Value to center the colormap at. Only applies to diverging colormaps. If not specified but a column containing ‘log_fold_change’ or ‘lfc’ is used for coloring, defaults to 0.
vmin (float, optional) – Minimum value for the colormap. If not provided, uses the minimum value in the data.
vmax (float, optional) – Maximum value for the colormap. If not provided, uses the maximum value in the data.
**kwargs – Additional parameters passed to plt.scatter

Return type:

If return_fig is True, returns (fig, axes_list)
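To give a feel for how the unit-based layout_config keys could translate into an automatic figure size, here is a rough sketch. The defaults are taken from the parameter description above, but the way the pieces are summed is purely an assumption made for illustration, and estimate_figsize is a hypothetical helper, not a kompot function. The actual calculation in kompot may differ.

```python
# Defaults as listed in the layout_config description (all in "units",
# except unit_size, which is inches per unit).
DEFAULT_LAYOUT = {
    "unit_size": 0.15,
    "title_height": 2,
    "legend_bottom_margin": 3,
    "legend_plot_gap": 3,
    "legend_height": 3,
    "plot_height": 4,
    "label_width": 4,
    "top_margin": 1,
    "plot_spacing": 0.2,
    "y_label_width": 2,
    "y_label_offset": 0.5,
}


def estimate_figsize(n_groups, plot_width_factor=10.0, layout_config=None):
    """Hypothetical estimate of (width, height) in inches for n_groups stacked plots."""
    cfg = {**DEFAULT_LAYOUT, **(layout_config or {})}
    # plot_width defaults to plot_width_factor * plot_height unless given explicitly.
    cfg.setdefault("plot_width", plot_width_factor * cfg["plot_height"])

    # Assumed vertical stack: margin, title, plots with gaps, then the legend area.
    height_units = (
        cfg["top_margin"]
        + cfg["title_height"]
        + n_groups * cfg["plot_height"]
        + (n_groups - 1) * cfg["plot_spacing"]
        + cfg["legend_plot_gap"]
        + cfg["legend_height"]
        + cfg["legend_bottom_margin"]
    )
    # Assumed horizontal strip: y-axis label, plot, group label on the right.
    width_units = (
        cfg["y_label_width"]
        + cfg["y_label_offset"]
        + cfg["plot_width"]
        + cfg["label_width"]
    )
    return width_units * cfg["unit_size"], height_units * cfg["unit_size"]


width, height = estimate_figsize(3)
```

Working in units rather than inches means a single unit_size change rescales the whole panel while keeping each plot's width-to-height ratio fixed.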

Expression Plots

kompot.plot.plot_gene_expression(adata: AnnData, gene: str, lfc_key: str | None = None, score_key: str | None = None, condition1: str | None = None, condition2: str | None = None, basis: str | None = 'X_umap', figsize: Tuple[float, float] = (12, 12), cmap_expression: str = 'Spectral_r', cmap_fold_change: str = 'RdBu_r', title: str | None = None, run_id: int = -1, layer: str | None = None, save: str | None = None, return_fig: bool = False, **kwargs) → Figure | None

Visualize expression patterns for a specific gene across conditions.

Creates a figure with multiple panels showing original expression, smoothed expression for each condition, and fold change.

Parameters:

adata (AnnData) – AnnData object containing differential expression results
gene (str) – Name of the gene to visualize
lfc_key (str, optional) – Key in adata.var for log fold change values. If None, will try to infer from kompot_de_ keys.
score_key (str, optional) – Key in adata.var for significance scores. If None, will try to infer from kompot_de_ keys.
condition1 (str, optional) – Name of condition 1 (denominator in fold change)
condition2 (str, optional) – Name of condition 2 (numerator in fold change)
basis (str or None, optional) – Key in adata.obsm for the embedding coordinates (default: “X_umap”). If None, will use cell index for x-axis instead of embeddings.
figsize (tuple, optional) – Figure size as (width, height) in inches
cmap_expression (str, optional) – Colormap for expression plots
cmap_fold_change (str, optional) – Colormap for fold change plot
title (str, optional) – Overall figure title. If None, uses gene name.
run_id (int, optional) – Run ID to use. Default is -1 (latest run).
layer (str, optional) – Layer in AnnData to use for expression values. If None, uses adata.X or infers from run information.
save (str, optional) – Path to save figure. If None, figure is not saved
return_fig (bool, optional) – If True, returns the figure and axes
**kwargs – Additional parameters passed to scatter plot functions

Return type:

If return_fig is True, returns (fig, axes)

Dotplots

kompot.plot.dotplot(adata: AnnData, genes: Sequence[str] | None, groupby: str, *, lfc_layer: str | None = None, expr_layer: str | None = None, score_key: str | None = None, filter_key: str | None = None, n_top: int = 15, categories_order: Sequence[str] | None = None, min_cells: int = 0, expr_threshold: float = 0.0, vabs_pct: float = 98.0, vabs_min: float = 0.0, vmax: float | None = None, cmap: str = 'RdBu_r', size_exponent: float = 1.5, dot_max: float = 60.0, dot_edge_color: str = 'white', dot_edge_lw: float = 0.2, axes: Tuple[Axes, Axes, Axes] | None = None, figsize: Tuple[float, float] = (7.5, 3.2), cbar_title: str = 'mean LFC', size_title: str = 'fraction\nexpressing', title: str | None = None, xlabel: str | None = None, ylabel: str | None = None, gene_label_fontsize: float = 6.5, category_label_fontsize: float = 5.5, italic_genes: bool = True, run_id: int = -1, return_fig: bool = False, save: str | None = None) → Figure | None

Kompot fold-change dotplot across groups.

Each tile encodes two quantities for a (gene, group) pair:

color — mean of the per-cell LFC layer lfc_layer over cells in the group (i.e. the per-cell kompot fold-change averaged within the category). Symmetric diverging scale keyed on the vabs_pct-th percentile of |LFC| by default.
size — fraction of cells in the group whose expr_layer value exceeds expr_threshold (default 0).

Parameters:

adata (AnnData) – AnnData with kompot DE results.
genes (sequence of str or None) – Explicit gene list (in display order, top-first). If None, the top n_top genes are picked by the Mahalanobis column inferred from run history (or score_key if given).
groupby (str) – Column in adata.obs used for the column axis of the dotplot.
lfc_layer (str, optional) – Layer in adata.layers holding per-cell LFC (e.g. "kompot_de_<c1>_to_<c2>_fold_change"). If None, inferred from the latest kompot DE run.
expr_layer (str, optional) – Layer used to compute fraction-expressed. Defaults to adata.X.
score_key (str, optional) – Column in adata.var used to rank genes when genes is None. Defaults to the Mahalanobis column inferred from run history.
filter_key (str, optional) – Boolean column in adata.var restricting auto-pick candidates (e.g. "kompot_de_<c1>_to_<c2>_is_de").
n_top (int, default 15) – Number of genes selected when genes is None.
categories_order (sequence of str, optional) – Subset/order of groupby categories to display. Categories not present in adata.obs[groupby] are dropped.
min_cells (int, default 0) – Drop categories with fewer than this many cells.
expr_threshold (float, default 0.0) – Threshold used for the fraction-expressed calculation.
vabs_pct (float, default 98.0) – Percentile of |LFC| setting the symmetric color limits.
vabs_min (float, default 0.0) – Floor for the symmetric color limit (max(pct, floor)). Useful when most tiles are near-zero and the percentile rounds down.
vmax (float, optional) – Override the color limit. If provided, vabs_pct and vabs_min are ignored and the scale spans [-vmax, vmax].
cmap (str, default "RdBu_r") – Diverging colormap name.
size_exponent (float, default 1.5) – Exponent applied to the fraction-expressed before scaling to a scatter s value. Higher compresses low fractions.
dot_max (float, default 60.0) – Target scatter s (area in pt²) for a frac=1.0 dot.
dot_edge_color (str, default "white")
dot_edge_lw (float, default 0.2)
axes (3-tuple of matplotlib.axes.Axes, optional) – (main, cbar, size_legend). If None a standalone figure is built with the three axes laid out in a constrained grid.
figsize (tuple, default (7.5, 3.2)) – Size of the standalone figure; ignored when axes is given.
cbar_title (str) – Title for the colorbar.
size_title (str) – Title for the size legend.
title (str, optional) – Main-axis title. Left empty by default.
xlabel (str, optional) – Main-axis x-label. Left empty by default.
ylabel (str, optional) – Main-axis y-label. Left empty by default.
gene_label_fontsize (float) – Font size for the y-axis (gene) labels.
category_label_fontsize (float) – Font size for the x-axis (category) labels.
italic_genes (bool, default True) – Italicize gene y-labels.
run_id (int, default -1) – Kompot DE run index used when inferring lfc_layer / score_key.
return_fig (bool, default False) – If True and axes is None, return the created Figure.
save (str, optional) – If given, fig.savefig(save, bbox_inches="tight") is called.

Returns:

Figure if axes is None and return_fig is True, otherwise None.

Return type:

matplotlib.figure.Figure or None
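The two quantities encoded by each tile, and the way the size and color limits follow from the parameters above, can be sketched without any plotting. This is an illustrative reading of the parameter descriptions, not kompot's code: tile_encoding and color_limit are hypothetical names, and the exact form dot_max * frac ** size_exponent is an assumption consistent with dot_max being the area of a frac=1.0 dot.

```python
def tile_encoding(lfc_values, expr_values, expr_threshold=0.0,
                  size_exponent=1.5, dot_max=60.0):
    """Return (mean_lfc, fraction_expressing, scatter_area) for one (gene, group) tile.

    lfc_values: per-cell LFC of the gene for the cells in the group.
    expr_values: per-cell expression of the gene for the same cells.
    """
    mean_lfc = sum(lfc_values) / len(lfc_values)
    frac = sum(1 for v in expr_values if v > expr_threshold) / len(expr_values)
    # Assumed sizing rule: a frac=1.0 dot gets area dot_max; the exponent
    # shrinks dots for low fractions faster than a linear mapping would.
    area = dot_max * frac ** size_exponent
    return mean_lfc, frac, area


def color_limit(abs_lfc_percentile, vabs_min=0.0, vmax=None):
    """Symmetric color limit: vmax wins if given, else max(percentile, floor)."""
    if vmax is not None:
        return vmax
    return max(abs_lfc_percentile, vabs_min)


mean_lfc, frac, area = tile_encoding(
    lfc_values=[0.5, 1.5, -0.5, 0.5],
    expr_values=[0.0, 2.0, 0.0, 0.0],
)
limit = color_limit(0.02, vabs_min=0.5)  # the floor applies; scale spans [-0.5, 0.5]
```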

Examples

Standalone figure, explicit genes:

import kompot
kompot.plot.dotplot(
    adata,
    genes=["Hbb-bh1", "Hba-x", "Tal1"],
    groupby="celltype.mapped",
    lfc_layer="kompot_de_WT_to_Tal1_fold_change",
    expr_layer="logcounts",
    return_fig=True,
)

Auto-pick top-20 Mahalanobis hits restricted to is_de=True:

kompot.plot.dotplot(
    adata, genes=None, groupby="celltype.mapped",
    filter_key="kompot_de_WT_to_Tal1_is_de",
    n_top=20, return_fig=True,
)

Embed into a composite figure:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 4), layout="constrained")
gs = fig.add_gridspec(1, 2, width_ratios=[1.0, 0.14])
ax_main = fig.add_subplot(gs[0, 0])
inner = gs[0, 1].subgridspec(2, 1)
ax_size = fig.add_subplot(inner[0, 0])
ax_cbar = fig.add_subplot(inner[1, 0])
kompot.plot.dotplot(
    adata, genes=top_genes, groupby="celltype.mapped",
    lfc_layer="kompot_de_WT_to_Tal1_fold_change",
    axes=(ax_main, ax_cbar, ax_size),
)

Enrichment Lollipop

kompot.plot.lollipop(data: DataFrame | object, *, n_terms: int = 12, term_col: str | None = None, score_col: str | None = None, count_col: str | None = None, fdr_col: str | None = None, x_metric: str = 'neg_log10_fdr', sort_by: str | None = 'x', ascending: bool | None = None, category: str = 'Process', fdr_threshold: float = 0.05, color: str = '#d73027', edge_color: str | None = None, cmap: str | None = None, color_by: str | None = None, stem_lw: float = 1.8, stem_alpha: float = 0.65, dot_min: float = 40.0, dot_max: float = 320.0, dot_scale: float = 22.0, dot_const: float = 80.0, fdr_line: float | None = 0.05, annotate: bool = True, annotate_fmt: str | None = None, legend: bool = True, legend_label: str = 'gene set', label_width: int = 55, label_fontsize: float = 6.5, annotate_fontsize: float = 6.0, fdr_floor: float = 1e-50, title: str | None = None, subtitle: str | None = None, title_space: float = 0.18, xlabel: str | None = None, ax: Axes | None = None, figsize: Tuple[float, float] = (7.0, 5.0), return_fig: bool = False, save: str | None = None, **kwargs) → Figure | Axes | None

Gene-set-enrichment lollipop plot.

Each row is an enriched term. A stem runs from the x-axis baseline to a dot whose x-position encodes significance (x_metric) and whose area encodes the matched-gene count.
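The two encodings can be sketched numerically. The dot-area rule below follows the formula given in the dot_min / dot_max parameter description; the treatment of fdr_floor as a lower clamp applied before the logarithm is an assumption, and both function names are hypothetical rather than part of kompot.

```python
import math


def dot_area(count, dot_min=40.0, dot_max=320.0, dot_scale=22.0):
    """Scatter area (pt^2) for a term with `count` matched genes.

    Implements clip(dot_min + dot_scale * sqrt(count), dot_min, dot_max).
    """
    raw = dot_min + dot_scale * math.sqrt(count)
    return min(max(raw, dot_min), dot_max)


def neg_log10_fdr(fdr, fdr_floor=1e-50):
    """x-position for x_metric="neg_log10_fdr".

    Assumption: FDR values below fdr_floor (including exact zeros) are
    clamped to the floor so the logarithm stays finite.
    """
    return -math.log10(max(fdr, fdr_floor))


small = dot_area(16)   # 40 + 22 * 4 = 128
large = dot_area(400)  # 40 + 22 * 20 = 480, clipped to dot_max
x_pos = neg_log10_fdr(0.01)
```

The square root keeps very large gene sets from dominating the figure, and the upper clip bounds the largest dot.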

Parameters:

data (StringDBReport, DataFrame, or records) –

Enrichment source. Three forms are accepted:

a kompot.plot.StringDBReport instance — its get_functional_enrichment() is called with category / fdr_threshold;
the signal-sorted DataFrame that method returns;
any other enrichment-result DataFrame (gseapy / enrichr, GOATOOLS, clusterProfiler, …). Use the *_col params to map its columns, or rely on autodetection.

Expected schema (logical field → autodetected column names):

Field	Candidate columns (case-insensitive)
term label	`description`, `term`, `name`, `pathway`, …
score	`signal`, `Combined Score`, `NES`, `score`, …
gene count	`number_of_genes`, `Count`, `Overlap` (`k/K`), …
FDR	`fdr`, `Adjusted P-value`, `p.adjust`, `padj`, …

n_terms (int, default 12) – Number of top terms to display (after sorting).
term_col (str, optional) – Explicit column name overriding autodetection for the term label.
score_col (str, optional) – Explicit column name overriding autodetection for the score (used when x_metric="score").
count_col (str, optional) – Explicit column name overriding autodetection for the gene count (dot size).
fdr_col (str, optional) – Explicit column name overriding autodetection for the FDR (x-axis when x_metric="neg_log10_fdr", plus the guide line and annotations).
x_metric ({"neg_log10_fdr", "score"} or column name, default "neg_log10_fdr") – What the dot’s x-position encodes. "neg_log10_fdr" plots -log10(FDR) (manuscript default); "score" plots score_col directly; any other value is treated as a literal column name to plot.
sort_by (str or None, default "x") – How to order rows before taking the top n_terms. "x" sorts by the plotted value (most significant / highest score on top); any column name sorts by that column; None preserves input order (StringDB frames already arrive signal-sorted).
ascending (bool, optional) – Sort direction override. By default sorting is descending for "x" / score columns and ascending for FDR-like columns.
category (str, default "Process") – StringDB enrichment category, used only when data is a StringDBReport. See get_functional_enrichment().
fdr_threshold (float, default 0.05) – FDR cutoff passed through to StringDBReport (StringDBReport path only).
color (str, default "#d73027") – Lollipop fill color. The default is kompot’s “up” direction red (kompot.utils.KOMPOT_COLORS), matching the manuscript.
edge_color (str, optional) – Dot outline / stem color. Defaults to a darkened color.
cmap (str, optional) – If given, dots are colored by color_by through this colormap (a colorbar is added on standalone figures) instead of the solid color.
color_by (str, optional) – Column whose values drive the cmap coloring. Defaults to the resolved score column when cmap is set.
stem_lw (float, default 1.8) – Line width of the lollipop stems.
stem_alpha (float, default 0.65) – Alpha (opacity) of the lollipop stems.
dot_min (float, default 40) – Lower clip bound (area in pt²) for the gene-count dot sizer clip(dot_min + dot_scale * sqrt(count), dot_min, dot_max).
dot_max (float, default 320) – Upper clip bound (area in pt²) for the same dot sizer.
dot_scale (float, default 22) – Multiplier in the dot sizer above.
dot_const (float, default 80) – Constant dot area used when no gene-count column is available.
fdr_line (float or None, default 0.05) – Draw a dashed vertical guide at this FDR (rendered at -log10(fdr_line) when x_metric="neg_log10_fdr"). None disables it. Ignored for non-FDR x metrics.
annotate (bool, default True) – Annotate each dot with n=<count> FDR=<fdr> to its right.
annotate_fmt (str, optional) – Custom format string for the annotation, receiving count and fdr as keyword fields, e.g. "{count} genes (q={fdr:.1e})".
legend (bool, default True) – Draw the aesthetic key (set swatch, dot-size cue, FDR guide).
legend_label (str, default "gene set") – Label for the set swatch in the legend.
label_width (int, default 55) – Soft-wrap width for term descriptions (two lines max). 0 disables wrapping.
label_fontsize (float, default 6.5) – Font size for the y-axis term labels.
annotate_fontsize (float, default 6.0) – Font size for the per-dot annotation.
fdr_floor (float, default 1e-50) – FDRs are clipped to this floor before -log10 to keep the x-axis finite.
title (str, optional) – Title (bold). On a standalone figure the title and subtitle sit in a reserved band above the axes (so the top row is never covered); when embedding into ax the title becomes the axes title.
subtitle (str, optional) – Subtitle, placed together with the title in the reserved band of a standalone figure.
title_space (float, default 0.18) – Fraction of the standalone figure height reserved at the top for the title / subtitle / legend band.
xlabel (str, optional) – X-axis label. Defaults to $-\log_{10}(\mathrm{FDR})$ for the FDR metric, otherwise the score/column name.
ax (matplotlib.axes.Axes, optional) – Embed into this axis. If None a standalone figure is built.
figsize (tuple, default (7.0, 5.0)) – Standalone figure size; ignored when ax is given.
return_fig (bool, default False) – If True, return the Figure (standalone) or the Axes (embedded) instead of None.
save (str, optional) – If given, fig.savefig(save, bbox_inches="tight") is called.
**kwargs – Forwarded to the dot scatter() call.

Returns:

The figure (standalone) or axis (embedded) when return_fig is True, else None.

Return type:

matplotlib.figure.Figure or matplotlib.axes.Axes or None

Examples

From a StringDBReport (queries StringDB live):

import kompot
report = kompot.plot.StringDBReport(
    ["TP53", "BRCA1", "KRAS", "EGFR", "PTEN"], species_id=9606,
)
kompot.plot.lollipop(report, category="Process", n_terms=10,
                     return_fig=True)

From a precomputed enrichment table (offline, any tool):

import pandas as pd
df = pd.DataFrame({
    "description": ["immune response", "cell cycle", "apoptosis"],
    "fdr": [1e-8, 3e-5, 2e-3],
    "number_of_genes": [42, 18, 9],
    "signal": [3.1, 2.0, 1.2],
})
kompot.plot.lollipop(df, n_terms=3, return_fig=True)

A gseapy/enrichr frame, scored by Combined Score, mapped explicitly:

kompot.plot.lollipop(
    enrichr_df, x_metric="score",
    term_col="Term", score_col="Combined Score",
    count_col="Overlap", fdr_col="Adjusted P-value",
)

Embed into a composite figure:

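import matplotlib.pyplot as plt
# df_a and df_b stand for two enrichment tables shaped like df above
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
kompot.plot.lollipop(df_a, ax=axes[0], title="Condition A")
kompot.plot.lollipop(df_b, ax=axes[1], title="Condition B")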
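
The dot geometry described under x_metric, fdr_floor, and the dot_* parameters can be reproduced outside the plotting call, for example to check why a dot saturates at dot_max. The following is a minimal sketch that assumes the documented formulas are applied exactly as written; lollipop_geometry is a hypothetical helper for illustration and is not part of kompot.

```python
import numpy as np
import pandas as pd


def lollipop_geometry(df, dot_min=40.0, dot_max=320.0, dot_scale=22.0,
                      fdr_floor=1e-50):
    """Hypothetical helper mirroring the documented x-position and dot-area rules."""
    # Clip FDRs to the floor so that -log10 stays finite when FDR == 0.
    fdr = np.clip(df["fdr"].to_numpy(dtype=float), fdr_floor, None)
    x = -np.log10(fdr)
    # Documented sizer: clip(dot_min + dot_scale * sqrt(count), dot_min, dot_max).
    count = df["number_of_genes"].to_numpy(dtype=float)
    area = np.clip(dot_min + dot_scale * np.sqrt(count), dot_min, dot_max)
    return pd.DataFrame({"x": x, "area": area}, index=df["description"])


df = pd.DataFrame({
    "description": ["immune response", "cell cycle", "apoptosis", "zero FDR term"],
    "fdr": [1e-8, 3e-5, 2e-3, 0.0],
    "number_of_genes": [42, 18, 9, 400],
})
geom = lollipop_geometry(df)
print(geom)

# The annotate_fmt example from the parameter list, filled in for the first row.
label = "{count} genes (q={fdr:.1e})".format(count=42, fdr=1e-8)
print(label)
```

With the default parameters, any term matching roughly 160 genes or more reaches the dot_max bound, so dot area stops discriminating between very large gene sets.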

Heatmaps¶

kompot.plot.heatmap(adata: AnnData, var_names: List[str] | Sequence[str] | None = None, groupby: str = None, n_top_genes: int = 20, genes: List[str] | Sequence[str] | None = None, score_key: str | None = None, layer: str | None = None, standard_scale: str | int | None = 'var', cmap: str | Colormap | None = None, dendrogram: bool = False, cluster_rows: bool = True, cluster_cols: bool = True, dendrogram_color: str = 'black', figsize: Tuple[float, float] | None = None, tile_aspect_ratio: float = 1.0, tile_size: float = 0.3, show_gene_labels: bool = True, show_group_labels: bool = True, gene_labels_size: int = 12, group_labels_size: int = 12, colorbar_title: str | None = None, colorbar_kwargs: Dict[str, Any] | None = None, n_colorbar_ticks: int | None = 3, layout_config: Dict[str, float] | None = None, title: str | None = None, sort_genes: bool = True, vcenter: float | str | None = None, vmin: float | str | None = None, vmax: float | str | None = None, ax: Axes | None = None, draw_values: bool = False, return_fig: bool = False, return_data: bool = False, save: str | None = None, run_id: int = -1, condition_column: str | None = None, observed: bool = True, condition1: str | None = None, condition2: str | None = None, condition1_name: str | None = None, condition2_name: str | None = None, exclude_groups: str | List[str] | None = None, fold_change_mode: bool = False, split_dot_mode: bool = False, max_cell_count: int | None = None, **kwargs)

Create a heatmap visualizing gene expression data for two conditions.

By default, the heatmap displays expression values with diagonally split cells, where the lower-left triangle shows values for the first condition and the upper-right triangle shows values for the second condition. This creates a compact visualization that highlights differences between conditions.

When fold_change_mode=True, each cell is a single square colored by the fold change (difference between means) between the two conditions, providing a simpler visualization focused on the differential expression.

When split_dot_mode=True, the heatmap displays dots split in half vertically, where the left half shows values for the first condition and the right half shows values for the second condition. The size of each half-dot is determined by the number of cells in that condition for that group, creating a visualization that highlights both expression differences and relative group sizes simultaneously.

Genes are shown on the y-axis and groups (cell types, clusters, etc.) are shown on the x-axis, with a legend and colorbar positioned to the right of the plot.

Parameters:

adata (AnnData) – AnnData object containing expression data
var_names (list, optional) – List of genes to include in the heatmap. If None, will use top genes based on score_key.
groupby (str, optional) – Key in adata.obs for grouping cells
n_top_genes (int, optional) – Number of top genes to include if var_names is None
genes (list, optional) – Alternative parameter name for specifying genes to include. Takes precedence over var_names if provided.
score_key (str, optional) – Key in adata.var for significance scores. If None, will try to infer from run information.
layer (str, optional) – Layer in AnnData to use for expression values. If None, uses .X
standard_scale (str or int, optional) – Whether to scale the expression values (‘var’, ‘group’ or 0, 1). Default is ‘var’ for gene-wise z-scoring. When any z-scoring is applied, the colormap is automatically centered at 0 (vcenter=0), uses symmetric limits (equal positive and negative ranges), and uses a divergent colormap unless vcenter, vmin, vmax, or cmap is explicitly specified.
cmap (str or colormap, optional) – Colormap to use for the heatmap. If None, defaults to “coolwarm” (divergent) when z-scoring is applied, “Reds” in split dot mode, and “viridis” (sequential) otherwise.
dendrogram (bool, optional) – Whether to show dendrograms for hierarchical clustering
cluster_rows (bool, optional) – Whether to cluster rows (genes)
cluster_cols (bool, optional) – Whether to cluster columns (groups)
dendrogram_color (str, optional) – Color for dendrograms
figsize (tuple, optional) – Figure size as (width, height) in inches. If None, will be calculated based on data dimensions, tile_size, and tile_aspect_ratio.
tile_aspect_ratio (float, optional) – Aspect ratio of individual tiles (width/height). Default is 1.0 (square tiles). Values > 1 create wider tiles, values < 1 create taller tiles.
tile_size (float, optional) – Base size in inches for each tile when automatically calculating figure size. Default is 0.3 inches. For square tiles (tile_aspect_ratio=1), this is the width and height. For non-square tiles, this is the width if tile_aspect_ratio > 1, or the height if tile_aspect_ratio < 1.
show_gene_labels (bool, optional) – Whether to show gene labels
show_group_labels (bool, optional) – Whether to show group labels
gene_labels_size (int, optional) – Font size for gene labels
group_labels_size (int, optional) – Font size for group labels
colorbar_title (str, optional) – Title for the colorbar. If None, will default to “Z-score” when any z-scoring is applied (standard_scale=”var”, standard_scale=”group”, or standard_scale=0, 1), and “Expression” otherwise.
colorbar_kwargs (dict, optional) – Additional parameters for colorbar customization. Supported keys include:
- ‘label_kwargs’: dict with parameters for the colorbar label (e.g. fontsize, color)
- ‘locator’: a matplotlib Locator instance for tick positions
- ‘formatter’: a matplotlib Formatter instance for tick labels
- any attribute of the matplotlib colorbar instance
n_colorbar_ticks (int, optional) – Number of ticks to display in the colorbar. Default is 3. This parameter provides a simple way to control tick density, while the colorbar_kwargs[‘locator’] option provides more fine-grained control if needed.
layout_config (dict, optional) – Configuration for controlling plot layout spacing. Keys include:
- ‘gene_label_space’: Space for gene labels (y-axis), default 3.5
- ‘group_label_space’: Space for group labels (x-axis), default 2.0
- ‘title_space’: Space for title, default 3.0
- ‘base_legend_space’: Base space for legend, default 4.0
- ‘legend_name_factor’: Factor to adjust legend space based on condition name length, default 0.15
- ‘colorbar_space’: Space for colorbar, default 3.0
- ‘row_dendrogram_space’: Space for row dendrogram, default 2.5
- ‘col_dendrogram_space’: Space for column dendrogram, default 2.5
- ‘legend_fontsize’: Base font size for legend, default 12
- ‘legend_fontsize_factor’: Factor to reduce font size for long condition names, default 0.25
- ‘colorbar_height’: Height proportion of sidebar for colorbar, default 0.5
- ‘colorbar_width’: Width proportion for colorbar, default 0.25
title (str, optional) – Title for the heatmap
sort_genes (bool, optional) – Whether to sort genes by score
vcenter (float or str, optional) – Value to center the colormap at. If None and any z-scoring is applied (standard_scale=’var’, ‘group’, 0, or 1), the colormap will be centered at 0. If None and no z-scoring is applied, a standard (non-centered) colormap will be used. Can be specified as a percentile using ‘p<number>’ format (e.g., ‘p50’ for median).
vmin (float or str, optional) – Minimum value for colormap. If None and z-scoring is applied, will use a symmetric limit based on the maximum absolute value of the data. Can be specified as a percentile using ‘p<number>’ format (e.g., ‘p5’ for 5th percentile).
vmax (float or str, optional) – Maximum value for colormap. If None and z-scoring is applied, will use a symmetric limit based on the maximum absolute value of the data. Can be specified as a percentile using ‘p<number>’ format (e.g., ‘p95’ for 95th percentile).
ax (matplotlib.axes.Axes, optional) – Axes to plot on. If None, creates new figure
draw_values (bool) – Whether to draw the values in the heatmap cells. Default is False.
return_fig (bool, optional) – If True, returns the figure and axes
return_data (bool, optional) – If True, returns the expression means and fold-changes used for the heatmap
save (str, optional) – Path to save figure. If None, figure is not saved
run_id (int, optional) – Specific run ID to use for fetching field names from run history. -1 (default) is the latest run.
condition_column (str, optional) – Column in adata.obs containing condition information. If None, tries to infer from run_info.
observed (bool, optional) – Whether to use only observed combinations in groupby operations.
condition1 (str, optional) – Name of the first condition to compare. If None, tries to infer from run_info. Must match a value in the condition_column of adata.obs.
condition2 (str, optional) – Name of the second condition to compare. If None, tries to infer from run_info. Must match a value in the condition_column of adata.obs.
condition1_name (str, optional) – Display name for the first condition in the plot legend and title. If None, defaults to the value of condition1.
condition2_name (str, optional) – Display name for the second condition in the plot legend and title. If None, defaults to the value of condition2.
exclude_groups (str or list, optional) – Group name(s) to exclude from the heatmap.
fold_change_mode (bool, optional) – Whether to use fold change coloring instead of split tiles
split_dot_mode (bool, optional) – Whether to use split dots instead of split tiles. When True, the size of each half-dot represents the number of cells in that condition for that group
max_cell_count (int, optional) – Upper limit for cell count used for dot sizing. If provided, all dots will be scaled relative to this maximum value, even if actual cell counts exceed it. This helps maintain readable visualization when some groups have much larger cell counts than others.
**kwargs – Additional keyword arguments passed to matplotlib

Returns:

If return_fig is True and dendrogram is False, returns (fig, ax)
If return_fig is True and dendrogram is True, returns (fig, ax, dendrogram_axes)
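
The color-limit rules described for standard_scale, vcenter, vmin, and vmax can be illustrated in isolation. The sketch below assumes gene-wise z-scoring with the population standard deviation and treats a partially specified limit set as "use what was given"; both choices, and the helper names resolve_limit and zscore_limits, are assumptions made for illustration rather than kompot's actual implementation.

```python
import numpy as np


def resolve_limit(value, data):
    """Hypothetical helper: turn a float or a 'p<number>' string into a number."""
    if value is None:
        return None
    if isinstance(value, str):
        if not value.startswith("p"):
            raise ValueError(f"expected 'p<number>', got {value!r}")
        # 'p95' means the 95th percentile of the plotted values.
        return float(np.nanpercentile(data, float(value[1:])))
    return float(value)


def zscore_limits(data, vmin=None, vmax=None, vcenter=None):
    """Symmetric limits centered at 0 unless the caller overrides any of them."""
    vmin, vmax = resolve_limit(vmin, data), resolve_limit(vmax, data)
    vcenter = resolve_limit(vcenter, data)
    if vmin is None and vmax is None and vcenter is None:
        bound = float(np.nanmax(np.abs(data)))
        return -bound, 0.0, bound
    return vmin, vcenter, vmax


# Rows are genes, columns are groups; z-score each gene (standard_scale="var").
means = np.array([[1.0, 2.0, 3.0], [10.0, 10.0, 40.0]])
z = (means - means.mean(axis=1, keepdims=True)) / means.std(axis=1, keepdims=True)

print(zscore_limits(z))                      # default: symmetric around 0
print(zscore_limits(z, vmin="p5", vmax="p95"))  # explicit percentile limits
```

Percentile strings are useful when a few extreme genes would otherwise stretch the symmetric range and wash out the rest of the heatmap.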

Direction Plots¶

kompot.plot.direction_barplot(adata: AnnData, category_column: str, direction_column: str | None = None, condition1: str | None = None, condition2: str | None = None, normalize: Literal['index', 'columns', None] = 'index', figsize: Tuple[float, float] = (12, 6), title: str | None = None, xlabel: str | None = None, ylabel: str | None = None, colors: Dict[str, str] | None = None, rotation: float = 90, legend_title: str = 'Direction', legend_loc: str = 'best', stacked: bool = True, sort_by: str | None = None, ascending: bool = False, category_order: List[str] | None = None, ax: Axes | None = None, return_fig: bool = False, save: str | None = None, run_id: int = -1, **kwargs) → Figure | None

Create a barplot showing the direction of change distribution across categories.

This function creates a stacked or grouped barplot showing the distribution of up/down/neutral changes across different categories (like cell types).

Parameters:

adata (AnnData) – AnnData object containing differential abundance results
category_column (str) – Column in adata.obs to use for grouping (e.g., “cell_type”)
direction_column (str, optional) – Column in adata.obs containing direction information. If None, will try to infer from the run specified by run_id.
condition1 (str, optional) – Name of condition 1 (denominator in fold change). If None, will try to infer from the run_id.
condition2 (str, optional) – Name of condition 2 (numerator in fold change). If None, will try to infer from the run_id.
normalize (str or None, optional) – How to normalize the data. Options: “index” (normalize rows), “columns” (normalize columns), or None (raw counts).
figsize (tuple, optional) – Figure size as (width, height) in inches
title (str, optional) – Plot title. If None and conditions provided, uses “Direction of Change by {category_column}\n{condition1} to {condition2}”
xlabel (str, optional) – Label for x-axis. If None, uses the category_column
ylabel (str, optional) – Label for y-axis. Defaults to “Percentage (%)” when normalize=”index”, otherwise “Count”
colors (dict, optional) – Dictionary mapping direction values to colors.
rotation (float, optional) – Rotation angle for x-tick labels
legend_title (str, optional) – Title for the legend
legend_loc (str, optional) – Location for the legend
stacked (bool, optional) – Whether to create a stacked (True) or grouped (False) bar plot
sort_by (str, optional) – Direction category to sort by (e.g., “up”, “down”). If None, uses the order in the data
ascending (bool, optional) – Whether to sort in ascending order.
category_order (list of str, optional) – Specific categories and their order to display. Defaults to data order.
ax (matplotlib.axes.Axes, optional) – Axes to plot on. If None, creates new figure
return_fig (bool, optional) – If True, returns the figure and axes
save (str, optional) – Path to save figure. If None, figure is not saved
run_id (int, optional) – Specific run ID to use for fetching data from run history. Negative indices count from the end (-1 is the latest run).

Returns:

If return_fig is True, returns (fig, ax)

Return type:

tuple or None
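
The three normalize options correspond to the usual ways of tabulating two categorical columns. The sketch below uses pandas.crosstab, whose normalize argument happens to accept the same "index" and "columns" values; whether direction_barplot calls crosstab internally is an assumption, and the toy obs table is invented for illustration.

```python
import pandas as pd

# Toy stand-in for adata.obs with a category column and a direction column.
obs = pd.DataFrame({
    "cell_type": ["B", "B", "B", "B", "T", "T"],
    "direction": ["up", "up", "down", "neutral", "down", "down"],
})

# normalize=None: raw counts per (category, direction).
counts = pd.crosstab(obs["cell_type"], obs["direction"])

# normalize="index": each category (row) sums to 100 %.
row_pct = pd.crosstab(obs["cell_type"], obs["direction"], normalize="index") * 100

# normalize="columns": each direction (column) sums to 100 %.
col_pct = pd.crosstab(obs["cell_type"], obs["direction"], normalize="columns") * 100

# sort_by="up", ascending=False: categories with the largest "up" share first.
row_pct = row_pct.sort_values("up", ascending=False)
print(row_pct)
```

Row normalization answers "what fraction of each cell type goes up or down", which is usually the question when cell types differ greatly in size.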

Smoothing Plots¶

kompot.plot.plot_smoothing(adata, genes: List[str] | None = None, n_top_genes: int = 6, basis: str = 'X_umap', result_key: str = 'kompot_smooth', condition: str | None = None, layer: str | None = None, show_obs_variance: bool = True, cmap: str = 'Spectral_r', cmap_std: str = 'magma', figsize_per_panel: tuple = (3.5, 3.0), title: str | None = None, save: str | None = None, return_fig: bool = False, **kwargs) → Figure | None

Plot raw vs. GP-smoothed expression with uncertainty.

Shows a grid with rows:
1. Raw expression
2. GP-smoothed expression
3. Epistemic std (GP posterior, shared across genes) or total std if no obs_variance
4. Aleatoric std (sqrt of obs_variance, per-gene), only if available

Uses scanpy.pl.embedding internally when available, falling back to manual scatter plots otherwise.

Parameters:

adata (AnnData) – AnnData object with smoothing results.
genes (list of str, optional) – Genes to plot. If None, selects top genes by mean smoothed value.
n_top_genes (int) – Number of genes when genes is None. Max 8.
basis (str) – Key in adata.obsm for 2-D coordinates.
result_key (str) – Base key used in smooth_expression().
condition (str, optional) – Condition label in the layer names. If None, auto-detected from available layers.
layer (str, optional) – Layer with raw expression. If None, uses adata.X.
show_obs_variance (bool) – Show obs_variance row if available.
cmap (str) – Colormap for expression values.
cmap_std (str) – Colormap for uncertainty panels.
figsize_per_panel (tuple) – Size of each subplot (width, height).
title (str, optional) – Overall figure title.
return_fig (bool) – If True, return the Figure instead of calling plt.show().
**kwargs – Extra keyword arguments forwarded to scanpy.pl.embedding (e.g. s, size, vmin, vmax).

Return type:

Figure or None

Embedding Plots¶

Plot embeddings with group filtering capabilities.

This function wraps scanpy’s plotting.embedding function but adds the ability to filter cells based on observation column values. Selected cells are plotted normally using scanpy, while non-selected cells can be displayed in a different color in the background.

Parameters:

adata (AnnData) – AnnData object containing the embedding coordinates.
basis (str) – Key for the embedding coordinates. Same as scanpy’s basis parameter.
groups (Dict[str, Union[str, List[str]]] or str or List[str], optional) – If a dictionary: keys are column names in adata.obs and values are lists or individual allowed values. Only cells matching ALL conditions will be highlighted. If a string: Same as scanpy’s groups parameter for categorical groupby. If None: all cells are shown normally.
background_color (str, optional) – Color for non-selected cells. If None, background cells are not shown. Default is “lightgrey”.
matplotlib_scatter_kwargs (Dict[str, Any], optional) – Additional keyword arguments to pass to matplotlib’s scatter function when plotting background cells. Common options include ‘alpha’, ‘s’ (size), ‘edgecolors’, and ‘zorder’. Defaults match scanpy’s styling with {‘zorder’: 0, ‘edgecolors’: ‘none’, ‘linewidths’: 0, ‘alpha’: 0.7}.
mgroups (List[Dict[str, Union[str, List[str]]]] or Dict[str, Dict[str, Union[str, List[str]]]], optional) – List or dictionary of groups dictionaries to create multiple panels. Each element is treated as a separate groups argument in its own subplot. Cannot be used with multiple colors. If provided as a list, title argument should align with the number of groups in mgroups. If provided as a dictionary, the keys will be used as title names unless titles is explicitly provided. If titles is provided but too short, a warning will be issued and the dictionary keys will be used for the remaining panels. Cannot be used when layer is a list.
ncols (int, optional) – Number of columns for panel layout when using mgroups, or when layer or color is a list. Default is 4, or fewer if there are fewer panels.
**kwargs –
All other parameters are passed directly to scanpy.pl.embedding. See scanpy.pl.embedding documentation for details on available parameters.

When layer is a list, each layer is plotted in a separate panel (only when color is not a list and mgroups is not used).

Returns:

Whatever scanpy.pl.embedding returns based on your kwargs.
If return_fig=True, returns the figure or (figure, axes) depending on scanpy version.
Otherwise returns None.

Notes

This function requires scanpy. If scanpy is not available, it will raise a warning. See scanpy.pl.embedding documentation for full details of base plotting parameters.
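
The dictionary form of groups selects cells that satisfy every listed condition at once. A minimal sketch of that selection logic on a plain DataFrame is given below; groups_mask is a hypothetical helper written for illustration, and the actual filtering code inside kompot may differ.

```python
import pandas as pd


def groups_mask(obs, groups):
    """Hypothetical helper: True for cells that satisfy every condition in groups."""
    mask = pd.Series(True, index=obs.index)
    for column, allowed in groups.items():
        # A single value is treated like a one-element list.
        if isinstance(allowed, str):
            allowed = [allowed]
        mask &= obs[column].isin(allowed)
    return mask


# Toy stand-in for adata.obs.
obs = pd.DataFrame(
    {
        "cell_type": ["B", "T", "T", "NK"],
        "condition": ["young", "young", "old", "old"],
    },
    index=["c1", "c2", "c3", "c4"],
)

# Highlight T and NK cells, but only those from the "old" condition.
mask = groups_mask(obs, {"cell_type": ["T", "NK"], "condition": "old"})
print(mask)
```

Cells where the mask is False are the ones that would be drawn in background_color, or omitted when background_color is None.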

StringDB Integration¶

The kompot.plot.StringDBReport class provides tools to generate gene set reports with the StringDB network and resource links.

class kompot.plot.StringDBReport(genes: List[str], species_id: int = 9606, include_stringdb: bool = True, include_resources: bool = True, include_enrichment: bool = False, background: List[str] | None = None)

Generate rich gene set reports with StringDB integration.

This class provides tools to generate rich HTML reports for gene sets, including StringDB network visualization, resource links, and other gene information. It’s designed to work well in Jupyter notebooks but can also be used programmatically.

Parameters:

genes (List[str]) – List of gene symbols to include in the report
species_id (int, optional) – NCBI taxonomy ID for species (default: 9606 for Homo sapiens)
include_stringdb (bool, optional) – Include StringDB network image and links (default: True)
include_resources (bool, optional) – Include external resource links for genes (default: True)
include_enrichment (bool, optional) – Include functional enrichment analysis (default: False)
background (List[str], optional) – Gene symbols defining the statistical background (the tested universe) for over-representation analysis, e.g. adata.var_names of the analyzed object. When None (default), StringDB uses its genome-wide background, which inflates significance for any experiment that only measured a subset of genes. Passing the genes that were actually tested is the statistically correct choice. See get_functional_enrichment() for details on how the background is applied.

genes

List of gene symbols included in the report

Type: List[str]

species_id

NCBI taxonomy ID for the species

Type: int

string_db_base_url

Base URL for StringDB API and web interface

Type: str

Notes

Supported species IDs and their names:

Species ID	Species Name
9606	Homo sapiens
10090	Mus musculus
10116	Rattus norvegicus
7227	Drosophila melanogaster
6239	Caenorhabditis elegans
4932	Saccharomyces cerevisiae
3702	Arabidopsis thaliana

Additional species IDs can be used but won’t have mapped names in the report. For the full list of available species, see the StringDB website.

display(additional_genes: List[str] | None = None) → None

Display the report in a Jupyter notebook.

Parameters: additional_genes (List[str], optional) – Additional genes to include in the StringDB visualizations

fetch_stringdb_image(additional_genes: List[str] | None = None) → bytes | None

Fetch StringDB network image as bytes.

Parameters: additional_genes (List[str], optional) – Additional genes to include in the StringDB image
Returns: Image bytes or None if fetch failed
Return type: Optional[bytes]

get_functional_enrichment(category: str = 'Process', fdr_threshold: float = 0.05, background: List[str] | None = None) → DataFrame | None

Get functional enrichment analysis for the gene set.

This method fetches functional enrichment results through StringDB’s enrichment API.

Parameters:

category (str, optional) – Category for enrichment analysis (default: “Process”). Valid options:
- Process: Gene Ontology biological processes
- Component: Gene Ontology cellular components
- Function: Gene Ontology molecular functions
- KEGG: KEGG pathways
- Pfam: Protein domain annotations from Pfam
- InterPro: Protein domain annotations from InterPro
- SMART: Protein domain annotations from SMART
- Keywords: UniProt keyword annotations
- Reactome: Reactome pathway annotations
- WikiPathways: WikiPathways annotations
fdr_threshold (float, optional) – FDR threshold for significance (default: 0.05)
background (List[str], optional) –
Gene symbols defining the statistical background (the tested universe) for the over-representation analysis. Overrides the instance-level background passed at construction for this call. When both are None (the default), StringDB uses its genome-wide background.

Why this matters. Over-representation analysis compares the foreground against a universe. If an experiment only measured a subset of genes (as in most single-cell / targeted assays), the correct universe is the set of genes actually tested, not the whole genome. Using the genome-wide default inflates significance by deflating the background — the classic ORA pitfall. Pass the tested gene set (e.g. adata.var_names) here to correct it.

Both the foreground and the supplied background are mapped to STRING identifiers via _map_to_string_ids() so they share one identifier space (StringDB requires STRING IDs, not symbols, for the background), and the universe size used for the strength/signal columns is set to the mapped background size rather than the species-wide protein count. If the mapping yields an empty foreground or background, the call falls back to the genome-wide background and logs a warning.

Returns:

DataFrame with enrichment results or None if request failed

Return type:

Optional[pd.DataFrame]

Notes

The enrichment results include various columns depending on the category:
- term: Identifier for the enriched term (e.g., GO:0006281)
- description: Human-readable description of the term
- signal: Balanced metric combining enrichment magnitude and significance (higher is better)
- strength: Log10(observed/expected) indicating enrichment effect size
- fdr: False discovery rate (adjusted p-value)
- number_of_genes: Number of genes from the input that match this term
- inputGenes: List of input genes that match this term

Results are sorted by signal (descending) following StringDB’s default behavior. Different categories have different levels of annotation coverage. For example, GO Process usually provides the most annotations, while specific pathway databases may have more limited coverage.
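
The effect of the background choice can be seen with a plain hypergeometric over-representation test. The sketch below is a generic ORA calculation using only the standard library, not StringDB's own computation; the function names and all counts are hypothetical, chosen so that the same term covers 200 of 20000 genome-wide genes but 150 of the 3000 genes actually measured.

```python
from math import comb, log10


def ora_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from a universe of N containing K annotated genes."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def strength(k, n, K, N):
    """log10(observed / expected), following the strength column description."""
    return log10((k / n) / (K / N))


# Hypothetical numbers: 50 foreground genes, 10 of which hit the term.
k, n = 10, 50
genome = dict(K=200, N=20000)  # term size and universe size, genome-wide
tested = dict(K=150, N=3000)   # same term restricted to the genes actually measured

p_genome = ora_pvalue(k, n, **genome)
p_tested = ora_pvalue(k, n, **tested)
print(f"genome-wide: p={p_genome:.2e}, strength={strength(k, n, **genome):.2f}")
print(f"tested set:  p={p_tested:.2e}, strength={strength(k, n, **tested):.2f}")
```

Under these assumed counts the genome-wide universe yields a much smaller p-value and a larger strength for the identical foreground, which is the inflation the background parameter is meant to correct.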

get_json(additional_genes: List[str] | None = None) → Dict[str, Any]

Generate a JSON representation of the gene report.

Parameters: additional_genes (List[str], optional) – Additional genes to include in the StringDB visualizations
Returns: JSON-serializable dictionary with report data
Return type: Dict[str, Any]

get_resource_links(gene: str) → Dict[str, str]

Generate external resource links for a gene.

Parameters: gene (str) – Gene symbol to generate links for
Returns: Dictionary mapping resource names to URLs
Return type: Dict[str, str]

get_species_name() → str

Get human-readable species name from species ID.

get_stringdb_image_url(additional_genes: List[str] | None = None) → str

Generate URL for StringDB network image.

Parameters: additional_genes (List[str], optional) – Additional genes to include in the StringDB image
Returns: URL for StringDB network image
Return type: str

get_stringdb_url(additional_genes: List[str] | None = None) → str

Generate URL for StringDB network visualization.

Parameters: additional_genes (List[str], optional) – Additional genes to include in the StringDB query
Returns: URL for StringDB network visualization
Return type: str

save_html(filename: str, additional_genes: List[str] | None = None) → None

Save the report as an HTML file.

Parameters:

filename (str) – Path to save the HTML file
additional_genes (List[str], optional) – Additional genes to include in the StringDB visualizations

to_dataframe() → DataFrame

Convert gene resource links to a pandas DataFrame.

Returns: DataFrame with genes as index and resource links as columns
Return type: pd.DataFrame