Differential Expression: Advanced Analysis

This tutorial explores advanced differential expression (DE) analysis with Kompot, building on the Getting Started tutorial.

You’ll learn how to:

  • Customize DE analysis parameters for your specific dataset

  • Work with null gene distributions for FDR estimation

  • Perform multiple comparisons and track results

  • Use advanced visualization options

  • Optimize computational resources

Setup

[1]:
import anndata as ad
import matplotlib.pyplot as plt
import numpy as np
import palantir
import pandas as pd
import scanpy as sc

import kompot

plt.rcParams["axes.spines.right"] = False
plt.rcParams["axes.spines.top"] = False
plt.rcParams["image.cmap"] = "Spectral_r"
[2]:
DATA_PATH = "../data/murine_bone_marrow_aging.h5ad"
GROUPING_COLUMN = "Age"
CONDITIONS = ["Young", "Old"]
CELL_TYPE_COLUMN = "highres_celltype"
DIMENSIONALITY_REDUCTION = "X_pca_harmony"
LAYER_FOR_EXPRESSION = "logged_counts"

Load and Prepare Data

We’ll reuse the dataset from the first tutorial:

[3]:
adata = ad.read_h5ad(DATA_PATH)
palantir.utils.run_diffusion_maps(adata, pca_key="X_pca_harmony", n_components=40)
adata
[3]:
AnnData object with n_obs × n_vars = 8090 × 16285
    obs: 'Compartment', 'Replicate', 'Age', 'Sample', 'Info', 'batch', 'doublet_score', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_hb', 'pct_counts_hb', 'S_score', 'G2M_score', 'phase', 'leiden', 'phenograph', 'highres_celltype', 'midres_celltype'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'hb', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'Age_colors', 'Compartment_colors', 'DMEigenValues', 'Info_colors', 'README', 'Replicate_colors', 'Sample_colors', 'batch_colors', 'draw_graph', 'highres_celltype_colors', 'hvg', 'leiden', 'leiden_colors', 'midres_celltype_colors', 'neighbors', 'pca', 'phase_colors', 'umap', 'DM_EigenValues'
    obsm: 'AbCapture', 'DM_EigenVectors', 'HTO', 'X_draw_graph_fa', 'X_pca', 'X_pca_harmony', 'X_pca_noregression', 'X_umap'
    varm: 'PCs'
    layers: 'MAGIC_imputed_data', 'logged_counts', 'normalized_counts', 'raw_counts'
    obsp: 'DM_Kernel', 'connectivities', 'distances', 'DM_Similarity'

Understanding DE Parameters

The kompot.de function accepts settings dataclasses for grouped parameter control:

GP Settings (kompot.GPSettings)

  • sigma: Noise level in the expression layer (default: 1)

    • Adjust based on your normalization method

    • Higher values for noisier data

    • Lower values for denoised data

  • n_landmarks: Number of landmark points for Mahalanobis computation (default: 5000)

    • More landmarks = more accurate but slower

    • Kompot automatically uses min(n_landmarks, n_cells)
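
The landmark cap can be illustrated with a few lines of plain Python. This is a sketch of the rule stated above, not Kompot's internal code: effective_landmarks is a hypothetical helper name, and the cell counts are the Young and Old group sizes of this dataset.

```python
def effective_landmarks(n_landmarks: int, n_cells: int) -> int:
    """Return the number of landmarks actually usable for one condition."""
    # A condition cannot supply more landmarks than it has cells.
    return min(n_landmarks, n_cells)


# Default n_landmarks=5000 with the per-condition cell counts of this dataset.
for condition, n_cells in {"Young": 2917, "Old": 3116}.items():
    print(condition, effective_landmarks(5000, n_cells))
```

With fewer cells than requested landmarks, every cell becomes a landmark, so lowering n_landmarks only saves time once it drops below the group size.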

FDR Settings (kompot.FDRSettings)

  • null_genes: Number of permuted genes for FDR estimation (default: “auto” → 2000)

    • Higher values give better FDR estimates but increase computation time

    • Set to 0 to disable FDR computation

  • threshold: FDR threshold for the is_de column (default: 0.05)
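
How threshold turns local FDR values into the is_de flag can be sketched as follows. The gene names and FDR values are made up for illustration, and the strict "<" comparison is an assumption based on the is_de field description ("local FDR < 0.05") that RunInfo reports later in this tutorial.

```python
# Hypothetical local FDR values for three genes (illustrative only).
local_fdr = {"GeneA": 0.003, "GeneB": 0.049, "GeneC": 0.2}
threshold = 0.05  # default FDR threshold named in the list above

# A gene is flagged when its local FDR falls below the threshold.
is_de = {gene: fdr < threshold for gene, fdr in local_fdr.items()}
print(is_de)
```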

Storage Settings (kompot.StorageSettings)

  • result_key: Prefix for result field names (default: “kompot_de”)

    • Change to avoid overwriting previous results
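
Because result_key is a prefix, the output names of a run can be predicted from the key and the two condition labels. The sketch below reproduces the naming pattern visible in this tutorial's log output. expected_fields is a hypothetical helper, not a Kompot function, and it lists only a subset of the fields.

```python
def expected_fields(result_key: str, condition1: str, condition2: str) -> list[str]:
    """Build a subset of the field names seen in this tutorial's logs."""
    pair = f"{result_key}_{condition1}_to_{condition2}"
    return [
        f"{result_key}_{condition1}_imputed",  # adata.layers
        f"{result_key}_{condition2}_imputed",  # adata.layers
        f"{pair}_fold_change",                 # adata.layers
        f"{pair}_mahalanobis",                 # adata.var
        f"{pair}_mean_lfc",                    # adata.var
    ]


print(expected_fields("kompot_de", "Young", "Old"))
```

A different prefix yields a disjoint set of names, which is why changing result_key protects earlier results.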

Let’s run DE analysis with customized parameters:

[4]:
de_results = kompot.de(
    adata,
    groupby=GROUPING_COLUMN,
    condition1=CONDITIONS[0],
    condition2=CONDITIONS[1],
    layer=LAYER_FOR_EXPRESSION,
    obsm_key=DIMENSIONALITY_REDUCTION,
    gp=kompot.GPSettings(batch_size=0),  # 0 = process everything at once; use a positive batch size to lower peak memory
    fdr=kompot.FDRSettings(null_genes=4000),
)
[2026-03-26 05:37:46,558] [INFO    ] Condition 1 (Young): 2,917 cells
[2026-03-26 05:37:46,558] [INFO    ] Condition 2 (Old): 3,116 cells
[2026-03-26 05:37:55,958] [INFO    ] Fitting expression estimator for condition 1...
[2026-03-26 05:38:15,072] [INFO    ] Fitting expression estimator for condition 2...
[2026-03-26 05:38:59,824] [INFO    ] FDR analysis complete: 879/16285 genes significantly DE at FDR < 0.05
[2026-03-26 05:38:59,826] [INFO    ] Mahalanobis distance threshold for FDR < 0.05: 10.4591
[2026-03-26 05:39:04,550] [INFO    ] This run will have `run_id=0`.

Volcano Plot Customization

The volcano_de function offers extensive customization options.

Changing the Y-Axis Metric

By default, the y-axis shows Mahalanobis distance. You can switch to local FDR:

[5]:
kompot.plot.volcano_de(
    adata,
    y_axis_type="local_fdr",
    significance_threshold=0.02,  # Adjust FDR threshold
)
[2026-03-26 05:39:04,989] [INFO    ] Found DE run info for run_id=-1
[2026-03-26 05:39:04,991] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2026-03-26 05:39:04,992] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2026-03-26 05:39:04,993] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2026-03-26 05:39:04,994] [INFO    ] Using local_fdr values for y-axis: kompot_de_Young_to_Old_mahalanobis_local_fdr
[2026-03-26 05:39:04,995] [INFO    ] Using DE run 0: comparing Young to Old
[2026-03-26 05:39:05,024] [INFO    ] Using data columns from var - lfc: 'kompot_de_Young_to_Old_mean_lfc', score: 'kompot_de_Young_to_Old_mahalanobis_local_fdr'
[2026-03-26 05:39:05,025] [INFO    ] Applied local_fdr transformation to y-axis data
[2026-03-26 05:39:05,026] [INFO    ] Significance threshold selection: using column 'kompot_de_Young_to_Old_mahalanobis_local_fdr' with threshold < 0.02
[2026-03-26 05:39:05,027] [INFO    ] Values range: 0.002420 - 1.000000
[2026-03-26 05:39:05,028] [INFO    ] Found 780 genes with local_fdr < 0.02
[2026-03-26 05:39:05,032] [INFO    ] Highlighting 780 genes at local_fdr < 0.02 (442 up, 338 down)
[2026-03-26 05:39:05,047] [INFO    ] Labeling top 10 genes by score
[2026-03-26 05:39:05,052] [INFO    ] Added local_fdr threshold line at y=1.70 (local_fdr=0.02)
../_images/notebooks_02_differential_expression_detailed_8_1.png
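
The log output reports the threshold line at y=1.70 for local_fdr=0.02, which is consistent with plotting local FDR on a -log10 scale. That transform is inferred from the logged numbers rather than documented here, so treat this as a quick sanity check:

```python
import math

threshold = 0.02
y = -math.log10(threshold)  # smaller FDR values land higher on the axis
print(f"{y:.2f}")
```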

Highlighting Gene Sets

Highlight specific gene sets with custom colors and labels:

[6]:
# Define gene sets of interest
gene_sets = [
    {
        "name": "MHC class II",
        "genes": ["H2-Ab1", "H2-Aa", "Cd74", "H2-Eb1"],
        "color": "#E76F51",
    },
    {
        "name": "Antioxidant",
        "genes": ["S100a8", "Alox5ap", "Hp", "S100a9", "Mgst1", "Apoe"],
        "color": "#2A9D8F",
    },
]

kompot.plot.volcano_de(
    adata,
    significance_threshold={"local_fdr": 0.02, "mahalanobis": 30},
    gene_labels=["S100a8", "Alox5ap", "Hp", "S100a9", "Mgst1", "Apoe"],
    highlight_genes=gene_sets,
)
[2026-03-26 05:39:05,330] [INFO    ] Found DE run info for run_id=-1
[2026-03-26 05:39:05,331] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2026-03-26 05:39:05,332] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2026-03-26 05:39:05,337] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2026-03-26 05:39:05,338] [INFO    ] Using DE run 0: comparing Young to Old
[2026-03-26 05:39:05,347] [INFO    ] Using data columns from var - lfc: 'kompot_de_Young_to_Old_mean_lfc', score: 'kompot_de_Young_to_Old_mahalanobis'
[2026-03-26 05:39:05,349] [INFO    ] Added highlight group 'MHC class II' with 4 genes
[2026-03-26 05:39:05,350] [INFO    ] Added highlight group 'Antioxidant' with 6 genes
[2026-03-26 05:39:05,360] [INFO    ] Labeling 6 specific genes
[2026-03-26 05:39:05,365] [INFO    ] Skipping threshold line drawing for dictionary-format significance_threshold
../_images/notebooks_02_differential_expression_detailed_10_1.png

gene_labels can be:

  • A list of gene names

  • A dictionary mapping gene names to custom labels

  • An integer to auto-select the top N genes

highlight_genes can be:

  • A list of gene names (all same color)

  • A list of dictionaries with “name”, “genes”, and “color” keys (as shown)

  • An integer to highlight the top N genes
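
To make the three accepted forms of gene_labels concrete, the sketch below normalizes each one into a gene-to-label mapping. It illustrates the semantics described above and is not Kompot's implementation: resolve_gene_labels and the example scores are hypothetical.

```python
def resolve_gene_labels(gene_labels, scores):
    """Map genes to display labels for a list, dict, or integer input."""
    if isinstance(gene_labels, dict):
        return dict(gene_labels)  # custom labels supplied by the caller
    if isinstance(gene_labels, int):
        # Pick the N highest-scoring genes and label them by name.
        top = sorted(scores, key=scores.get, reverse=True)[:gene_labels]
        return {gene: gene for gene in top}
    return {gene: gene for gene in gene_labels}  # plain list of names


scores = {"S100a8": 41.0, "Cd74": 35.5, "Apoe": 12.3}  # made-up scores
print(resolve_gene_labels(2, scores))
print(resolve_gene_labels({"Cd74": "CD74 (MHC II)"}, scores))
```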

Plotting Individual Genes

Imputed expression for each condition, along with the per-cell log fold change between them, is stored in adata.layers:

  • kompot_de_Young_imputed

  • kompot_de_Old_imputed

  • kompot_de_Young_to_Old_fold_change

You can plot these manually:

gene = "Igkc"

# Manual plotting
sc.pl.embedding(adata, basis="umap", color=gene, layer="logged_counts")
sc.pl.embedding(adata, basis="umap", color=gene, layer="kompot_de_Young_imputed")
sc.pl.embedding(adata, basis="umap", color=gene, layer="kompot_de_Old_imputed")
sc.pl.embedding(adata, basis="umap", color=gene, layer="kompot_de_Young_to_Old_fold_change")

Or use the convenience function:

[7]:
kompot.plot.plot_gene_expression(
    adata, gene="Igkc", vmin="p2", vmax="p98", frameon=False
)
[2026-03-26 05:39:05,582] [INFO    ] Found DE run info for run_id=-1
[2026-03-26 05:39:05,583] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2026-03-26 05:39:05,584] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2026-03-26 05:39:05,585] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2026-03-26 05:39:05,586] [INFO    ] Using DE run 0: comparing Young to Old
[2026-03-26 05:39:05,587] [INFO    ] Using fields for gene expression plot - lfc_key: 'kompot_de_Young_to_Old_mean_lfc', score_key: 'kompot_de_Young_to_Old_mahalanobis'
[2026-03-26 05:39:05,587] [INFO    ] Using layer 'logged_counts' inferred from run information
[2026-03-26 05:39:05,590] [INFO    ] Using imputed layer 'kompot_de_Young_imputed' for 'Young'
[2026-03-26 05:39:05,592] [INFO    ] Using imputed layer 'kompot_de_Old_imputed' for 'Old'
[2026-03-26 05:39:05,593] [INFO    ] Using fold_change layer 'kompot_de_Young_to_Old_fold_change' from run_info
../_images/notebooks_02_differential_expression_detailed_13_1.png

Comparing Conditions with Subplots

The kompot.plot.embedding wrapper supports filtering to specific groups:

[8]:
kompot.plot.embedding(
    adata,
    "umap",
    color="Igkc",
    layer=LAYER_FOR_EXPRESSION,
    frameon=False,
    mgroups=[{GROUPING_COLUMN: condition} for condition in CONDITIONS],
)
[2026-03-26 05:39:06,406] [INFO    ] Selected 2,917 cells out of 8,090 total cells.
[2026-03-26 05:39:06,512] [INFO    ] Selected 3,116 cells out of 8,090 total cells.
../_images/notebooks_02_differential_expression_detailed_15_1.png

The mgroups parameter creates multiple subplots, each showing cells filtered by the specified conditions.
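
Conceptually, each dictionary in mgroups acts as a filter on the cell metadata. The sketch below mimics that selection on a toy list of cell records. The matching rule (every key/value pair must match) is an assumption for illustration, and select_cells is a hypothetical helper.

```python
GROUPING_COLUMN = "Age"
CONDITIONS = ["Young", "Old"]

# Toy stand-in for adata.obs: one record per cell.
cells = [{"Age": "Young"}, {"Age": "Old"}, {"Age": "Mid"}, {"Age": "Young"}]

# Same comprehension as in the plotting call above.
mgroups = [{GROUPING_COLUMN: condition} for condition in CONDITIONS]


def select_cells(cells, group):
    """Keep cells whose metadata matches every key/value pair in `group`."""
    return [c for c in cells if all(c.get(k) == v for k, v in group.items())]


for group in mgroups:
    print(group, len(select_cells(cells, group)))
```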

Heatmap Customization

The heatmap function visualizes average expression per group.

Z-Score Normalization

By default, values are z-scored across conditions for each gene. Disable this to show raw expression:

[9]:
kompot.plot.heatmap(
    adata,
    n_top_genes=20,
    groupby=CELL_TYPE_COLUMN,
    exclude_groups="Plasma cell",
    standard_scale=None,  # Disable z-scoring
    vmax="p99",
)
[2026-03-26 05:39:06,875] [INFO    ] Found DE run info for run_id=-1
[2026-03-26 05:39:06,876] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2026-03-26 05:39:06,879] [INFO    ] Successfully inferred fields: {'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2026-03-26 05:39:06,880] [INFO    ] Using DE run 0 for heatmap.
[2026-03-26 05:39:06,881] [INFO    ] Inferred score_key='kompot_de_Young_to_Old_mahalanobis' from run information
[2026-03-26 05:39:06,887] [INFO    ] Inferred condition_column='Age' from run information
[2026-03-26 05:39:06,888] [INFO    ] Inferred condition1='Young' from run information
[2026-03-26 05:39:06,888] [INFO    ] Inferred condition2='Old' from run information
[2026-03-26 05:39:06,889] [INFO    ] Inferred layer='logged_counts' from run information
[2026-03-26 05:39:06,889] [INFO    ] Creating split heatmap with 20 genes/features
[2026-03-26 05:39:06,890] [INFO    ] Using expression data from layer: 'logged_counts'
[2026-03-26 05:39:06,967] [INFO    ] Excluded 7 cells from groups: Plasma cell
../_images/notebooks_02_differential_expression_detailed_18_1.png
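
Gene-wise z-scoring rescales the averages of each gene to zero mean and unit spread, so disabling it is what exposes absolute expression levels. A minimal sketch with made-up values follows. Whether Kompot uses the population or the sample standard deviation is not stated here, so the population form is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical average expression of one gene across four groups.
values = [1.0, 2.0, 3.0, 6.0]

mu = mean(values)
sd = pstdev(values)  # population standard deviation (assumed)
z_scores = [(v - mu) / sd for v in values]
print([round(z, 2) for z in z_scores])
```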

Custom Gene Lists

Instead of selecting top genes by Mahalanobis distance, provide a custom list:

[10]:
custom_genes = ["H2-Q7", "Cd74", "H2-Aa", "H2-Ab1", "S100a9", "S100a8", "Apoe"]

kompot.plot.heatmap(
    adata,
    genes=custom_genes,
    groupby=CELL_TYPE_COLUMN,
    exclude_groups="Plasma cell",
    vmin="p1",
    vmax="p99",
)
[2026-03-26 05:39:08,550] [INFO    ] Inferred condition_column='Age' from run information
[2026-03-26 05:39:08,551] [INFO    ] Inferred condition1='Young' from run information
[2026-03-26 05:39:08,552] [INFO    ] Inferred condition2='Old' from run information
[2026-03-26 05:39:08,553] [INFO    ] Inferred layer='logged_counts' from run information
[2026-03-26 05:39:08,554] [INFO    ] Creating split heatmap with 7 genes/features
[2026-03-26 05:39:08,555] [INFO    ] Using expression data from layer: 'logged_counts'
[2026-03-26 05:39:08,620] [INFO    ] Excluded 7 cells from groups: Plasma cell
[2026-03-26 05:39:08,626] [INFO    ] Applying gene-wise z-scoring (standard_scale='var')
../_images/notebooks_02_differential_expression_detailed_20_1.png

Simplify Heatmap Display

To simplify the plot, pass fold_change_mode=True. Each square then shows only the difference between the two groups, which equals the log fold change if the expression data were log-transformed.

[11]:
kompot.plot.heatmap(
    adata,
    genes=custom_genes,
    groupby=CELL_TYPE_COLUMN,
    exclude_groups="Plasma cell",
    vmin="p1",
    vmax="p99",
    fold_change_mode=True,
)
[2026-03-26 05:39:09,262] [INFO    ] Inferred condition_column='Age' from run information
[2026-03-26 05:39:09,264] [INFO    ] Inferred condition1='Young' from run information
[2026-03-26 05:39:09,265] [INFO    ] Inferred condition2='Old' from run information
[2026-03-26 05:39:09,266] [INFO    ] Inferred layer='logged_counts' from run information
[2026-03-26 05:39:09,266] [INFO    ] Creating fold change heatmap with 7 genes/features
[2026-03-26 05:39:09,268] [INFO    ] Using expression data from layer: 'logged_counts'
[2026-03-26 05:39:09,332] [INFO    ] Excluded 7 cells from groups: Plasma cell
[2026-03-26 05:39:09,339] [INFO    ] Applying gene-wise z-scoring (standard_scale='var')
[2026-03-26 05:39:09,383] [WARNING ] standard_scale is ignored in fold_change_mode as z-scoring is not appropriate for fold changes
../_images/notebooks_02_differential_expression_detailed_22_1.png
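
The link between a difference of log values and a log fold change follows from log(a) - log(b) = log(a / b). The numbers below are made up, and the sketch uses a plain natural logarithm. Data transformed with log1p or another base would differ in scale.

```python
import math

old, young = 8.0, 2.0  # hypothetical expression values on the original scale

difference_of_logs = math.log(old) - math.log(young)
log_fold_change = math.log(old / young)
print(difference_of_logs, log_fold_change)
```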

Multiple Comparisons

Kompot tracks multiple analysis runs, allowing you to perform different comparisons on the same dataset.

Running Additional Comparisons

Let’s add a comparison between Young and Mid-age mice (if Mid exists in your data):

[12]:
de_results_2 = kompot.de(
    adata,
    groupby=GROUPING_COLUMN,
    condition1="Young",
    condition2="Mid",
    layer=LAYER_FOR_EXPRESSION,
    obsm_key=DIMENSIONALITY_REDUCTION,
    gp=kompot.GPSettings(batch_size=0),
)
[2026-03-26 05:39:09,955] [WARNING ] Differential expression results with result_key='kompot_de' already exist in the dataset. Previous run was at 2026-03-26T05:39:04.524253 comparing Young to Old. Fields that will be overwritten: layers.kompot_de_Young_imputed, obs.kompot_de_Young_std Set overwrite=False to prevent overwriting or overwrite=True to silence this message.
[2026-03-26 05:39:09,958] [INFO    ] Condition 1 (Young): 2,917 cells
[2026-03-26 05:39:09,959] [INFO    ] Condition 2 (Mid): 2,057 cells
[2026-03-26 05:39:11,315] [INFO    ] Fitting expression estimator for condition 1...
[2026-03-26 05:39:11,532] [WARNING ] Gaussin process type is GaussianProcessType.FIXED and n_landmarks=5,000 are requested while only 2,917 datapoints are available. Using all datapoints for 2,917 landmarks instead.
[2026-03-26 05:39:13,733] [INFO    ] Fitting expression estimator for condition 2...
[2026-03-26 05:39:16,683] [WARNING ] Gaussin process type is GaussianProcessType.FIXED and n_landmarks=5,000 are requested while only 2,057 datapoints are available. Using all datapoints for 2,057 landmarks instead.
[2026-03-26 05:39:34,348] [INFO    ] FDR analysis complete: 626/16285 genes significantly DE at FDR < 0.05
[2026-03-26 05:39:34,350] [INFO    ] Mahalanobis distance threshold for FDR < 0.05: 10.4155
[2026-03-26 05:39:38,458] [INFO    ] This run will have `run_id=1`.

Kompot warns about field overwrites and tracks which run created each field.
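
Why only some fields collide can be seen by intersecting the names each run writes. The two sets are copied from the layer names in this tutorial's logs. The set arithmetic is an illustration, not how Kompot tracks runs internally.

```python
run0_layers = {
    "kompot_de_Young_imputed",
    "kompot_de_Old_imputed",
    "kompot_de_Young_to_Old_fold_change",
}
run1_layers = {
    "kompot_de_Young_imputed",
    "kompot_de_Mid_imputed",
    "kompot_de_Young_to_Mid_fold_change",
}

# Names written by both runs are the ones that get overwritten.
overwritten = run0_layers & run1_layers
print(overwritten)
```

Only the field for the shared condition (Young) is affected, because all other names include the second condition.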

Accessing Different Runs

Each analysis is assigned a run_id (0 for the first run, 1 for the second, etc.). Most plotting functions accept a run_id parameter:

[13]:
# Plot the first comparison (Young vs Old)
kompot.plot.volcano_de(adata, run_id=0, title="Young vs Old", gene_labels=50)
[2026-03-26 05:39:38,694] [INFO    ] Found DE run info for run_id=0
[2026-03-26 05:39:38,695] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2026-03-26 05:39:38,696] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2026-03-26 05:39:38,697] [WARNING ] Field 'kompot_de_Young_to_Old_mean_lfc' was last written by run 0, but current context expects run 1. The field may have been overwritten.
[2026-03-26 05:39:38,698] [WARNING ] Field 'kompot_de_Young_to_Old_mahalanobis' was last written by run 0, but current context expects run 1. The field may have been overwritten.
[2026-03-26 05:39:38,699] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2026-03-26 05:39:38,700] [WARNING ] Field inference completed with 2 warnings
[2026-03-26 05:39:38,700] [INFO    ] Using DE run 0: comparing Young to Old
[2026-03-26 05:39:38,710] [INFO    ] Using data columns from var - lfc: 'kompot_de_Young_to_Old_mean_lfc', score: 'kompot_de_Young_to_Old_mahalanobis'
[2026-03-26 05:39:38,716] [INFO    ] Highlighting 879 genes marked as DE (490 up, 389 down)
[2026-03-26 05:39:38,730] [INFO    ] Labeling top 50 genes by score
../_images/notebooks_02_differential_expression_detailed_26_1.png
[14]:
kompot.plot.plot_gene_expression(
    adata, run_id=0, gene="Aspa", frameon=False
)
[2026-03-26 05:39:39,103] [INFO    ] Found DE run info for run_id=0
[2026-03-26 05:39:39,104] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2026-03-26 05:39:39,105] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2026-03-26 05:39:39,106] [WARNING ] Field 'kompot_de_Young_to_Old_mean_lfc' was last written by run 0, but current context expects run 1. The field may have been overwritten.
[2026-03-26 05:39:39,108] [WARNING ] Field 'kompot_de_Young_to_Old_mahalanobis' was last written by run 0, but current context expects run 1. The field may have been overwritten.
[2026-03-26 05:39:39,108] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2026-03-26 05:39:39,109] [WARNING ] Field inference completed with 2 warnings
[2026-03-26 05:39:39,110] [INFO    ] Using DE run 0: comparing Young to Old
[2026-03-26 05:39:39,110] [INFO    ] Using fields for gene expression plot - lfc_key: 'kompot_de_Young_to_Old_mean_lfc', score_key: 'kompot_de_Young_to_Old_mahalanobis'
[2026-03-26 05:39:39,111] [INFO    ] Using layer 'logged_counts' inferred from run information
[2026-03-26 05:39:39,112] [INFO    ] Using imputed layer 'kompot_de_Young_imputed' for 'Young'
[2026-03-26 05:39:39,113] [INFO    ] Using imputed layer 'kompot_de_Old_imputed' for 'Old'
[2026-03-26 05:39:39,114] [INFO    ] Using fold_change layer 'kompot_de_Young_to_Old_fold_change' from run_info
../_images/notebooks_02_differential_expression_detailed_27_1.png
[15]:
# Plot the second comparison (Young vs Mid)
kompot.plot.volcano_de(adata, run_id=1, title="Young vs Mid")
[2026-03-26 05:39:39,915] [INFO    ] Found DE run info for run_id=1
[2026-03-26 05:39:39,917] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Mid_mean_lfc' from run info
[2026-03-26 05:39:39,918] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Mid_mahalanobis' from run info
[2026-03-26 05:39:39,919] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Mid_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Mid_mahalanobis'}
[2026-03-26 05:39:39,920] [INFO    ] Using DE run 1: comparing Young to Mid
[2026-03-26 05:39:39,930] [INFO    ] Using data columns from var - lfc: 'kompot_de_Young_to_Mid_mean_lfc', score: 'kompot_de_Young_to_Mid_mahalanobis'
[2026-03-26 05:39:39,935] [INFO    ] Highlighting 626 genes marked as DE (253 up, 373 down)
[2026-03-26 05:39:39,948] [INFO    ] Labeling top 10 genes by score
../_images/notebooks_02_differential_expression_detailed_28_1.png
[16]:
kompot.plot.plot_gene_expression(
    adata, gene="Aspa", frameon=False
)
[2026-03-26 05:39:40,191] [INFO    ] Found DE run info for run_id=-1
[2026-03-26 05:39:40,193] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Mid_mean_lfc' from run info
[2026-03-26 05:39:40,194] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Mid_mahalanobis' from run info
[2026-03-26 05:39:40,196] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Mid_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Mid_mahalanobis'}
[2026-03-26 05:39:40,197] [INFO    ] Using DE run 1: comparing Young to Mid
[2026-03-26 05:39:40,198] [INFO    ] Using fields for gene expression plot - lfc_key: 'kompot_de_Young_to_Mid_mean_lfc', score_key: 'kompot_de_Young_to_Mid_mahalanobis'
[2026-03-26 05:39:40,198] [INFO    ] Using layer 'logged_counts' inferred from run information
[2026-03-26 05:39:40,200] [INFO    ] Using imputed layer 'kompot_de_Young_imputed' for 'Young'
[2026-03-26 05:39:40,201] [INFO    ] Using imputed layer 'kompot_de_Mid_imputed' for 'Mid'
[2026-03-26 05:39:40,201] [INFO    ] Using fold_change layer 'kompot_de_Young_to_Mid_fold_change' from run_info
../_images/notebooks_02_differential_expression_detailed_29_1.png

You can also use negative indexing (run_id=-1 for the most recent run).
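
Negative run IDs behave like negative list indices in Python. The sketch below shows the presumed resolution rule. resolve_run_id is a hypothetical helper, not part of Kompot.

```python
def resolve_run_id(run_id: int, n_runs: int) -> int:
    """Translate a possibly negative run_id into an absolute index."""
    resolved = run_id + n_runs if run_id < 0 else run_id
    if not 0 <= resolved < n_runs:
        raise IndexError(f"run_id {run_id} out of range for {n_runs} runs")
    return resolved


print(resolve_run_id(-1, 2))  # most recent of two runs
```

This matches the log output above, where a request for run_id=-1 after two runs is reported as "Using DE run 1".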

Inspecting Run Metadata

The RunInfo utility provides detailed information about each analysis. Here we inspect the first run and see that some of its fields have been overwritten by the most recent run:

[17]:
run_info = kompot.RunInfo(adata, run_id=0, analysis_type="de")
run_info
[17]:

Run 0 (DE Analysis)

Run Summary
Analysis: DE  |  Run ID: 0  |  Timestamp: 2026-03-27T00:41:42.174094  |  Conditions: Young to Old  |  Fields Overwritten: 2

Parameter             |  Value
conditions            |  Young to Old
obsm_key              |  X_pca_harmony
uses_sample_variance  |  False
layer                 |  logged_counts
timestamp             |  2026-03-27T00:41:42.174094
Fields Created        |  9

All Parameters
Parameter                               |  Value
groupby                                 |  Age
condition1                              |  Young
condition2                              |  Old
obsm_key                                |  X_pca_harmony
layer                                   |  logged_counts
gp
gp.sigma                                |  1.0
gp.ls_factor                            |  10.0
gp.n_landmarks                          |  5000
gp.use_empirical_variance               |  False
gp.batch_size                           |  0
gp.eps                                  |  1e-08
gp.jit_compile                          |  False
fdr
fdr.null_genes                          |  4000
fdr.null_seed                           |  42
fdr.threshold                           |  0.05
filter
filter.min_cells                        |  2
storage
storage.result_key                      |  kompot_de
storage.store_landmarks                 |  False
storage.store_posterior_covariance      |  False
storage.store_additional_stats          |  False
storage.max_memory_ratio                |  0.8
output
output.copy                             |  False
output.inplace                          |  True
output.return_full_results              |  False
output.compute_mahalanobis              |  True
output.allow_single_condition_variance  |  False
output.progress                         |  True
result_key                              |  kompot_de

Environment
Parameter         |  Value
hostname          |  gizmok39
package_versions  |  anndata: 0.12.10
                  |  jax: 0.9.0.1
                  |  jaxlib: 0.9.0.1
                  |  kompot: 0.7.0
                  |  numpy: 2.4.2
                  |  pandas: 2.3.3
                  |  scipy: 1.17.1
pid               |  18304
platform          |  Linux-4.15.0-213-generic-x86_64-with-glibc2.35
python_version    |  3.12.12
timestamp         |  2026-03-27T00:41:42.716684
username          |  dotto

Fields Created by This Run
Total Fields: 9  |  Present: 7  |  Missing: 0  |  Overwritten: 2

Field Name  |  Location  |  Description  |  Status

LAYERS Fields
kompot_de_Old_imputed  |  layers  |  [imputed] Imputed expression for Old  |  Present
kompot_de_Young_imputed  |  layers  |  [imputed] Imputed expression for Young  |  Overwritten by Run 1
kompot_de_Young_to_Old_fold_change  |  layers  |  [fold_change] Log fold change for each cell and gene  |  Present

OBS Fields
kompot_de_Old_std  |  obs  |  [std] Posterior standard deviation of imputed expression for Old (same for all genes)  |  Present
kompot_de_Young_std  |  obs  |  [std] Posterior standard deviation of imputed expression for Young (same for all genes)  |  Overwritten by Run 1

VAR Fields
kompot_de_Young_to_Old_is_de  |  var  |  [is_de] Boolean indicator of differential expression at local FDR < 0.05  |  Present
kompot_de_Young_to_Old_mahalanobis  |  var  |  [mahalanobis] Mahalanobis distances  |  Present
kompot_de_Young_to_Old_mahalanobis_local_fdr  |  var  |  [mahalanobis_local_fdr] Local FDR values using empirical null estimation similar to R's fdrtool  |  Present
kompot_de_Young_to_Old_mean_lfc  |  var  |  [mean_log_fold_change] Mean log fold change values  |  Present

Comparing Runs

Compare parameters between two runs:

[18]:
kompot.RunInfo(adata, run_id=0, analysis_type="de").compare_with(1)
[18]:

Comparison of Run 0 and Run 1

Summary
Analysis Type: DE  |  Run 0: unknown (2026-03-27T00:41:42.174094)  |  Run 1: unknown (2026-03-27T00:42:58.693517)
Parameters: 2 different, 38 same  |  Fields: 14 different, 2 same

Run Details
Aspect                |  Run 0                       |  Run 1
conditions            |  Young to Old                |  Young to Mid
result_key            |  kompot_de                   |  kompot_de
uses_sample_variance  |  False                       |  False
timestamp             |  2026-03-27T00:41:42.174094  |  2026-03-27T00:42:58.693517
Field Count           |  9                           |  9

Parameter Differences
Total Parameters: 40  |  Different: 2  |  Only in Run 0: 0  |  Only in Run 1: 0
Different Conditions

Key Parameter Differences
Parameter   |  Run 0  |  Run 1
condition2  |  Old    |  Mid

All Parameter Differences
Parameter       |  Run 0  |  Run 1
Different Parameters
fdr.null_genes  |  4000   |  2000
38 parameters are the same in both runs

Field Differences
Fields in both runs: 2  |  Only in Run 0: 7  |  Only in Run 1: 7

Field Name  |  Location  |  Status  |  Last Modified By

LAYERS Shared Fields
kompot_de_Young_imputed  |  layers  |  Current value from Run 1  |  Run 1

OBS Shared Fields
kompot_de_Young_std  |  obs  |  Current value from Run 1  |  Run 1

LAYERS Different Fields
kompot_de_Old_imputed  |  layers  |  Only in Run 0  |  Run 0
kompot_de_Young_to_Old_fold_change  |  layers  |  Only in Run 0  |  Run 0
kompot_de_Mid_imputed  |  layers  |  Only in Run 1  |  Run 1
kompot_de_Young_to_Mid_fold_change  |  layers  |  Only in Run 1  |  Run 1

OBS Different Fields
kompot_de_Old_std  |  obs  |  Only in Run 0  |  Run 0
kompot_de_Mid_std  |  obs  |  Only in Run 1  |  Run 1

VAR Different Fields
kompot_de_Young_to_Old_is_de  |  var  |  Only in Run 0  |  Run 0
kompot_de_Young_to_Old_mahalanobis  |  var  |  Only in Run 0  |  Run 0
kompot_de_Young_to_Old_mahalanobis_local_fdr  |  var  |  Only in Run 0  |  Run 0
kompot_de_Young_to_Old_mean_lfc  |  var  |  Only in Run 0  |  Run 0
kompot_de_Young_to_Mid_is_de  |  var  |  Only in Run 1  |  Run 1
kompot_de_Young_to_Mid_mahalanobis  |  var  |  Only in Run 1  |  Run 1
kompot_de_Young_to_Mid_mahalanobis_local_fdr  |  var  |  Only in Run 1  |  Run 1
kompot_de_Young_to_Mid_mean_lfc  |  var  |  Only in Run 1  |  Run 1

Note on shared fields: When both runs define the same field, the last run to write to the field overwrites the previous value. The 'Last Modified By' column shows which run's value is currently stored.

Reproducing and Editing Runs

Every run stores its parameters as Settings objects. Use call_args() to get a kwargs dict that can reproduce the run, or edit it first:

[19]:
kwargs = run_info.call_args()

# Tweak parameters before re-running
kwargs["fdr"].threshold = 0.01          # tighten FDR
kwargs["condition2"] = "Mid"            # different comparison
kwargs["gp"].n_landmarks = 3000         # fewer landmarks

# kompot.de(adata, **kwargs)  # uncomment to re-run DE with the edited parameters
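
If you want to keep the original arguments for comparison, copy them before editing. The sketch below does not call Kompot. FDRSettingsSketch is a hypothetical stand-in dataclass that only mirrors the fdr.null_genes and fdr.threshold parameters listed by RunInfo, and whether call_args() already returns independent copies is not stated here.

```python
import copy
from dataclasses import dataclass


@dataclass
class FDRSettingsSketch:  # hypothetical stand-in for a settings object
    null_genes: int = 4000
    threshold: float = 0.05


original = {"condition2": "Old", "fdr": FDRSettingsSketch()}

# deepcopy so that edits to nested settings do not leak into `original`.
edited = copy.deepcopy(original)
edited["fdr"].threshold = 0.01
edited["condition2"] = "Mid"

print(original["fdr"].threshold, edited["fdr"].threshold)
```

A shallow dict copy would not be enough here, because both dictionaries would still share the same nested settings object.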

Resource Planning

For large datasets, pass dry_run=True to estimate resource requirements before running the full analysis:

[19]:
plan = kompot.de(
    adata,
    groupby=GROUPING_COLUMN,
    condition1="Young",
    condition2="Old",
    layer=LAYER_FOR_EXPRESSION,
    obsm_key=DIMENSIONALITY_REDUCTION,
    fdr=kompot.FDRSettings(null_genes=2000),
    dry_run=True,
)
================================================================================
RESOURCE USAGE PLAN
================================================================================

System Resources:
  Memory: 316.82 GB available (of 754.59 GB total)
  Disk:   377.28 GB available at /tmp

Total Requirements:
  Memory: 31.76 GB (10% of available)

Memory Allocations:
  • Mellon precision matrix L (condition 1, 2,917/3,116 cells) (np.int64(2917), np.int64(2917)): 64.92 MB
  • Mellon precision matrix L (condition 2, 2,917/3,116 cells) (np.int64(3116), np.int64(3116)): 74.08 MB
  • Imputed expression (condition 1) (8090, 18285): 1.10 GB → adata.layers['kompot_de_Young_imputed']
  • Imputed expression (condition 2) (8090, 18285): 1.10 GB → adata.layers['kompot_de_Old_imputed']
  • Fold change (8090, 18285): 1.10 GB → adata.layers['kompot_de_Young_to_Old_fold_change']
  • Temporary matrices during predictions (batch_size=100) (100, 5000) + (100, 18285): 17.77 MB
  • Peak intermediate arrays during predictions (~25 arrays) 25×(8090, 18285): 27.55 GB
  • Function predictor covariances (per condition) (5000, 5000): 381.47 MB
  • Combined covariance matrix (5000, 5000): 190.73 MB
  • Cholesky decomposition (for Mahalanobis) (5000, 5000): 190.73 MB
  • Mahalanobis batch processing (batch_size=100) (100, 5000): 3.81 MB

Output Fields:
  adata.layers:
    - kompot_de_Young_imputed [OVERWRITES run_id=1]
    - kompot_de_Old_imputed [OVERWRITES run_id=0]
    - kompot_de_Young_to_Old_fold_change [OVERWRITES run_id=0]
  adata.var:
    - kompot_de_Young_to_Old_mahalanobis [OVERWRITES run_id=0]
    - kompot_de_Young_to_Old_mean_lfc [OVERWRITES run_id=0]
    - kompot_de_Young_to_Old_mahalanobis_local_fdr
    - kompot_de_Young_to_Old_is_de

Info:
  ℹ Null distribution will use 2000 additional genes (total: 18285 genes processed)
  ℹ Cell batching reduces memory: Each of 4 prediction operations uses ~17.77 MB temporary arrays instead of 1.40 GB (saving 1.39 GB).
  ℹ Prediction creates ~25 intermediate arrays of shape (8,090, 18285). These coexist at peak memory (27.55 GB) but are freed before completion.
  ℹ Mahalanobis computation processes 100 genes per batch. Reduce via gp=GPSettings(batch_size=...) to lower peak memory (currently 3.81 MB for batch arrays).

Warnings:
  ⚠ Results with result_key='kompot_de' already exist (run_id=1). Previous run: 2026-03-26T05:39:38.441389 comparing Young to Mid (null_genes=2000). Fields that will be overwritten: var.kompot_de_Young_to_Old_mahalanobis, var.kompot_de_Young_to_Old_mean_lfc, layers.kompot_de_Young_imputed, layers.kompot_de_Old_imputed, layers.kompot_de_Young_to_Old_fold_change and 2 more

================================================================================
STATUS: ⚠ FEASIBLE WITH WARNINGS - Proceed with caution
================================================================================

The report shows:

  • Available system memory and disk space

  • Estimated memory requirements for each computation step

  • Which fields will be created or overwritten (with run_id)

  • Whether the analysis is feasible

This is especially useful for testing different parameter combinations (e.g., with/without sample variance) before committing to a long computation.
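
The sizes in the report can be reproduced by hand. The sketch assumes 8-byte (float64) entries and binary units (1 GB = 2**30 bytes). Both assumptions are inferred from the reported numbers rather than stated by Kompot.

```python
def array_bytes(rows: int, cols: int, itemsize: int = 8) -> int:
    """Bytes needed for a dense rows x cols array."""
    return rows * cols * itemsize


n_cells, n_genes, null_genes = 8090, 16285, 2000
layer = array_bytes(n_cells, n_genes + null_genes)

print(f"one expression layer: {layer / 2**30:.2f} GB")
print(f"25 intermediate arrays: {25 * layer / 2**30:.2f} GB")
print(f"5000 x 5000 covariance: {array_bytes(5000, 5000) / 2**20:.2f} MB")
```

The intermediate arrays dominate the total, and their size grows with the number of null genes, so lowering null_genes is one way to shrink the peak.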

Saving Results

Selective Cleanup

Remove imputed expression layers while preserving statistical results:

# This keeps adata.var statistics but removes large adata.layers
kompot.cleanup(adata)

To clean up specific runs only:

# Clean up only the first run
kompot.cleanup(adata, run_ids=0, analysis_type="de")

adata.write_h5ad("../data/murine_bone_marrow_aging_processed.h5ad")

Summary

This tutorial covered:

✓ Customizing DE parameters (null_genes, sigma, batch_size)
✓ Advanced volcano plot options
✓ Expression visualization techniques
✓ Heatmap customization
✓ Managing multiple comparisons with run_id
✓ Resource planning with dry runs

Next Steps