Differential Expression: Advanced Analysis

This tutorial explores advanced differential expression (DE) analysis with Kompot, building on the Getting Started tutorial.

You’ll learn how to:

Customize DE analysis parameters for your specific dataset
Work with null gene distributions for FDR estimation
Perform multiple comparisons and track results
Use advanced visualization options
Optimize computational resources

Setup

[1]:

import anndata as ad
import matplotlib.pyplot as plt
import numpy as np
import palantir
import pandas as pd
import scanpy as sc

import kompot

plt.rcParams["axes.spines.right"] = False
plt.rcParams["axes.spines.top"] = False
plt.rcParams["image.cmap"] = "Spectral_r"

[2]:

DATA_PATH = "../data/murine_bone_marrow_aging.h5ad"
GROUPING_COLUMN = "Age"
CONDITIONS = ["Young", "Old"]
CELL_TYPE_COLUMN = "highres_celltype"
DIMENSIONALITY_REDUCTION = "DM_EigenVectors"
LAYER_FOR_EXPRESSION = "logged_counts"

Load and Prepare Data

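We’ll reuse the dataset from the first tutorial and recompute diffusion maps from the Harmony-corrected PCA. The resulting eigenvectors are stored in adata.obsm["DM_EigenVectors"] and serve as the cell-state representation for the DE analysis: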

[3]:

adata = ad.read_h5ad(DATA_PATH)
palantir.utils.run_diffusion_maps(adata, pca_key="X_pca_harmony", n_components=40)
adata

[3]:

AnnData object with n_obs × n_vars = 8090 × 16285
    obs: 'Compartment', 'Replicate', 'Age', 'Sample', 'Info', 'batch', 'doublet_score', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_hb', 'pct_counts_hb', 'S_score', 'G2M_score', 'phase', 'leiden', 'phenograph', 'highres_celltype', 'midres_celltype'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'hb', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'Age_colors', 'Compartment_colors', 'DMEigenValues', 'Info_colors', 'README', 'Replicate_colors', 'Sample_colors', 'batch_colors', 'draw_graph', 'highres_celltype_colors', 'hvg', 'leiden', 'leiden_colors', 'midres_celltype_colors', 'neighbors', 'pca', 'phase_colors', 'umap', 'DM_EigenValues'
    obsm: 'AbCapture', 'DM_EigenVectors', 'HTO', 'X_draw_graph_fa', 'X_pca', 'X_pca_harmony', 'X_pca_noregression', 'X_umap'
    varm: 'PCs'
    layers: 'MAGIC_imputed_data', 'logged_counts', 'normalized_counts', 'raw_counts'
    obsp: 'DM_Kernel', 'connectivities', 'distances', 'DM_Similarity'

Understanding DE Parameters

The compute_differential_expression function provides several important parameters:

Core Parameters

``null_genes``: Number of permuted genes for FDR estimation (default: 2000)
- Higher values give better FDR estimates but increase computation time
- Set to 0 to disable FDR computation (faster, but no significance thresholds)
``sigma``: Noise level in the expression layer (default: 1)
- Adjust based on your normalization method
- Higher values for noisier data
- Lower values for denoised data
``batch_size``: Process genes/cells in batches to reduce memory (default: 0 = no batching)
- Set to ~100 for large datasets to prevent memory overflow
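The role of null genes can be illustrated with a simplified sketch. It assumes that null genes are made by permuting a randomly chosen real gene's values across cells (Kompot's log only mentions "shuffled expression" for null genes), and it uses a generic tail-area FDR estimate rather than the local FDR that Kompot actually reports. The helpers make_null_genes and tail_area_fdr are hypothetical, not part of Kompot.

```python
import numpy as np


def make_null_genes(expr, n_null, seed=42):
    """Hypothetical helper: build null genes by permuting real genes across cells.

    Permuting a gene's values breaks any link between its expression and
    cell state or condition, so its score reflects chance alone.
    """
    rng = np.random.default_rng(seed)
    cols = rng.integers(0, expr.shape[1], size=n_null)  # genes to copy
    null = expr[:, cols].copy()
    for j in range(n_null):
        null[:, j] = rng.permutation(null[:, j])  # shuffle values across cells
    return null


def tail_area_fdr(real_scores, null_scores, threshold):
    """Generic tail-area FDR estimate at a score threshold (not Kompot's local FDR)."""
    real_scores = np.asarray(real_scores, dtype=float)
    null_scores = np.asarray(null_scores, dtype=float)
    n_called = int(np.sum(real_scores >= threshold))
    if n_called == 0:
        return 0.0
    # Expected false calls = null exceedance rate x number of real genes tested.
    expected_false = np.mean(null_scores >= threshold) * real_scores.size
    return float(min(1.0, expected_false / n_called))


real = np.array([1.0, 2.0, 3.0, 10.0, 12.0])
null = np.array([1.0, 2.0, 3.0, 4.0])
print(tail_area_fdr(real, null, threshold=3.0))   # 2.5 expected false / 3 calls
print(tail_area_fdr(real, null, threshold=10.0))  # no null gene scores this high
```

The exceedance rate of the null scores is estimated from the available null genes only, so with few of them the estimate in the far tail is noisy. This is why larger null_genes values give steadier FDR estimates at the cost of extra computation.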

Advanced Parameters

``n_landmarks``: Number of landmark points used by the expression estimators and the Mahalanobis computation (default: 5000)
- More landmarks = more accurate but slower
- Kompot automatically uses min(n_landmarks, n_cells)
``result_key``: Prefix for result field names (default: “kompot_de”)
- Change to avoid overwriting previous results

Let’s run the DE analysis with customized parameters:

[4]:

de_results = kompot.compute_differential_expression(
    adata,
    groupby=GROUPING_COLUMN,
    condition1=CONDITIONS[0],
    condition2=CONDITIONS[1],
    layer=LAYER_FOR_EXPRESSION,
    obsm_key=DIMENSIONALITY_REDUCTION,
    null_genes=4000,  # More null genes for better FDR estimation
    sigma=1,          # Adjust if needed for your data
    batch_size=0,     # No batching for this small dataset
)

[2025-10-03 13:21:59,246] [INFO    ] Condition 1 (Young): 2,917 cells
[2025-10-03 13:21:59,248] [INFO    ] Condition 2 (Old): 3,116 cells
[2025-10-03 13:21:59,248] [INFO    ] Using 8090 of 8090 cells (100.0%)
[2025-10-03 13:21:59,602] [INFO    ] Preparing null distribution with null_genes=4000, null_seed=42
[2025-10-03 13:22:01,972] [INFO    ] Generated shuffled expression for 4000 null genes

WARNING:2025-10-03 13:22:02,421:jax._src.xla_bridge:966: An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.

[2025-10-03 13:22:12,369] [INFO    ] Fitting expression estimator for condition 1...
[2025-10-03 13:22:47,608] [INFO    ] Fitting expression estimator for condition 2...
[2025-10-03 13:23:46,830] [INFO    ] Landmark storage skipped (store_landmarks=False). Compute with store_landmarks=True to enable landmark reuse.
[2025-10-03 13:24:56,798] [INFO    ] Using 5,000 landmarks for Mahalanobis computation
[2025-10-03 13:25:57,300] [INFO    ] Computing FDR statistics from null distribution
[2025-10-03 13:26:00,149] [INFO    ] FDR analysis complete: 139/16285 genes significantly DE at FDR < 0.05
[2025-10-03 13:26:00,150] [INFO    ] Mahalanobis distance threshold for FDR < 0.05: 29.0322
[2025-10-03 13:26:08,353] [INFO    ] This run will have `run_id=0`.

Volcano Plot Customization

The volcano_de function offers extensive customization options.

Changing the Y-Axis Metric

By default, the y-axis shows the Mahalanobis distance. You can switch to local FDR, which is plotted as -log10(local FDR) so that more significant genes appear higher:

[5]:

kompot.plot.volcano_de(
    adata,
    y_axis_type="local_fdr",
    significance_threshold=0.02,  # Adjust FDR threshold
)

[2025-10-03 13:26:08,807] [INFO    ] Found DE run info for run_id=-1
[2025-10-03 13:26:08,810] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2025-10-03 13:26:08,810] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:26:08,811] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:26:08,817] [INFO    ] Using local_fdr values for y-axis: kompot_de_Young_to_Old_mahalanobis_local_fdr
[2025-10-03 13:26:08,818] [INFO    ] Using DE run 0: comparing Young to Old
[2025-10-03 13:26:08,865] [INFO    ] Using data columns from var - lfc: 'kompot_de_Young_to_Old_mean_lfc', score: 'kompot_de_Young_to_Old_mahalanobis_local_fdr'
[2025-10-03 13:26:08,866] [INFO    ] Applied local_fdr transformation to y-axis data
[2025-10-03 13:26:08,867] [INFO    ] Significance threshold selection: using column 'kompot_de_Young_to_Old_mahalanobis_local_fdr' with threshold < 0.02
[2025-10-03 13:26:08,873] [INFO    ] Values range: 0.018455 - 1.000000
[2025-10-03 13:26:08,875] [INFO    ] Found 110 genes with local_fdr < 0.02
[2025-10-03 13:26:08,877] [INFO    ] Highlighting 110 genes at local_fdr < 0.02 (66 up, 44 down)
[2025-10-03 13:26:08,899] [INFO    ] Labeling top 10 genes by score
[2025-10-03 13:26:08,920] [INFO    ] Added local_fdr threshold line at y=1.70 (local_fdr=0.02)

[Figure: volcano plot of mean log fold change vs. local FDR, Young vs Old, with genes at local FDR < 0.02 highlighted]
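The threshold line position in the log above is consistent with the y-axis showing -log10(local FDR): a threshold of 0.02 lands at y ≈ 1.70, and smaller (more significant) FDR values plot higher. A quick check in Python, where fdr_to_y is a hypothetical helper for illustration:

```python
import math


def fdr_to_y(local_fdr):
    """Map a local FDR value to a -log10 scale, as on the volcano y-axis."""
    return -math.log10(local_fdr)


print(round(fdr_to_y(0.02), 2))      # threshold line reported in the log
print(round(fdr_to_y(0.018455), 2))  # smallest local FDR reported in the log
```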

Highlighting Gene Sets

Highlight specific gene sets with custom colors and labels:

[6]:

# Define gene sets of interest
gene_sets = [
    {
        "name": "MHC class II",
        "genes": ["H2-Ab1", "H2-Aa", "Cd74", "H2-Eb1"],
        "color": "#E76F51",
    },
    {
        "name": "Antioxidant",
        "genes": ["S100a8", "Alox5ap", "Hp", "S100a9", "Mgst1", "Apoe"],
        "color": "#2A9D8F",
    },
]

kompot.plot.volcano_de(
    adata,
    significance_threshold={"local_fdr": 0.02, "mahalanobis": 30},
    gene_labels=["S100a8", "Alox5ap", "Hp", "S100a9", "Mgst1", "Apoe"],
    highlight_genes=gene_sets,
)

[2025-10-03 13:26:09,313] [INFO    ] Found DE run info for run_id=-1
[2025-10-03 13:26:09,314] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2025-10-03 13:26:09,314] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:26:09,315] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:26:09,315] [INFO    ] Using DE run 0: comparing Young to Old
[2025-10-03 13:26:09,348] [INFO    ] Using data columns from var - lfc: 'kompot_de_Young_to_Old_mean_lfc', score: 'kompot_de_Young_to_Old_mahalanobis'
[2025-10-03 13:26:09,350] [INFO    ] Added highlight group 'MHC class II' with 4 genes
[2025-10-03 13:26:09,350] [INFO    ] Added highlight group 'Antioxidant' with 6 genes
[2025-10-03 13:26:09,358] [INFO    ] Labeling 6 specific genes
[2025-10-03 13:26:09,387] [INFO    ] Skipping threshold line drawing for dictionary-format significance_threshold

[Figure: volcano plot with the MHC class II and Antioxidant gene sets highlighted and the antioxidant genes labeled]

gene_labels can be:

A list of gene names
A dictionary mapping gene names to custom labels
An integer to auto-select the top N genes

highlight_genes can be:

A list of gene names (all same color)
A list of dictionaries with “name”, “genes”, and “color” keys (as shown)
An integer to highlight the top N genes
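When gene_labels or highlight_genes is an integer, the volcano log reports "Labeling top 10 genes by score", and it splits significant genes into up and down (for example "66 up, 44 down"). The pandas sketch below shows one plausible way to reproduce that kind of selection from the result columns in adata.var. It runs on a toy table; the helpers top_genes and count_up_down are hypothetical, and counting a positive mean_lfc as "up" is an assumption.

```python
import pandas as pd


def top_genes(var, score_col, n):
    """Return the n genes with the largest score (e.g. Mahalanobis distance)."""
    return var[score_col].nlargest(n).index.tolist()


def count_up_down(var, fdr_col, lfc_col, threshold):
    """Count significant genes with positive vs. negative mean log fold change."""
    sig = var[var[fdr_col] < threshold]
    return int((sig[lfc_col] > 0).sum()), int((sig[lfc_col] < 0).sum())


# Toy stand-in for adata.var; real columns are named like
# kompot_de_Young_to_Old_mahalanobis, ..._mahalanobis_local_fdr, ..._mean_lfc.
var = pd.DataFrame(
    {
        "mahalanobis": [5.0, 40.0, 12.0, 33.0, 1.0],
        "local_fdr": [0.9, 0.001, 0.3, 0.01, 1.0],
        "mean_lfc": [0.1, 1.2, -0.4, -0.8, 0.0],
    },
    index=["GeneA", "GeneB", "GeneC", "GeneD", "GeneE"],
)

print(top_genes(var, "mahalanobis", 2))
print(count_up_down(var, "local_fdr", "mean_lfc", threshold=0.02))
```

On real results, pass adata.var together with the kompot_de_Young_to_Old_* column names.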

Expression Visualization

Plotting Individual Genes

Kompot stores the imputed expression for each condition, along with the per-cell log fold change between them, in adata.layers:

kompot_de_Young_imputed
kompot_de_Old_imputed
kompot_de_Young_to_Old_fold_change

You can plot these manually:

gene = "Igkc"

# Manual plotting
sc.pl.embedding(adata, basis="umap", color=gene, layer="logged_counts")
sc.pl.embedding(adata, basis="umap", color=gene, layer="kompot_de_Young_imputed")
sc.pl.embedding(adata, basis="umap", color=gene, layer="kompot_de_Old_imputed")
sc.pl.embedding(adata, basis="umap", color=gene, layer="kompot_de_Young_to_Old_fold_change")

Or use the convenience function:

[7]:

kompot.plot.plot_gene_expression(
    adata, gene="Igkc", vmin="p2", vmax="p98", frameon=False
)

[2025-10-03 13:26:09,773] [INFO    ] Found DE run info for run_id=-1
[2025-10-03 13:26:09,773] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2025-10-03 13:26:09,774] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:26:09,775] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:26:09,775] [INFO    ] Using DE run 0: comparing Young to Old
[2025-10-03 13:26:09,776] [INFO    ] Using fields for gene expression plot - lfc_key: 'kompot_de_Young_to_Old_mean_lfc', score_key: 'kompot_de_Young_to_Old_mahalanobis'
[2025-10-03 13:26:09,776] [INFO    ] Using layer 'logged_counts' inferred from run information
[2025-10-03 13:26:09,778] [INFO    ] Using condition1 imputed layer 'kompot_de_Young_imputed' for 'Young'
[2025-10-03 13:26:09,778] [INFO    ] Using condition2 imputed layer 'kompot_de_Old_imputed' for 'Old'
[2025-10-03 13:26:09,778] [INFO    ] Using fold_change layer 'kompot_de_Young_to_Old_fold_change' from run_info

[Figure: plot_gene_expression panels for Igkc on the UMAP]
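The string limits vmin="p2" and vmax="p98" follow scanpy's convention for percentile-based color limits: the color scale is clipped at the 2nd and 98th percentiles of the plotted values, so a few extreme cells do not wash out the rest. A minimal NumPy sketch of the same idea, with a hypothetical percentile_limits helper; it uses NumPy's default linear interpolation, which may differ slightly from what the plotting code does:

```python
import numpy as np


def percentile_limits(values, low="p2", high="p98"):
    """Convert 'pX' strings into color limits at the X-th percentile of values."""
    values = np.asarray(values, dtype=float)
    vmin = np.percentile(values, float(low[1:]))
    vmax = np.percentile(values, float(high[1:]))
    return vmin, vmax


# One extreme cell (1000.0) would otherwise dominate the color scale.
expression = np.append(np.linspace(0.0, 5.0, 99), 1000.0)
vmin, vmax = percentile_limits(expression)
clipped = np.clip(expression, vmin, vmax)
print(vmin, vmax, clipped.max())
```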

Comparing Conditions with Subplots

The kompot.plot.embedding wrapper supports filtering to specific groups:

[8]:

kompot.plot.embedding(
    adata,
    "umap",
    color="Igkc",
    layer=LAYER_FOR_EXPRESSION,
    frameon=False,
    mgroups=[{GROUPING_COLUMN: condition} for condition in CONDITIONS],
)

[2025-10-03 13:26:11,413] [INFO    ] Selected 2,917 cells out of 8,090 total cells.
[2025-10-03 13:26:11,701] [INFO    ] Selected 3,116 cells out of 8,090 total cells.

[Figure: UMAP of Igkc expression, one panel for Young cells and one for Old cells]

The mgroups parameter creates one subplot per dictionary, each showing only the cells whose obs values match that dictionary (here, one panel per age condition).
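The per-panel filtering can be pictured as a boolean mask over adata.obs. Kompot's implementation is not shown in this tutorial; select_cells is a hypothetical helper that reproduces the "Selected N cells out of M total cells" messages for a toy obs table, assuming the entries of each dictionary are combined with AND.

```python
import pandas as pd


def select_cells(obs, filters):
    """Boolean mask of cells matching every column == value pair in filters."""
    mask = pd.Series(True, index=obs.index)
    for column, value in filters.items():
        mask &= obs[column] == value
    return mask


obs = pd.DataFrame(
    {"Age": ["Young", "Old", "Mid", "Old", "Young"]},
    index=[f"cell{i}" for i in range(5)],
)
mgroups = [{"Age": condition} for condition in ["Young", "Old"]]

for filters in mgroups:
    mask = select_cells(obs, filters)
    print(f"Selected {int(mask.sum()):,} cells out of {len(obs):,} total cells.")
```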

Heatmap Customization

The heatmap function visualizes average expression per group, with each group split by condition.

Z-Score Normalization

By default, values are z-scored per gene (standard_scale='var'). Pass standard_scale=None to show unscaled expression instead:

[9]:

kompot.plot.heatmap(
    adata,
    n_top_genes=20,
    groupby=CELL_TYPE_COLUMN,
    exclude_groups="Plasma cell",
    standard_scale=None,  # Disable z-scoring
    vmax="p99",
)

[2025-10-03 13:26:12,648] [INFO    ] Found DE run info for run_id=-1
[2025-10-03 13:26:12,654] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:26:12,654] [INFO    ] Successfully inferred fields: {'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:26:12,655] [INFO    ] Using DE run 0 for heatmap.
[2025-10-03 13:26:12,655] [INFO    ] Inferred score_key='kompot_de_Young_to_Old_mahalanobis' from run information
[2025-10-03 13:26:12,658] [INFO    ] Inferred condition_column='Age' from run information
[2025-10-03 13:26:12,659] [INFO    ] Inferred condition1='Young' from run information
[2025-10-03 13:26:12,659] [INFO    ] Inferred condition2='Old' from run information
[2025-10-03 13:26:12,660] [INFO    ] Inferred layer='logged_counts' from run information
[2025-10-03 13:26:12,660] [INFO    ] Creating split heatmap with 20 genes/features
[2025-10-03 13:26:12,661] [INFO    ] Using expression data from layer: 'logged_counts'
[2025-10-03 13:26:12,845] [INFO    ] Excluded 7 cells from groups: Plasma cell

[Figure: split heatmap of the top 20 genes by cell type, Young vs Old, without z-scoring]
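For reference, gene-wise z-scoring (the default that the call above disables) rescales each gene's column of group averages to mean 0 and standard deviation 1, so colors compare groups within a gene rather than absolute expression across genes. The sketch applies this to a toy table of group means; zscore_genes is a hypothetical helper, and whether Kompot uses the population (ddof=0) or sample standard deviation is an assumption.

```python
import pandas as pd


def zscore_genes(means):
    """Z-score each gene (column) across the rows of a group-by-gene table."""
    return (means - means.mean(axis=0)) / means.std(axis=0, ddof=0)


# Toy average expression: rows are cell type x condition, columns are genes.
means = pd.DataFrame(
    {"Cd74": [4.0, 6.0, 1.0, 1.0], "S100a8": [0.5, 0.5, 8.0, 9.0]},
    index=["TypeA | Young", "TypeA | Old", "TypeB | Young", "TypeB | Old"],
)
scaled = zscore_genes(means)
print(scaled.round(2))
```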

Custom Gene Lists

Instead of selecting top genes by Mahalanobis distance, provide a custom list:

[10]:

custom_genes = ["H2-Q7", "Cd74", "H2-Aa", "H2-Ab1", "S100a9", "S100a8", "Apoe"]

kompot.plot.heatmap(
    adata,
    genes=custom_genes,
    groupby=CELL_TYPE_COLUMN,
    exclude_groups="Plasma cell",
    vmin="p1",
    vmax="p99",
)

[2025-10-03 13:26:15,305] [INFO    ] Inferred condition_column='Age' from run information
[2025-10-03 13:26:15,306] [INFO    ] Inferred condition1='Young' from run information
[2025-10-03 13:26:15,307] [INFO    ] Inferred condition2='Old' from run information
[2025-10-03 13:26:15,307] [INFO    ] Inferred layer='logged_counts' from run information
[2025-10-03 13:26:15,308] [INFO    ] Creating split heatmap with 7 genes/features
[2025-10-03 13:26:15,308] [INFO    ] Using expression data from layer: 'logged_counts'
[2025-10-03 13:26:15,371] [INFO    ] Excluded 7 cells from groups: Plasma cell
[2025-10-03 13:26:15,377] [INFO    ] Applying gene-wise z-scoring (standard_scale='var')

[Figure: split heatmap of the custom gene list by cell type, z-scored per gene]

Simplify Heatmap Display

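To simplify the plot, pass fold_change_mode=True. Each square then shows only the difference between the two conditions, which equals the log fold change when the expression data are log-transformed. Z-scoring is not applied in this mode, because it is not appropriate for fold changes (see the warning in the output below).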

[11]:

kompot.plot.heatmap(
    adata,
    genes=custom_genes,
    groupby=CELL_TYPE_COLUMN,
    exclude_groups="Plasma cell",
    vmin="p1",
    vmax="p99",
    fold_change_mode=True,
)

[2025-10-03 13:26:16,229] [INFO    ] Inferred condition_column='Age' from run information
[2025-10-03 13:26:16,232] [INFO    ] Inferred condition1='Young' from run information
[2025-10-03 13:26:16,232] [INFO    ] Inferred condition2='Old' from run information
[2025-10-03 13:26:16,233] [INFO    ] Inferred layer='logged_counts' from run information
[2025-10-03 13:26:16,233] [INFO    ] Creating fold change heatmap with 7 genes/features
[2025-10-03 13:26:16,234] [INFO    ] Using expression data from layer: 'logged_counts'
[2025-10-03 13:26:16,343] [INFO    ] Excluded 7 cells from groups: Plasma cell
[2025-10-03 13:26:16,349] [INFO    ] Applying gene-wise z-scoring (standard_scale='var')
[2025-10-03 13:26:16,420] [WARNING ] standard_scale is ignored in fold_change_mode as z-scoring is not appropriate for fold changes

[Figure: fold-change heatmap of the custom gene list by cell type]
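Why a difference of averaged log expression is a log fold change: the mean of log values is the log of the geometric mean, so the difference of two such means is the log of the ratio of geometric means. The numeric check below uses log2 on toy values and computes Old minus Young. The sign convention Kompot uses and the log base of logged_counts are not assumed here; with a log1p transform the ratio is of x + 1 rather than x.

```python
import numpy as np

young = np.array([1.0, 2.0, 4.0])   # toy expression in Young cells
old = np.array([4.0, 8.0, 16.0])    # every cell is 4x higher in Old


def geometric_mean(x):
    """Exponential of the mean log value."""
    return np.exp(np.mean(np.log(x)))


diff_of_log_means = np.mean(np.log2(old)) - np.mean(np.log2(young))
log2_ratio = np.log2(geometric_mean(old) / geometric_mean(young))
print(diff_of_log_means, log2_ratio)  # both equal log2(4)
```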

Multiple Comparisons

Kompot tracks multiple analysis runs, allowing you to perform different comparisons on the same dataset.

Running Additional Comparisons

Let’s add a comparison between Young and Mid-age mice (if Mid exists in your data):

[12]:

de_results_2 = kompot.compute_differential_expression(
    adata,
    groupby=GROUPING_COLUMN,
    condition1="Young",
    condition2="Mid",
    layer=LAYER_FOR_EXPRESSION,
    obsm_key=DIMENSIONALITY_REDUCTION,
    batch_size=0,
)

[2025-10-03 13:26:17,562] [WARNING ] Differential expression results with result_key='kompot_de' already exist in the dataset. Previous run was at 2025-10-03T13:26:08.352416 comparing Young to Old. Fields that will be overwritten: layers.kompot_de_Young_imputed, obs.kompot_de_Young_std Set overwrite=False to prevent overwriting or overwrite=True to silence this message.
[2025-10-03 13:26:17,566] [INFO    ] Condition 1 (Young): 2,917 cells
[2025-10-03 13:26:17,566] [INFO    ] Condition 2 (Mid): 2,057 cells
[2025-10-03 13:26:17,567] [INFO    ] Using 8090 of 8090 cells (100.0%)
[2025-10-03 13:26:17,828] [INFO    ] Preparing null distribution with null_genes=2000, null_seed=42
[2025-10-03 13:26:19,100] [INFO    ] Generated shuffled expression for 2000 null genes
[2025-10-03 13:26:19,578] [INFO    ] Fitting expression estimator for condition 1...
[2025-10-03 13:26:21,014] [WARNING ] Gaussin process type is GaussianProcessType.FIXED and n_landmarks=5,000 are requested while only 2,917 datapoints are available. Using all datapoints for 2,917 landmarks instead.
[2025-10-03 13:26:32,556] [INFO    ] Fitting expression estimator for condition 2...
[2025-10-03 13:26:46,811] [WARNING ] Gaussin process type is GaussianProcessType.FIXED and n_landmarks=5,000 are requested while only 2,057 datapoints are available. Using all datapoints for 2,057 landmarks instead.
[2025-10-03 13:27:15,798] [INFO    ] Using 2,917 landmarks for Mahalanobis computation
[2025-10-03 13:27:27,143] [INFO    ] Computing FDR statistics from null distribution
[2025-10-03 13:27:28,213] [INFO    ] FDR analysis complete: 0/16285 genes significantly DE at FDR < 0.05
[2025-10-03 13:27:33,336] [INFO    ] This run will have `run_id=1`.

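Kompot warns about field overwrites and tracks which run created each field. Both runs use the default result_key="kompot_de" and share the Young condition, so run 1 overwrites run 0’s kompot_de_Young_imputed layer and kompot_de_Young_std column; pass a different result_key to keep both sets of results. Note also that no genes reach FDR < 0.05 in this Young vs Mid comparison.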
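The overlap can be predicted from the field names. Reading them off the RunInfo tables later in this tutorial, each DE run appears to create nine fields named from result_key and the two conditions. The hypothetical helper below encodes that observed naming pattern (it is not a Kompot function) and shows that runs 0 and 1 collide on exactly the two Young fields, while a distinct result_key avoids any overlap.

```python
def expected_de_fields(result_key, condition1, condition2):
    """Field names following the pattern seen in this tutorial's RunInfo output."""
    pair = f"{result_key}_{condition1}_to_{condition2}"
    return {
        ("layers", f"{result_key}_{condition1}_imputed"),
        ("layers", f"{result_key}_{condition2}_imputed"),
        ("layers", f"{pair}_fold_change"),
        ("obs", f"{result_key}_{condition1}_std"),
        ("obs", f"{result_key}_{condition2}_std"),
        ("var", f"{pair}_is_de"),
        ("var", f"{pair}_mahalanobis"),
        ("var", f"{pair}_mahalanobis_local_fdr"),
        ("var", f"{pair}_mean_lfc"),
    }


run0 = expected_de_fields("kompot_de", "Young", "Old")
run1 = expected_de_fields("kompot_de", "Young", "Mid")
print(sorted(run0 & run1))  # fields that run 1 overwrites

# A separate (hypothetical) result_key leaves run 0 untouched.
print(run0 & expected_de_fields("kompot_de_mid", "Young", "Mid"))
```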

Accessing Different Runs

Each analysis is assigned a run_id (0 for the first run, 1 for the second, etc.). Most plotting functions accept a run_id parameter:

[13]:

# Plot the first comparison (Young vs Old)
kompot.plot.volcano_de(adata, run_id=0, title="Young vs Old")

[2025-10-03 13:27:33,821] [INFO    ] Found DE run info for run_id=0
[2025-10-03 13:27:33,823] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2025-10-03 13:27:33,824] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:27:33,825] [WARNING ] Field 'kompot_de_Young_to_Old_mean_lfc' was last written by run 0, but current context expects run 1. The field may have been overwritten.
[2025-10-03 13:27:33,826] [WARNING ] Field 'kompot_de_Young_to_Old_mahalanobis' was last written by run 0, but current context expects run 1. The field may have been overwritten.
[2025-10-03 13:27:33,828] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:27:33,828] [WARNING ] Field inference completed with 2 warnings
[2025-10-03 13:27:33,829] [INFO    ] Using DE run 0: comparing Young to Old
[2025-10-03 13:27:33,851] [INFO    ] Using data columns from var - lfc: 'kompot_de_Young_to_Old_mean_lfc', score: 'kompot_de_Young_to_Old_mahalanobis'
[2025-10-03 13:27:33,857] [INFO    ] Highlighting 139 genes marked as DE (77 up, 62 down)
[2025-10-03 13:27:33,872] [INFO    ] Labeling top 10 genes by score

[Figure: volcano plot, Young vs Old (run 0)]

[14]:

# Plot the second comparison (Young vs Mid)
kompot.plot.volcano_de(adata, run_id=1, title="Young vs Mid")

[2025-10-03 13:27:34,235] [INFO    ] Found DE run info for run_id=1
[2025-10-03 13:27:34,236] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Mid_mean_lfc' from run info
[2025-10-03 13:27:34,236] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Mid_mahalanobis' from run info
[2025-10-03 13:27:34,237] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Mid_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Mid_mahalanobis'}
[2025-10-03 13:27:34,238] [INFO    ] Using DE run 1: comparing Young to Mid
[2025-10-03 13:27:34,256] [INFO    ] Using data columns from var - lfc: 'kompot_de_Young_to_Mid_mean_lfc', score: 'kompot_de_Young_to_Mid_mahalanobis'
[2025-10-03 13:27:34,258] [INFO    ] No genes marked as DE found - falling back to top genes highlighting
[2025-10-03 13:27:34,260] [INFO    ] Highlighting top 10 genes by kompot_de_Young_to_Mid_mahalanobis as fallback
[2025-10-03 13:27:34,276] [INFO    ] Labeling top 10 genes by score

[Figure: volcano plot, Young vs Mid (run 1), highlighting the top 10 genes by Mahalanobis distance]

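You can also use negative indexing (run_id=-1 for the most recent run). This is the run the plotting functions pick when run_id is omitted, as the “Found DE run info for run_id=-1” log lines earlier show.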

Inspecting Run Metadata

The RunInfo utility reports detailed information about each analysis. Here we inspect the first run (run_id=0); its field table shows which of its outputs were overwritten by the later Young vs Mid run:

[16]:

run_info = kompot.RunInfo(adata, run_id=0, analysis_type="de")
run_info

[16]:

Run 0 (DE Analysis)

Run Summary

Analysis: DE | Run ID: 0 | Timestamp: 2025-10-03T21:11:48.731385 | Conditions: Young to Old | Fields Overwritten: 2

Parameter	Value
conditions	Young to Old
obsm_key	DM_EigenVectors
uses_sample_variance	False
layer	logged_counts
timestamp	2025-10-03T21:11:48.731385
Fields Created	9

All Parameters

Parameter	Value
auto_filtered	False
batch_size	0
compute_mahalanobis	True
condition1	Young
condition2	Old
copy	False
eps	1e-08
fdr_threshold	0.05
groupby	Age
inplace	True
jit_compile	False
landmarks	False
layer	logged_counts
ls_factor	10.0
max_memory_ratio	0.8
min_cells	2
n_landmarks	5000
null_genes	4000
null_seed	42
obsm_key	DM_EigenVectors
result_key	kompot_de
sigma	1
store_landmarks	False
store_posterior_covariance	False
use_sample_variance	False
used_landmarks	False

Environment

Parameter	Value
hostname	gizmok39
pid	11250
platform	Linux-4.15.0-213-generic-x86_64-with-glibc2.27
python_version	3.12.10
timestamp	2025-10-03T21:11:48.731574
username	dotto

Fields Created by This Run

Total Fields: 9 | Present: 7 | Missing: 0 | Overwritten: 2

Field Name	Location	Description	Status
LAYERS Fields
kompot_de_Old_imputed	layers	[imputed] Imputed expression for Old	Present
kompot_de_Young_imputed	layers	[imputed] Imputed expression for Young	Overwritten by Run 1
kompot_de_Young_to_Old_fold_change	layers	[fold_change] Log fold change for each cell and gene	Present
OBS Fields
kompot_de_Old_std	obs	[std] Posterior standard deviation of imputed expression for Old (same for all genes)	Present
kompot_de_Young_std	obs	[std] Posterior standard deviation of imputed expression for Young (same for all genes)	Overwritten by Run 1
VAR Fields
kompot_de_Young_to_Old_is_de	var	[is_de] Boolean indicator of differential expression at local FDR < 0.05	Present
kompot_de_Young_to_Old_mahalanobis	var	[mahalanobis] Mahalanobis distances	Present
kompot_de_Young_to_Old_mahalanobis_local_fdr	var	[mahalanobis_local_fdr] Local FDR values using empirical null estimation similar to R's fdrtool	Present
kompot_de_Young_to_Old_mean_lfc	var	[mean_log_fold_change] Mean log fold change values	Present

Comparing Runs

Compare parameters between two runs:

[17]:

kompot.RunInfo(adata, run_id=0, analysis_type="de").compare_with(1)

[17]:

Comparison of Run 0 and Run 1

Summary

Analysis Type: DE | Run 0: unknown (2025-10-03T21:11:48.731385) | Run 1: unknown (2025-10-03T21:12:39.156717)

Parameters: 2 different, 35 same | Fields: 14 different, 2 same

Aspect	Run 0	Run 1
conditions	Young to Old	Young to Mid
result_key	kompot_de	kompot_de
uses_sample_variance	False	False
timestamp	2025-10-03T21:11:48.731385	2025-10-03T21:12:39.156717
Field Count	9	9

Parameter Differences

Total Parameters: 37 | Different: 2 | Only in Run 0: 0 | Only in Run 1: 0

Different Conditions

Key Parameter Differences

Parameter	Run 0	Run 1
condition2	Old	Mid

All Parameter Differences

Parameter	Run 0	Run 1
Different Parameters
null_genes	4000	2000
35 parameters are the same in both runs

Field Differences

Fields in both runs: 2 | Only in Run 0: 7 | Only in Run 1: 7

Field Name	Location	Status	Last Modified By
LAYERS Shared Fields
kompot_de_Young_imputed	layers	Current value from Run 1	Run 1
OBS Shared Fields
kompot_de_Young_std	obs	Current value from Run 1	Run 1
LAYERS Different Fields
kompot_de_Old_imputed	layers	Only in Run 0	Run 0
kompot_de_Young_to_Old_fold_change	layers	Only in Run 0	Run 0
kompot_de_Mid_imputed	layers	Only in Run 1	Run 1
kompot_de_Young_to_Mid_fold_change	layers	Only in Run 1	Run 1
OBS Different Fields
kompot_de_Old_std	obs	Only in Run 0	Run 0
kompot_de_Mid_std	obs	Only in Run 1	Run 1
VAR Different Fields
kompot_de_Young_to_Old_is_de	var	Only in Run 0	Run 0
kompot_de_Young_to_Old_mahalanobis	var	Only in Run 0	Run 0
kompot_de_Young_to_Old_mahalanobis_local_fdr	var	Only in Run 0	Run 0
kompot_de_Young_to_Old_mean_lfc	var	Only in Run 0	Run 0
kompot_de_Young_to_Mid_is_de	var	Only in Run 1	Run 1
kompot_de_Young_to_Mid_mahalanobis	var	Only in Run 1	Run 1
kompot_de_Young_to_Mid_mahalanobis_local_fdr	var	Only in Run 1	Run 1
kompot_de_Young_to_Mid_mean_lfc	var	Only in Run 1	Run 1

Note on shared fields: When both runs define the same field, the last run to write to the field overwrites the previous value. The 'Last Modified By' column shows which run's value is currently stored.
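The parameter section of the comparison amounts to a dictionary diff. The plain-Python sketch below reproduces the two differences reported above (condition2 and null_genes); diff_params is a hypothetical helper, and the toy dictionaries hold only a few of the 37 parameters.

```python
def diff_params(params_a, params_b):
    """Split two parameter dicts into differing, identical, and one-sided keys."""
    keys_a, keys_b = set(params_a), set(params_b)
    shared = keys_a & keys_b
    different = {
        k: (params_a[k], params_b[k]) for k in sorted(shared) if params_a[k] != params_b[k]
    }
    same = sorted(k for k in shared if params_a[k] == params_b[k])
    return different, same, sorted(keys_a - keys_b), sorted(keys_b - keys_a)


run0 = {"condition1": "Young", "condition2": "Old", "null_genes": 4000, "sigma": 1, "n_landmarks": 5000}
run1 = {"condition1": "Young", "condition2": "Mid", "null_genes": 2000, "sigma": 1, "n_landmarks": 5000}

different, same, only_0, only_1 = diff_params(run0, run1)
print(different)
print(f"{len(same)} parameters are the same in both runs")
```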

Resource Planning

For large datasets, use dry_run_differential_expression to estimate resource requirements before running the full analysis:

[24]:

plan = kompot.dry_run_differential_expression(
    adata,
    groupby=GROUPING_COLUMN,
    condition1="Young",
    condition2="Old",
    layer=LAYER_FOR_EXPRESSION,
    obsm_key=DIMENSIONALITY_REDUCTION,
    null_genes=2000,
    verbose=True,
)

================================================================================
RESOURCE USAGE PLAN
================================================================================

System Resources:
  Memory: 382.86 GB available (of 754.59 GB total)
  Disk:   37.83 GB available at /tmp

Total Requirements:
  Memory: 4.19 GB (1% of available)

Memory Allocations:
  • Mellon precision matrix L (condition 1, 2,917/3,116 cells) (2917, 2917): 64.92 MB
  • Mellon precision matrix L (condition 2, 2,917/3,116 cells) (3116, 3116): 74.08 MB
  • Imputed expression (condition 1) (8090, 18285): 1.10 GB → adata.layers['kompot_de_Young_imputed']
  • Imputed expression (condition 2) (8090, 18285): 1.10 GB → adata.layers['kompot_de_Old_imputed']
  • Fold change (8090, 18285): 1.10 GB → adata.layers['kompot_de_Young_to_Old_fold_change']
  • Function predictor covariances (per condition) (5000, 5000): 381.47 MB
  • Combined covariance matrix (5000, 5000): 190.73 MB
  • Cholesky decomposition (for Mahalanobis) (5000, 5000): 190.73 MB
  • Mahalanobis batch processing (batch_size=100) (100, 5000): 3.81 MB

Output Fields:
  adata.layers:
    - kompot_de_Young_imputed
    - kompot_de_Old_imputed
    - kompot_de_Young_to_Old_fold_change
  adata.var:
    - kompot_de_Young_to_Old_mahalanobis [OVERWRITES run_id=0]
    - kompot_de_Young_to_Old_mean_lfc [OVERWRITES run_id=0]
    - kompot_de_Young_to_Old_mahalanobis_local_fdr
    - kompot_de_Young_to_Old_is_de

Info:
  ℹ Null distribution will use 2000 additional genes (total: 18285 genes processed)
  ℹ Mahalanobis computation processes 100 genes per batch. Reduce batch_size to lower peak memory (currently 3.81 MB for batch arrays).

Warnings:
  ⚠ Results with result_key='kompot_de' already exist (run_id=1). Previous run: 2025-10-03T14:50:36.145559 comparing Young to Mid (null_genes=2000). Fields that will be overwritten: var.kompot_de_Young_to_Old_mahalanobis, var.kompot_de_Young_to_Old_mean_lfc, obs.kompot_de_Young_std, obs.kompot_de_Old_std

================================================================================
STATUS: ⚠ FEASIBLE WITH WARNINGS - Proceed with caution
================================================================================

The report shows:

Available system memory and disk space
Estimated memory requirements for each computation step
Which fields will be created or overwritten (with run_id)
Whether the analysis is feasible

This is especially useful for testing different parameter combinations (e.g., with/without sample variance) before committing to a long computation.
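The figures in the report can be reproduced by hand: they match dense float64 arrays (8 bytes per value) reported in binary units (1 MB = 2**20 bytes, 1 GB = 2**30 bytes), and the gene axis of 18,285 is the 16,285 real genes plus 2,000 null genes. The sketch below is our reconstruction of that arithmetic (dense_mib is a hypothetical helper), not Kompot's estimator; it treats the 381.47 MB "per condition" covariance entry as two 5000 x 5000 matrices.

```python
def dense_mib(*shape, bytes_per_value=8):
    """Size of a dense array in MiB (2**20 bytes), float64 by default."""
    n_values = 1
    for dim in shape:
        n_values *= dim
    return n_values * bytes_per_value / 2**20


n_cells, n_genes = 8090, 16285 + 2000  # real genes plus null genes

print(f"Precision matrix, condition 1: {dense_mib(2917, 2917):.2f} MB")
print(f"Precision matrix, condition 2: {dense_mib(3116, 3116):.2f} MB")
print(f"One imputed or fold-change layer: {dense_mib(n_cells, n_genes) / 1024:.2f} GB")
print(f"One 5000 x 5000 covariance: {dense_mib(5000, 5000):.2f} MB")
print(f"Mahalanobis batch: {dense_mib(100, 5000):.2f} MB")

total_mib = (
    3 * dense_mib(n_cells, n_genes)                   # two imputed layers + fold change
    + dense_mib(2917, 2917) + dense_mib(3116, 3116)   # precision matrices
    + 2 * dense_mib(5000, 5000)                       # predictor covariances
    + 2 * dense_mib(5000, 5000)                       # combined covariance + Cholesky
    + dense_mib(100, 5000)                            # one Mahalanobis batch
)
print(f"Total: {total_mib / 1024:.2f} GB")
```

Because the three per-cell layers dominate the total, reducing the number of cells or genes analyzed has a much larger effect on memory than lowering batch_size.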

Saving Results

Selective Cleanup

Remove the large per-cell layers (imputed expression and fold change) while preserving the per-gene statistics in adata.var:

[18]:

# This keeps adata.var statistics but removes large adata.layers
kompot.cleanup(adata)

[2025-10-03 13:27:34,746] [INFO    ] Cleaning up all 2 run(s)
[2025-10-03 13:27:34,760] [INFO    ] Cleaned up 3 field(s) from run 0:
[2025-10-03 13:27:34,761] [INFO    ]   layers (3 field(s)):
[2025-10-03 13:27:34,761] [INFO    ]     - kompot_de_Young_imputed
[2025-10-03 13:27:34,762] [INFO    ]     - kompot_de_Old_imputed
[2025-10-03 13:27:34,762] [INFO    ]     - kompot_de_Young_to_Old_fold_change
[2025-10-03 13:27:34,770] [INFO    ] Cleaned up 2 field(s) from run 1:
[2025-10-03 13:27:34,771] [INFO    ]   layers (2 field(s)):
[2025-10-03 13:27:34,771] [INFO    ]     - kompot_de_Mid_imputed
[2025-10-03 13:27:34,772] [INFO    ]     - kompot_de_Young_to_Mid_fold_change
[2025-10-03 13:27:34,772] [INFO    ] Total: Cleaned up 5 field(s) across 2 run(s)

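To clean up specific runs only, pass run_ids. Here the call reports that no fields were deleted, because the previous cell already removed the layers of every run: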

[20]:

# Clean up only the first run
kompot.cleanup(adata, run_ids=0, analysis_type="de")

[2025-10-03 13:28:06,698] [INFO    ] No fields deleted.

[21]:

adata.write_h5ad("../data/murine_bone_marrow_aging_processed.h5ad")

Summary

This tutorial covered:

✓ Customizing DE parameters (null_genes, sigma, batch_size)
✓ Advanced volcano plot options
✓ Expression visualization techniques
✓ Heatmap customization
✓ Managing multiple comparisons with run_id
✓ Resource planning with dry runs

Next Steps

Tutorial 3: Sample Variance Analysis - Account for biological replicates
API Documentation: kompot.readthedocs.io