Differential Expression: Advanced Analysis
This tutorial explores advanced differential expression (DE) analysis with Kompot, building on the Getting Started tutorial.
You’ll learn how to:
Customize DE analysis parameters for your specific dataset
Work with null gene distributions for FDR estimation
Perform multiple comparisons and track results
Use advanced visualization options
Optimize computational resources
Setup
[1]:
import anndata as ad
import matplotlib.pyplot as plt
import numpy as np
import palantir
import pandas as pd
import scanpy as sc
import kompot
plt.rcParams["axes.spines.right"] = False
plt.rcParams["axes.spines.top"] = False
plt.rcParams["image.cmap"] = "Spectral_r"
[2]:
DATA_PATH = "../data/murine_bone_marrow_aging.h5ad"
GROUPING_COLUMN = "Age"
CONDITIONS = ["Young", "Old"]
CELL_TYPE_COLUMN = "highres_celltype"
DIMENSIONALITY_REDUCTION = "DM_EigenVectors"
LAYER_FOR_EXPRESSION = "logged_counts"
Load and Prepare Data
We’ll reuse the dataset from the first tutorial:
[3]:
adata = ad.read_h5ad(DATA_PATH)
palantir.utils.run_diffusion_maps(adata, pca_key="X_pca_harmony", n_components=40)
adata
[3]:
AnnData object with n_obs × n_vars = 8090 × 16285
obs: 'Compartment', 'Replicate', 'Age', 'Sample', 'Info', 'batch', 'doublet_score', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_hb', 'pct_counts_hb', 'S_score', 'G2M_score', 'phase', 'leiden', 'phenograph', 'highres_celltype', 'midres_celltype'
var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'hb', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
uns: 'Age_colors', 'Compartment_colors', 'DMEigenValues', 'Info_colors', 'README', 'Replicate_colors', 'Sample_colors', 'batch_colors', 'draw_graph', 'highres_celltype_colors', 'hvg', 'leiden', 'leiden_colors', 'midres_celltype_colors', 'neighbors', 'pca', 'phase_colors', 'umap', 'DM_EigenValues'
obsm: 'AbCapture', 'DM_EigenVectors', 'HTO', 'X_draw_graph_fa', 'X_pca', 'X_pca_harmony', 'X_pca_noregression', 'X_umap'
varm: 'PCs'
layers: 'MAGIC_imputed_data', 'logged_counts', 'normalized_counts', 'raw_counts'
obsp: 'DM_Kernel', 'connectivities', 'distances', 'DM_Similarity'
Understanding DE Parameters
The compute_differential_expression function provides several important parameters:
Core Parameters
null_genes: Number of permuted (shuffled) genes used for FDR estimation (default: 2000)
Higher values give better FDR estimates but increase computation time
Set to 0 to disable FDR computation (faster, but no significance thresholds)
sigma: Noise level in the expression layer (default: 1)
Adjust based on your normalization method
Use higher values for noisier data
Use lower values for denoised data
batch_size: Process genes/cells in batches to reduce memory (default: 0 = no batching)
Set to ~100 for large datasets to prevent memory overflow
Advanced Parameters
n_landmarks: Number of landmark points used for the Gaussian process fits and the Mahalanobis computation (default: 5000)
More landmarks = more accurate but slower
Kompot automatically uses min(n_landmarks, n_cells) when a condition has fewer cells than requested
result_key: Prefix for result field names (default: “kompot_de”)
Change to avoid overwriting previous results
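The null_genes idea can be made concrete with a minimal NumPy sketch. It uses one common way to build null genes: copy real expression columns and shuffle each one independently across cells. A shuffled gene keeps its distribution of values but loses any link to cell state or condition. This illustrates that assumption and is not Kompot's implementation. The helper make_null_genes is hypothetical; only the seed value 42 mirrors the null_seed=42 shown in the run log below.

```python
import numpy as np


def make_null_genes(expr, n_null, seed=42):
    """Hypothetical helper: build n_null shuffled 'null' genes from a cells x genes matrix.

    Each null gene is a randomly chosen real gene whose values are permuted
    across cells, so it keeps that gene's value distribution but has no
    association with cell state or condition.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = expr.shape
    source = rng.choice(n_genes, size=n_null, replace=True)  # real genes to copy
    null = np.empty((n_cells, n_null), dtype=expr.dtype)
    for j, g in enumerate(source):
        null[:, j] = rng.permutation(expr[:, g])  # independent shuffle per null gene
    return null, source


# Toy data: 6 cells x 3 genes of log-scale expression
expr = np.log1p(np.arange(18, dtype=float).reshape(6, 3))
null, source = make_null_genes(expr, n_null=4)
print(null.shape)  # (6, 4)
```

A gene whose score looks no more extreme than these shuffled genes gives little evidence of a condition effect. This is why more null genes give a better-resolved FDR estimate.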
Let’s run DE analysis with customized parameters:
[4]:
de_results = kompot.compute_differential_expression(
adata,
groupby=GROUPING_COLUMN,
condition1=CONDITIONS[0],
condition2=CONDITIONS[1],
layer=LAYER_FOR_EXPRESSION,
obsm_key=DIMENSIONALITY_REDUCTION,
null_genes=4000, # More null genes for better FDR estimation
sigma=1, # Adjust if needed for your data
batch_size=0, # No batching for this small dataset
)
[2025-10-03 13:21:59,246] [INFO ] Condition 1 (Young): 2,917 cells
[2025-10-03 13:21:59,248] [INFO ] Condition 2 (Old): 3,116 cells
[2025-10-03 13:21:59,248] [INFO ] Using 8090 of 8090 cells (100.0%)
[2025-10-03 13:21:59,602] [INFO ] Preparing null distribution with null_genes=4000, null_seed=42
[2025-10-03 13:22:01,972] [INFO ] Generated shuffled expression for 4000 null genes
WARNING:2025-10-03 13:22:02,421:jax._src.xla_bridge:966: An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
[2025-10-03 13:22:12,369] [INFO ] Fitting expression estimator for condition 1...
[2025-10-03 13:22:47,608] [INFO ] Fitting expression estimator for condition 2...
[2025-10-03 13:23:46,830] [INFO ] Landmark storage skipped (store_landmarks=False). Compute with store_landmarks=True to enable landmark reuse.
[2025-10-03 13:24:56,798] [INFO ] Using 5,000 landmarks for Mahalanobis computation
[2025-10-03 13:25:57,300] [INFO ] Computing FDR statistics from null distribution
[2025-10-03 13:26:00,149] [INFO ] FDR analysis complete: 139/16285 genes significantly DE at FDR < 0.05
[2025-10-03 13:26:00,150] [INFO ] Mahalanobis distance threshold for FDR < 0.05: 29.0322
[2025-10-03 13:26:08,353] [INFO ] This run will have `run_id=0`.
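The per-gene statistics are stored as columns in adata.var. The column names below are the ones listed for this run in the RunInfo table later in this tutorial. A typical next step is to extract the significant genes and rank them. So that the snippet runs on its own, it builds a small synthetic stand-in for adata.var with invented values; with real data you would index adata.var directly.

```python
import pandas as pd

prefix = "kompot_de_Young_to_Old"

# Synthetic stand-in for adata.var; the values are invented for illustration only.
var = pd.DataFrame(
    {
        f"{prefix}_mean_lfc": [1.2, -0.8, 0.1, 2.0],
        f"{prefix}_mahalanobis": [45.0, 31.5, 3.2, 60.1],
        f"{prefix}_mahalanobis_local_fdr": [0.001, 0.03, 0.9, 0.0001],
        f"{prefix}_is_de": [True, True, False, True],
    },
    index=["GeneA", "GeneB", "GeneC", "GeneD"],
)

# Keep genes flagged as DE and rank them by Mahalanobis distance
de_genes = var.loc[var[f"{prefix}_is_de"]].sort_values(
    f"{prefix}_mahalanobis", ascending=False
)

# Split by the sign of the mean log fold change
up = de_genes.index[de_genes[f"{prefix}_mean_lfc"] > 0].tolist()
down = de_genes.index[de_genes[f"{prefix}_mean_lfc"] < 0].tolist()
print(up, down)  # ['GeneD', 'GeneA'] ['GeneB']
```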
Volcano Plot Customization
The volcano_de function offers extensive customization options.
Changing the Y-Axis Metric
By default, the y-axis shows Mahalanobis distance. You can switch to local FDR, which is plotted on a -log10 scale so that smaller FDR values appear higher:
[5]:
kompot.plot.volcano_de(
adata,
y_axis_type="local_fdr",
significance_threshold=0.02, # Adjust FDR threshold
)
[2025-10-03 13:26:08,807] [INFO ] Found DE run info for run_id=-1
[2025-10-03 13:26:08,810] [INFO ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2025-10-03 13:26:08,810] [INFO ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:26:08,811] [INFO ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:26:08,817] [INFO ] Using local_fdr values for y-axis: kompot_de_Young_to_Old_mahalanobis_local_fdr
[2025-10-03 13:26:08,818] [INFO ] Using DE run 0: comparing Young to Old
[2025-10-03 13:26:08,865] [INFO ] Using data columns from var - lfc: 'kompot_de_Young_to_Old_mean_lfc', score: 'kompot_de_Young_to_Old_mahalanobis_local_fdr'
[2025-10-03 13:26:08,866] [INFO ] Applied local_fdr transformation to y-axis data
[2025-10-03 13:26:08,867] [INFO ] Significance threshold selection: using column 'kompot_de_Young_to_Old_mahalanobis_local_fdr' with threshold < 0.02
[2025-10-03 13:26:08,873] [INFO ] Values range: 0.018455 - 1.000000
[2025-10-03 13:26:08,875] [INFO ] Found 110 genes with local_fdr < 0.02
[2025-10-03 13:26:08,877] [INFO ] Highlighting 110 genes at local_fdr < 0.02 (66 up, 44 down)
[2025-10-03 13:26:08,899] [INFO ] Labeling top 10 genes by score
[2025-10-03 13:26:08,920] [INFO ] Added local_fdr threshold line at y=1.70 (local_fdr=0.02)
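The log above places the threshold line at y=1.70 for local_fdr=0.02. This matches a y-axis that shows -log10(local FDR); that reading is inferred from the log, not from Kompot's source. The arithmetic is quick to check:

```python
import numpy as np

# Position of the threshold line for local_fdr=0.02 on a -log10 axis
threshold_y = -np.log10(0.02)
print(round(float(threshold_y), 2))  # 1.7

# Smallest local FDR reported in the log (0.018455), i.e. the top of the y-axis
top_y = -np.log10(0.018455)
print(round(float(top_y), 2))  # 1.73
```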
Highlighting Gene Sets
Highlight specific gene sets with custom colors and labels:
[6]:
# Define gene sets of interest
gene_sets = [
{
"name": "MHC class II",
"genes": ["H2-Ab1", "H2-Aa", "Cd74", "H2-Eb1"],
"color": "#E76F51",
},
{
"name": "Antioxidant",
"genes": ["S100a8", "Alox5ap", "Hp", "S100a9", "Mgst1", "Apoe"],
"color": "#2A9D8F",
},
]
kompot.plot.volcano_de(
adata,
significance_threshold={"local_fdr": 0.02, "mahalanobis": 30},
gene_labels=["S100a8", "Alox5ap", "Hp", "S100a9", "Mgst1", "Apoe"],
highlight_genes=gene_sets,
)
[2025-10-03 13:26:09,313] [INFO ] Found DE run info for run_id=-1
[2025-10-03 13:26:09,314] [INFO ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2025-10-03 13:26:09,314] [INFO ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:26:09,315] [INFO ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:26:09,315] [INFO ] Using DE run 0: comparing Young to Old
[2025-10-03 13:26:09,348] [INFO ] Using data columns from var - lfc: 'kompot_de_Young_to_Old_mean_lfc', score: 'kompot_de_Young_to_Old_mahalanobis'
[2025-10-03 13:26:09,350] [INFO ] Added highlight group 'MHC class II' with 4 genes
[2025-10-03 13:26:09,350] [INFO ] Added highlight group 'Antioxidant' with 6 genes
[2025-10-03 13:26:09,358] [INFO ] Labeling 6 specific genes
[2025-10-03 13:26:09,387] [INFO ] Skipping threshold line drawing for dictionary-format significance_threshold
gene_labels can be:
A list of gene names
A dictionary mapping gene names to custom labels
An integer to auto-select the top N genes
highlight_genes can be:
A list of gene names (all same color)
A list of dictionaries with “name”, “genes”, and “color” keys (as shown)
An integer to highlight the top N genes
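A rough mental model of the integer form is that it picks the N genes with the largest score. This is an assumption based on the "Labeling top 10 genes by score" log lines, not on Kompot's source. The pandas sketch below applies that selection to a synthetic score column. It also writes out the list and dictionary forms of gene_labels as plain Python values.

```python
import pandas as pd

# Synthetic Mahalanobis scores standing in for an adata.var column (invented values)
scores = pd.Series({"GeneA": 52.0, "GeneB": 12.4, "GeneC": 47.3, "GeneD": 38.9, "GeneE": 30.2})

n = 3
top_n = scores.nlargest(n).index.tolist()  # roughly what an integer gene_labels=3 selects
print(top_n)  # ['GeneA', 'GeneC', 'GeneD']

# The three gene_labels forms described above, as plain Python values
labels_as_list = ["S100a8", "Apoe"]
labels_as_dict = {"S100a8": "S100A8", "Apoe": "ApoE"}  # gene name -> custom label
labels_as_int = n
```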
Expression Visualization
Plotting Individual Genes
Imputed expression for each condition, along with the per-cell log fold change, is stored in adata.layers:
kompot_de_Young_imputed
kompot_de_Old_imputed
kompot_de_Young_to_Old_fold_change
You can plot these manually:
gene = "Igkc"
# Manual plotting: observed expression, then each condition's imputed expression,
# then the per-cell log fold change (centered at 0 with a diverging colormap)
sc.pl.embedding(adata, basis="umap", color=gene, layer="logged_counts")
sc.pl.embedding(adata, basis="umap", color=gene, layer="kompot_de_Young_imputed")
sc.pl.embedding(adata, basis="umap", color=gene, layer="kompot_de_Old_imputed")
sc.pl.embedding(
    adata, basis="umap", color=gene, layer="kompot_de_Young_to_Old_fold_change",
    cmap="RdBu_r", vcenter=0,
)
Or use the convenience function:
[7]:
kompot.plot.plot_gene_expression(
adata, gene="Igkc", vmin="p2", vmax="p98", frameon=False
)
[2025-10-03 13:26:09,773] [INFO ] Found DE run info for run_id=-1
[2025-10-03 13:26:09,773] [INFO ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2025-10-03 13:26:09,774] [INFO ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:26:09,775] [INFO ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:26:09,775] [INFO ] Using DE run 0: comparing Young to Old
[2025-10-03 13:26:09,776] [INFO ] Using fields for gene expression plot - lfc_key: 'kompot_de_Young_to_Old_mean_lfc', score_key: 'kompot_de_Young_to_Old_mahalanobis'
[2025-10-03 13:26:09,776] [INFO ] Using layer 'logged_counts' inferred from run information
[2025-10-03 13:26:09,778] [INFO ] Using condition1 imputed layer 'kompot_de_Young_imputed' for 'Young'
[2025-10-03 13:26:09,778] [INFO ] Using condition2 imputed layer 'kompot_de_Old_imputed' for 'Old'
[2025-10-03 13:26:09,778] [INFO ] Using fold_change layer 'kompot_de_Young_to_Old_fold_change' from run_info
Comparing Conditions with Subplots
The kompot.plot.embedding wrapper supports filtering to specific groups:
[8]:
kompot.plot.embedding(
adata,
"umap",
color="Igkc",
layer=LAYER_FOR_EXPRESSION,
frameon=False,
mgroups=[{GROUPING_COLUMN: condition} for condition in CONDITIONS],
)
[2025-10-03 13:26:11,413] [INFO ] Selected 2,917 cells out of 8,090 total cells.
[2025-10-03 13:26:11,701] [INFO ] Selected 3,116 cells out of 8,090 total cells.
The mgroups parameter creates multiple subplots, each showing cells filtered by the specified conditions.
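Each dictionary in mgroups acts as a filter on adata.obs. The sketch below reproduces that idea with pandas. It expands the list comprehension from the cell above and turns each dictionary into a boolean cell mask. Combining several keys in one dictionary with a logical AND is an assumption made for illustration, so check the kompot.plot.embedding docstring for the exact semantics. The helper mask_for is hypothetical.

```python
import pandas as pd

GROUPING_COLUMN = "Age"
CONDITIONS = ["Young", "Old"]

# The same list comprehension as in the cell above
mgroups = [{GROUPING_COLUMN: condition} for condition in CONDITIONS]
print(mgroups)  # [{'Age': 'Young'}, {'Age': 'Old'}]

# Toy stand-in for adata.obs
obs = pd.DataFrame({"Age": ["Young", "Old", "Mid", "Young", "Old"]})


def mask_for(filters, obs):
    """Hypothetical helper: True for cells matching every key/value pair in `filters`."""
    mask = pd.Series(True, index=obs.index)
    for column, value in filters.items():
        mask &= obs[column] == value
    return mask


for filters in mgroups:
    print(filters, int(mask_for(filters, obs).sum()), "cells")
```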
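Heatmap Customization
The heatmap function visualizes average expression per group.
Z-Score Normalization
By default, values are z-scored gene-wise (standard_scale='var'), so the colors for each gene show relative differences across groups and conditions. Pass standard_scale=None to show unscaled mean expression from the selected layer instead: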
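For reference, gene-wise z-scoring centers each gene (column) of the group-by-gene matrix of mean expression to mean 0 and scales it to standard deviation 1. Color then reflects relative rather than absolute expression. The tutorial does not say whether Kompot uses the population or sample standard deviation; this minimal NumPy sketch on a toy matrix uses the population version.

```python
import numpy as np

# Toy matrix of mean expression: rows = groups (e.g. cell type x condition), columns = genes
means = np.array([
    [0.5, 3.0],
    [1.5, 2.0],
    [2.5, 1.0],
])

# Center and scale each gene (column) independently.
# A gene with constant values would need special handling to avoid dividing by zero.
z = (means - means.mean(axis=0)) / means.std(axis=0)
print(np.round(z, 3))
```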
[9]:
kompot.plot.heatmap(
adata,
n_top_genes=20,
groupby=CELL_TYPE_COLUMN,
exclude_groups="Plasma cell",
standard_scale=None, # Disable z-scoring
vmax="p99",
)
[2025-10-03 13:26:12,648] [INFO ] Found DE run info for run_id=-1
[2025-10-03 13:26:12,654] [INFO ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:26:12,654] [INFO ] Successfully inferred fields: {'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:26:12,655] [INFO ] Using DE run 0 for heatmap.
[2025-10-03 13:26:12,655] [INFO ] Inferred score_key='kompot_de_Young_to_Old_mahalanobis' from run information
[2025-10-03 13:26:12,658] [INFO ] Inferred condition_column='Age' from run information
[2025-10-03 13:26:12,659] [INFO ] Inferred condition1='Young' from run information
[2025-10-03 13:26:12,659] [INFO ] Inferred condition2='Old' from run information
[2025-10-03 13:26:12,660] [INFO ] Inferred layer='logged_counts' from run information
[2025-10-03 13:26:12,660] [INFO ] Creating split heatmap with 20 genes/features
[2025-10-03 13:26:12,661] [INFO ] Using expression data from layer: 'logged_counts'
[2025-10-03 13:26:12,845] [INFO ] Excluded 7 cells from groups: Plasma cell
Custom Gene Lists
Instead of selecting top genes by Mahalanobis distance, provide a custom list:
[10]:
custom_genes = ["H2-Q7", "Cd74", "H2-Aa", "H2-Ab1", "S100a9", "S100a8", "Apoe"]
kompot.plot.heatmap(
adata,
genes=custom_genes,
groupby=CELL_TYPE_COLUMN,
exclude_groups="Plasma cell",
vmin="p1",
vmax="p99",
)
[2025-10-03 13:26:15,305] [INFO ] Inferred condition_column='Age' from run information
[2025-10-03 13:26:15,306] [INFO ] Inferred condition1='Young' from run information
[2025-10-03 13:26:15,307] [INFO ] Inferred condition2='Old' from run information
[2025-10-03 13:26:15,307] [INFO ] Inferred layer='logged_counts' from run information
[2025-10-03 13:26:15,308] [INFO ] Creating split heatmap with 7 genes/features
[2025-10-03 13:26:15,308] [INFO ] Using expression data from layer: 'logged_counts'
[2025-10-03 13:26:15,371] [INFO ] Excluded 7 cells from groups: Plasma cell
[2025-10-03 13:26:15,377] [INFO ] Applying gene-wise z-scoring (standard_scale='var')
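Before passing a hand-picked list, it can help to check that every name matches adata.var_names exactly. Gene symbols are case-sensitive (for example, mouse Cd74 versus human-style CD74). This small sketch uses a toy index; with real data, replace var_names with adata.var_names.

```python
import pandas as pd

custom_genes = ["H2-Q7", "Cd74", "H2-Aa", "H2-Ab1", "S100a9", "S100a8", "Apoe"]

# Toy stand-in for adata.var_names; "CD74" uses human-style capitalization on purpose
var_names = pd.Index(["H2-Q7", "CD74", "H2-Aa", "H2-Ab1", "S100a9", "S100a8", "Apoe"])

present = [g for g in custom_genes if g in var_names]
missing = [g for g in custom_genes if g not in var_names]
print(missing)  # ['Cd74']
```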
Simplify Heatmap Display
To simplify the plot, pass fold_change_mode=True. Each square then shows only the difference between the two conditions within a group, which equals the log fold change if the expression data is log-transformed. Z-scoring (standard_scale) is ignored in this mode.
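A short worked example shows why that difference is a log fold change. With log-transformed values, the difference between the two conditions' mean log expression equals the log of the ratio of their geometric means. If the layer was produced with log1p, the ratio is of (expression + 1); log1p is a common choice but is assumed here, not confirmed for logged_counts. The sign convention (Old minus Young) is also an assumption.

```python
import numpy as np

# Toy normalized expression for one gene in one cell type
young = np.array([1.0, 3.0, 7.0])
old = np.array([3.0, 7.0, 15.0])

# Difference of mean log1p expression between conditions (assumed Old minus Young)
diff = np.log1p(old).mean() - np.log1p(young).mean()


def geo_mean_plus_one(x):
    """Geometric mean of (x + 1)."""
    return np.exp(np.log1p(x).mean())


# The same quantity written as the log of a ratio of geometric means
ratio_form = np.log(geo_mean_plus_one(old) / geo_mean_plus_one(young))

print(round(float(diff), 3))  # 0.693, i.e. log(2): every (old + 1) is twice (young + 1)
print(np.isclose(diff, ratio_form))  # True
```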
[11]:
kompot.plot.heatmap(
adata,
genes=custom_genes,
groupby=CELL_TYPE_COLUMN,
exclude_groups="Plasma cell",
vmin="p1",
vmax="p99",
fold_change_mode=True,
)
[2025-10-03 13:26:16,229] [INFO ] Inferred condition_column='Age' from run information
[2025-10-03 13:26:16,232] [INFO ] Inferred condition1='Young' from run information
[2025-10-03 13:26:16,232] [INFO ] Inferred condition2='Old' from run information
[2025-10-03 13:26:16,233] [INFO ] Inferred layer='logged_counts' from run information
[2025-10-03 13:26:16,233] [INFO ] Creating fold change heatmap with 7 genes/features
[2025-10-03 13:26:16,234] [INFO ] Using expression data from layer: 'logged_counts'
[2025-10-03 13:26:16,343] [INFO ] Excluded 7 cells from groups: Plasma cell
[2025-10-03 13:26:16,349] [INFO ] Applying gene-wise z-scoring (standard_scale='var')
[2025-10-03 13:26:16,420] [WARNING ] standard_scale is ignored in fold_change_mode as z-scoring is not appropriate for fold changes
Multiple Comparisons
Kompot tracks multiple analysis runs, allowing you to perform different comparisons on the same dataset.
Running Additional Comparisons
Let's add a comparison between Young and Mid-age mice (if Mid exists in your data). This run keeps the default null_genes=2000:
[12]:
de_results_2 = kompot.compute_differential_expression(
adata,
groupby=GROUPING_COLUMN,
condition1="Young",
condition2="Mid",
layer=LAYER_FOR_EXPRESSION,
obsm_key=DIMENSIONALITY_REDUCTION,
batch_size=0,
)
[2025-10-03 13:26:17,562] [WARNING ] Differential expression results with result_key='kompot_de' already exist in the dataset. Previous run was at 2025-10-03T13:26:08.352416 comparing Young to Old. Fields that will be overwritten: layers.kompot_de_Young_imputed, obs.kompot_de_Young_std Set overwrite=False to prevent overwriting or overwrite=True to silence this message.
[2025-10-03 13:26:17,566] [INFO ] Condition 1 (Young): 2,917 cells
[2025-10-03 13:26:17,566] [INFO ] Condition 2 (Mid): 2,057 cells
[2025-10-03 13:26:17,567] [INFO ] Using 8090 of 8090 cells (100.0%)
[2025-10-03 13:26:17,828] [INFO ] Preparing null distribution with null_genes=2000, null_seed=42
[2025-10-03 13:26:19,100] [INFO ] Generated shuffled expression for 2000 null genes
[2025-10-03 13:26:19,578] [INFO ] Fitting expression estimator for condition 1...
[2025-10-03 13:26:21,014] [WARNING ] Gaussin process type is GaussianProcessType.FIXED and n_landmarks=5,000 are requested while only 2,917 datapoints are available. Using all datapoints for 2,917 landmarks instead.
[2025-10-03 13:26:32,556] [INFO ] Fitting expression estimator for condition 2...
[2025-10-03 13:26:46,811] [WARNING ] Gaussin process type is GaussianProcessType.FIXED and n_landmarks=5,000 are requested while only 2,057 datapoints are available. Using all datapoints for 2,057 landmarks instead.
[2025-10-03 13:27:15,798] [INFO ] Using 2,917 landmarks for Mahalanobis computation
[2025-10-03 13:27:27,143] [INFO ] Computing FDR statistics from null distribution
[2025-10-03 13:27:28,213] [INFO ] FDR analysis complete: 0/16285 genes significantly DE at FDR < 0.05
[2025-10-03 13:27:33,336] [INFO ] This run will have `run_id=1`.
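Because both runs use the default result_key="kompot_de", the fields named after the shared condition (kompot_de_Young_imputed and kompot_de_Young_std) are overwritten. Kompot warns about such overwrites and tracks which run last wrote each field. Pass a different result_key to keep both sets of results.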
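Listing the field names that each run creates shows which ones collide. The names in this tutorial suggest a pattern: per-condition fields are named <result_key>_<condition>_..., and per-comparison fields <result_key>_<condition1>_to_<condition2>_.... This pattern is inferred, not taken from Kompot's source. The helper de_field_names below is a hypothetical reconstruction of that pattern.

```python
def de_field_names(result_key, condition1, condition2):
    """Hypothetical reconstruction of the nine fields listed for each DE run."""
    pair = f"{result_key}_{condition1}_to_{condition2}"
    return {
        ("layers", f"{result_key}_{condition1}_imputed"),
        ("layers", f"{result_key}_{condition2}_imputed"),
        ("layers", f"{pair}_fold_change"),
        ("obs", f"{result_key}_{condition1}_std"),
        ("obs", f"{result_key}_{condition2}_std"),
        ("var", f"{pair}_mean_lfc"),
        ("var", f"{pair}_mahalanobis"),
        ("var", f"{pair}_mahalanobis_local_fdr"),
        ("var", f"{pair}_is_de"),
    }


run0 = de_field_names("kompot_de", "Young", "Old")
run1 = de_field_names("kompot_de", "Young", "Mid")
print(sorted(run0 & run1))
# [('layers', 'kompot_de_Young_imputed'), ('obs', 'kompot_de_Young_std')]

# With a distinct result_key (the name "kompot_de_mid" is arbitrary), nothing collides
run1_separate = de_field_names("kompot_de_mid", "Young", "Mid")
print(run0 & run1_separate)  # set()
```

The overlap matches the two fields named in the overwrite warning of the Young vs Mid run.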
Accessing Different Runs
Each analysis is assigned a run_id (0 for the first run, 1 for the second, etc.). Most plotting functions accept a run_id parameter:
[13]:
# Plot the first comparison (Young vs Old)
kompot.plot.volcano_de(adata, run_id=0, title="Young vs Old")
[2025-10-03 13:27:33,821] [INFO ] Found DE run info for run_id=0
[2025-10-03 13:27:33,823] [INFO ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2025-10-03 13:27:33,824] [INFO ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:27:33,825] [WARNING ] Field 'kompot_de_Young_to_Old_mean_lfc' was last written by run 0, but current context expects run 1. The field may have been overwritten.
[2025-10-03 13:27:33,826] [WARNING ] Field 'kompot_de_Young_to_Old_mahalanobis' was last written by run 0, but current context expects run 1. The field may have been overwritten.
[2025-10-03 13:27:33,828] [INFO ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:27:33,828] [WARNING ] Field inference completed with 2 warnings
[2025-10-03 13:27:33,829] [INFO ] Using DE run 0: comparing Young to Old
[2025-10-03 13:27:33,851] [INFO ] Using data columns from var - lfc: 'kompot_de_Young_to_Old_mean_lfc', score: 'kompot_de_Young_to_Old_mahalanobis'
[2025-10-03 13:27:33,857] [INFO ] Highlighting 139 genes marked as DE (77 up, 62 down)
[2025-10-03 13:27:33,872] [INFO ] Labeling top 10 genes by score
[14]:
# Plot the second comparison (Young vs Mid)
kompot.plot.volcano_de(adata, run_id=1, title="Young vs Mid")
[2025-10-03 13:27:34,235] [INFO ] Found DE run info for run_id=1
[2025-10-03 13:27:34,236] [INFO ] Found mean_lfc_key='kompot_de_Young_to_Mid_mean_lfc' from run info
[2025-10-03 13:27:34,236] [INFO ] Found mahalanobis_key='kompot_de_Young_to_Mid_mahalanobis' from run info
[2025-10-03 13:27:34,237] [INFO ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Mid_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Mid_mahalanobis'}
[2025-10-03 13:27:34,238] [INFO ] Using DE run 1: comparing Young to Mid
[2025-10-03 13:27:34,256] [INFO ] Using data columns from var - lfc: 'kompot_de_Young_to_Mid_mean_lfc', score: 'kompot_de_Young_to_Mid_mahalanobis'
[2025-10-03 13:27:34,258] [INFO ] No genes marked as DE found - falling back to top genes highlighting
[2025-10-03 13:27:34,260] [INFO ] Highlighting top 10 genes by kompot_de_Young_to_Mid_mahalanobis as fallback
[2025-10-03 13:27:34,276] [INFO ] Labeling top 10 genes by score
You can also use negative indexing (run_id=-1 for the most recent run).
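Negative run_id values presumably resolve the same way as negative Python list indices. That equivalence is an assumption, so the hypothetical helper resolve_run_id below only illustrates the idea. It is consistent with the earlier logs: with only one run stored, run_id=-1 resolved to "DE run 0".

```python
def resolve_run_id(run_id, n_runs):
    """Hypothetical helper: map a possibly negative run_id to an absolute run index."""
    if not -n_runs <= run_id < n_runs:
        raise IndexError(f"run_id={run_id} is out of range for {n_runs} run(s)")
    return run_id % n_runs


print(resolve_run_id(-1, 1))  # 0: only one run existed when the first plots were made
print(resolve_run_id(-1, 2))  # 1: the most recent run (Young vs Mid)
print(resolve_run_id(-2, 2))  # 0: Young vs Old
```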
Inspecting Run Metadata
The RunInfo utility provides detailed information about each analysis. Here we inspect the first run. Its Status column shows which of its fields were overwritten by the most recent run:
[16]:
run_info = kompot.RunInfo(adata, run_id=0, analysis_type="de")
run_info
[16]:
Run 0 (DE Analysis)
Run Summary
| Parameter | Value |
|---|---|
| conditions | Young to Old |
| obsm_key | DM_EigenVectors |
| uses_sample_variance | False |
| layer | logged_counts |
| timestamp | 2025-10-03T21:11:48.731385 |
| Fields Created | 9 |
All Parameters
| Parameter | Value |
|---|---|
| auto_filtered | False |
| batch_size | 0 |
| compute_mahalanobis | True |
| condition1 | Young |
| condition2 | Old |
| copy | False |
| eps | 1e-08 |
| fdr_threshold | 0.05 |
| groupby | Age |
| inplace | True |
| jit_compile | False |
| landmarks | False |
| layer | logged_counts |
| ls_factor | 10.0 |
| max_memory_ratio | 0.8 |
| min_cells | 2 |
| n_landmarks | 5000 |
| null_genes | 4000 |
| null_seed | 42 |
| obsm_key | DM_EigenVectors |
| result_key | kompot_de |
| sigma | 1 |
| store_landmarks | False |
| store_posterior_covariance | False |
| use_sample_variance | False |
| used_landmarks | False |
Environment
| Parameter | Value |
|---|---|
| hostname | gizmok39 |
| pid | 11250 |
| platform | Linux-4.15.0-213-generic-x86_64-with-glibc2.27 |
| python_version | 3.12.10 |
| timestamp | 2025-10-03T21:11:48.731574 |
| username | dotto |
Fields Created by This Run
| Field Name | Location | Description | Status |
|---|---|---|---|
| LAYERS Fields | |||
| kompot_de_Old_imputed | layers | [imputed] Imputed expression for Old | Present |
| kompot_de_Young_imputed | layers | [imputed] Imputed expression for Young | Overwritten by Run 1 |
| kompot_de_Young_to_Old_fold_change | layers | [fold_change] Log fold change for each cell and gene | Present |
| OBS Fields | |||
| kompot_de_Old_std | obs | [std] Posterior standard deviation of imputed expression for Old (same for all genes) | Present |
| kompot_de_Young_std | obs | [std] Posterior standard deviation of imputed expression for Young (same for all genes) | Overwritten by Run 1 |
| VAR Fields | |||
| kompot_de_Young_to_Old_is_de | var | [is_de] Boolean indicator of differential expression at local FDR < 0.05 | Present |
| kompot_de_Young_to_Old_mahalanobis | var | [mahalanobis] Mahalanobis distances | Present |
| kompot_de_Young_to_Old_mahalanobis_local_fdr | var | [mahalanobis_local_fdr] Local FDR values using empirical null estimation similar to R's fdrtool | Present |
| kompot_de_Young_to_Old_mean_lfc | var | [mean_log_fold_change] Mean log fold change values | Present |
Comparing Runs
Compare the parameters and output fields of two runs:
[17]:
kompot.RunInfo(adata, run_id=0, analysis_type="de").compare_with(1)
[17]:
Comparison of Run 0 and Run 1
Summary
| Aspect | Run 0 | Run 1 |
|---|---|---|
| conditions | Young to Old | Young to Mid |
| result_key | kompot_de | kompot_de |
| uses_sample_variance | False | False |
| timestamp | 2025-10-03T21:11:48.731385 | 2025-10-03T21:12:39.156717 |
| Field Count | 9 | 9 |
Parameter Differences
Key Parameter Differences
| Parameter | Run 0 | Run 1 |
|---|---|---|
| condition2 | Old | Mid |
All Parameter Differences
| Parameter | Run 0 | Run 1 |
|---|---|---|
| Different Parameters | ||
| null_genes | 4000 | 2000 |
| 35 parameters are the same in both runs | ||
Field Differences
| Field Name | Location | Status | Last Modified By |
|---|---|---|---|
| LAYERS Shared Fields | |||
| kompot_de_Young_imputed | layers | Current value from Run 1 | Run 1 |
| OBS Shared Fields | |||
| kompot_de_Young_std | obs | Current value from Run 1 | Run 1 |
| LAYERS Different Fields | |||
| kompot_de_Old_imputed | layers | Only in Run 0 | Run 0 |
| kompot_de_Young_to_Old_fold_change | layers | Only in Run 0 | Run 0 |
| kompot_de_Mid_imputed | layers | Only in Run 1 | Run 1 |
| kompot_de_Young_to_Mid_fold_change | layers | Only in Run 1 | Run 1 |
| OBS Different Fields | |||
| kompot_de_Old_std | obs | Only in Run 0 | Run 0 |
| kompot_de_Mid_std | obs | Only in Run 1 | Run 1 |
| VAR Different Fields | |||
| kompot_de_Young_to_Old_is_de | var | Only in Run 0 | Run 0 |
| kompot_de_Young_to_Old_mahalanobis | var | Only in Run 0 | Run 0 |
| kompot_de_Young_to_Old_mahalanobis_local_fdr | var | Only in Run 0 | Run 0 |
| kompot_de_Young_to_Old_mean_lfc | var | Only in Run 0 | Run 0 |
| kompot_de_Young_to_Mid_is_de | var | Only in Run 1 | Run 1 |
| kompot_de_Young_to_Mid_mahalanobis | var | Only in Run 1 | Run 1 |
| kompot_de_Young_to_Mid_mahalanobis_local_fdr | var | Only in Run 1 | Run 1 |
| kompot_de_Young_to_Mid_mean_lfc | var | Only in Run 1 | Run 1 |
Resource Planning
For large datasets, use dry_run_differential_expression to estimate resource requirements before running the full analysis:
[24]:
plan = kompot.dry_run_differential_expression(
adata,
groupby=GROUPING_COLUMN,
condition1="Young",
condition2="Old",
layer=LAYER_FOR_EXPRESSION,
obsm_key=DIMENSIONALITY_REDUCTION,
null_genes=2000,
verbose=True,
)
================================================================================
RESOURCE USAGE PLAN
================================================================================
System Resources:
Memory: 382.86 GB available (of 754.59 GB total)
Disk: 37.83 GB available at /tmp
Total Requirements:
Memory: 4.19 GB (1% of available)
Memory Allocations:
• Mellon precision matrix L (condition 1, 2,917/3,116 cells) (2917, 2917): 64.92 MB
• Mellon precision matrix L (condition 2, 2,917/3,116 cells) (3116, 3116): 74.08 MB
• Imputed expression (condition 1) (8090, 18285): 1.10 GB → adata.layers['kompot_de_Young_imputed']
• Imputed expression (condition 2) (8090, 18285): 1.10 GB → adata.layers['kompot_de_Old_imputed']
• Fold change (8090, 18285): 1.10 GB → adata.layers['kompot_de_Young_to_Old_fold_change']
• Function predictor covariances (per condition) (5000, 5000): 381.47 MB
• Combined covariance matrix (5000, 5000): 190.73 MB
• Cholesky decomposition (for Mahalanobis) (5000, 5000): 190.73 MB
• Mahalanobis batch processing (batch_size=100) (100, 5000): 3.81 MB
Output Fields:
adata.layers:
- kompot_de_Young_imputed
- kompot_de_Old_imputed
- kompot_de_Young_to_Old_fold_change
adata.var:
- kompot_de_Young_to_Old_mahalanobis [OVERWRITES run_id=0]
- kompot_de_Young_to_Old_mean_lfc [OVERWRITES run_id=0]
- kompot_de_Young_to_Old_mahalanobis_local_fdr
- kompot_de_Young_to_Old_is_de
Info:
ℹ Null distribution will use 2000 additional genes (total: 18285 genes processed)
ℹ Mahalanobis computation processes 100 genes per batch. Reduce batch_size to lower peak memory (currently 3.81 MB for batch arrays).
Warnings:
⚠ Results with result_key='kompot_de' already exist (run_id=1). Previous run: 2025-10-03T14:50:36.145559 comparing Young to Mid (null_genes=2000). Fields that will be overwritten: var.kompot_de_Young_to_Old_mahalanobis, var.kompot_de_Young_to_Old_mean_lfc, obs.kompot_de_Young_std, obs.kompot_de_Old_std
================================================================================
STATUS: ⚠ FEASIBLE WITH WARNINGS - Proceed with caution
================================================================================
The report shows:
Available system memory and disk space
Estimated memory requirements for each computation step
Which fields will be created or overwritten (with run_id)
Whether the analysis is feasible
This is especially useful for testing different parameter combinations (e.g., with/without sample variance) before committing to a long computation.
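The memory figures in the report match dense float64 arrays (8 bytes per value) reported in binary units, where 1 GB = 2**30 bytes and 1 MB = 2**20 bytes. This reading comes from reproducing the numbers, not from Kompot's documentation. It is still handy for rough estimates on other datasets. The per-cell layers are sized for 18,285 columns: 16,285 real genes plus 2,000 null genes.

```python
BYTES_PER_VALUE = 8  # float64 (assumed)
MB, GB = 2**20, 2**30  # binary units, which reproduce the report's numbers

n_cells, n_genes, null_genes = 8090, 16285, 2000
n_cond1, n_cond2 = 2917, 3116
n_landmarks, batch_size = 5000, 100


def nbytes(*shape):
    """Size in bytes of a dense float64 array with the given shape."""
    size = BYTES_PER_VALUE
    for dim in shape:
        size *= dim
    return size


layer = nbytes(n_cells, n_genes + null_genes)  # one imputed or fold-change layer
allocations = {
    "precision matrix, condition 1": nbytes(n_cond1, n_cond1),
    "precision matrix, condition 2": nbytes(n_cond2, n_cond2),
    "imputed expression x2 + fold change": 3 * layer,
    # 381.47 MB in the report: two 5000 x 5000 matrices, one per condition
    "predictor covariances": 2 * nbytes(n_landmarks, n_landmarks),
    "combined covariance": nbytes(n_landmarks, n_landmarks),
    "Cholesky factor": nbytes(n_landmarks, n_landmarks),
    "Mahalanobis batch": nbytes(batch_size, n_landmarks),
}
total = sum(allocations.values())

print(f"one layer: {layer / GB:.2f} GB")  # one layer: 1.10 GB
print(f"condition 1 precision: {allocations['precision matrix, condition 1'] / MB:.2f} MB")  # 64.92 MB
print(f"total: {total / GB:.2f} GB")  # total: 4.19 GB
```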
Saving Results
Selective Cleanup
Remove the large per-cell layers (imputed expression and fold changes) while preserving the per-gene statistics in adata.var:
[18]:
# This keeps adata.var statistics but removes large adata.layers
kompot.cleanup(adata)
[2025-10-03 13:27:34,746] [INFO ] Cleaning up all 2 run(s)
[2025-10-03 13:27:34,760] [INFO ] Cleaned up 3 field(s) from run 0:
[2025-10-03 13:27:34,761] [INFO ] layers (3 field(s)):
[2025-10-03 13:27:34,761] [INFO ] - kompot_de_Young_imputed
[2025-10-03 13:27:34,762] [INFO ] - kompot_de_Old_imputed
[2025-10-03 13:27:34,762] [INFO ] - kompot_de_Young_to_Old_fold_change
[2025-10-03 13:27:34,770] [INFO ] Cleaned up 2 field(s) from run 1:
[2025-10-03 13:27:34,771] [INFO ] layers (2 field(s)):
[2025-10-03 13:27:34,771] [INFO ] - kompot_de_Mid_imputed
[2025-10-03 13:27:34,772] [INFO ] - kompot_de_Young_to_Mid_fold_change
[2025-10-03 13:27:34,772] [INFO ] Total: Cleaned up 5 field(s) across 2 run(s)
To clean up specific runs only (here nothing is deleted, because the previous call already removed the layers of both runs):
[20]:
# Clean up only the first run
kompot.cleanup(adata, run_ids=0, analysis_type="de")
[2025-10-03 13:28:06,698] [INFO ] No fields deleted.
[21]:
adata.write_h5ad("../data/murine_bone_marrow_aging_processed.h5ad")
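Summary
This tutorial covered:
Customizing DE parameters (null_genes, sigma, batch_size)
Customizing volcano plots, expression plots, and heatmaps
Running multiple comparisons and tracking them by run_id
Planning resources and cleaning up results
Next Steps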
Tutorial 3: Sample Variance Analysis - Account for biological replicates
API Documentation: kompot.readthedocs.io