Getting Started with Kompot
This tutorial introduces Kompot’s core functionality for differential analysis in single-cell data. You’ll learn how to:
Perform differential abundance (DA) analysis to identify cell states that change between conditions
Conduct differential expression (DE) analysis to find genes with altered expression
Visualize and interpret results using Kompot’s plotting tools
What Makes Kompot Different?
Kompot uses Mahalanobis distance to detect expression differences while accounting for the covariance structure of gene expression. This approach is particularly powerful for:
Detecting subtle changes along continuous cell state trajectories
Identifying genes with complex, coordinated expression patterns
Analyzing data where discrete cell type labels are inadequate
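To make the role of covariance concrete, here is a minimal, generic sketch of a Mahalanobis distance in two dimensions. It is not Kompot's implementation: the helper `mahalanobis_2d`, the covariance matrix, and the two example shifts are made-up values chosen only for illustration.

```python
import math

def mahalanobis_2d(delta, cov):
    """Mahalanobis length of a 2-D difference vector `delta` under covariance `cov`."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2x2 matrix, written out explicitly to stay dependency-free.
    inv = ((d / det, -b / det), (-c / det, a / det))
    x, y = delta
    quad = x * (inv[0][0] * x + inv[0][1] * y) + y * (inv[1][0] * x + inv[1][1] * y)
    return math.sqrt(quad)

# Hypothetical covariance: the two coordinates are strongly positively correlated.
cov = ((1.0, 0.9), (0.9, 1.0))

along = (1.0, 1.0)     # shift along the correlated direction
against = (1.0, -1.0)  # shift against it, with the same Euclidean length

d_along = mahalanobis_2d(along, cov)
d_against = mahalanobis_2d(against, cov)
print(f"along: {d_along:.3f}, against: {d_against:.3f}")
```

Both shifts have the same Euclidean length, but the one that runs against the expected correlation is scored as far more unusual. That is the general property a covariance-aware distance adds over a plain magnitude.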
Dataset
We’ll analyze murine bone marrow cells comparing Young vs. Old mice to understand how aging affects hematopoietic stem cells and their derivatives.
[1]:
import anndata as ad
import matplotlib.pyplot as plt
import numpy as np
import palantir
import pandas as pd
import scanpy as sc
import seaborn as sns
import kompot
# Set plotting style
plt.rcParams["axes.spines.right"] = False
plt.rcParams["axes.spines.top"] = False
plt.rcParams["image.cmap"] = "Spectral_r"
Configuration
Define analysis parameters. Adapt these to your own data:
[2]:
DATA_PATH = "../data/murine_bone_marrow_aging.h5ad"
GROUPING_COLUMN = "Age" # Condition column in adata.obs
CONDITIONS = ["Young", "Old"] # First condition is reference
CELL_TYPE_COLUMN = "highres_celltype" # Optional: for visualization only
DIMENSIONALITY_REDUCTION = "DM_EigenVectors" # Cell state representation
LAYER_FOR_EXPRESSION = "logged_counts" # Expression data layer
Load Data
The dataset will be downloaded automatically from Zenodo if not already present:
[3]:
import os
from pathlib import Path
import requests
from tqdm.auto import tqdm
Path(DATA_PATH).parent.mkdir(parents=True, exist_ok=True)
if not os.path.exists(DATA_PATH):
    print("Downloading dataset...")
    url = "https://zenodo.org/records/15587768/files/murine_bone_marrow_aging.h5ad?download=1"
    response = requests.get(url, stream=True)
    response.raise_for_status()  # Fail early instead of saving an error page to disk
    total = int(response.headers.get("content-length", 0))
    with open(DATA_PATH, "wb") as file, tqdm(total=total, unit="B", unit_scale=True) as bar:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)
            bar.update(len(chunk))
adata = ad.read_h5ad(DATA_PATH)
adata
[3]:
AnnData object with n_obs × n_vars = 8090 × 16285
obs: 'Compartment', 'Replicate', 'Age', 'Sample', 'Info', 'batch', 'doublet_score', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_hb', 'pct_counts_hb', 'S_score', 'G2M_score', 'phase', 'leiden', 'phenograph', 'highres_celltype', 'midres_celltype'
var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'hb', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
uns: 'Age_colors', 'Compartment_colors', 'DMEigenValues', 'Info_colors', 'README', 'Replicate_colors', 'Sample_colors', 'batch_colors', 'draw_graph', 'highres_celltype_colors', 'hvg', 'leiden', 'leiden_colors', 'midres_celltype_colors', 'neighbors', 'pca', 'phase_colors', 'umap'
obsm: 'AbCapture', 'DM_EigenVectors', 'HTO', 'X_draw_graph_fa', 'X_pca', 'X_pca_harmony', 'X_pca_noregression', 'X_umap'
varm: 'PCs'
layers: 'MAGIC_imputed_data', 'logged_counts', 'normalized_counts', 'raw_counts'
obsp: 'DM_Kernel', 'connectivities', 'distances'
Data Exploration
Before analysis, examine the data structure and distribution:
[4]:
sc.pl.umap(adata, color=[CELL_TYPE_COLUMN, GROUPING_COLUMN], frameon=False, wspace=1.3)
If you have a clustering or cell-type annotation, cell type composition can provide a rough understanding of the data and help identify underrepresented cell types that could complicate differential expression analysis.
[5]:
crosstab = (
    pd.crosstab(
        adata.obs[CELL_TYPE_COLUMN], adata.obs[GROUPING_COLUMN], normalize="index"
    )
    * 100
)
# Plot the distribution
ax = crosstab.plot(kind="bar", stacked=False, figsize=(12, 8))
ax.grid(False)
plt.xlabel("Cell Type")
plt.ylabel("Percentage (%)")
plt.title("Cell Type Distribution by Condition")
plt.legend(title=GROUPING_COLUMN)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
# Print extremes (cell types with high bias toward one condition)
bias_threshold = 75  # Percentage threshold for considering a cell type biased
# Use >= so the filter matches the ">=" wording of the printed message
biased_types = crosstab[(crosstab >= bias_threshold).any(axis=1)]
if not biased_types.empty:
    print(f"Cell types with >={bias_threshold}% bias toward one condition:")
    print(biased_types)
    print("\nThese cell types might show disproportionate changes between conditions.")
Cell types with >=75% bias toward one condition:
Age Mid Old Young
highres_celltype
HSC 22.257053 75.548589 2.194357
Naive CD8 T cell 8.000000 1.000000 91.000000
These cell types might show disproportionate changes between conditions.
Diffusion Maps Preprocessing
Kompot requires a continuous representation of cell states. Palantir diffusion maps capture the geometry of differentiation trajectories while reducing noise:
[6]:
palantir.utils.run_diffusion_maps(adata, pca_key="X_pca_harmony", n_components=40);
Differential Abundance Analysis
Identify cell states that change in frequency between conditions.
See kompot.compute_differential_abundance for full documentation.
[7]:
da_results = kompot.compute_differential_abundance(
    adata,
    groupby=GROUPING_COLUMN,
    condition1=CONDITIONS[0],
    condition2=CONDITIONS[1],
    obsm_key=DIMENSIONALITY_REDUCTION,
)
[2025-10-03 13:21:26,447] [INFO ] Condition 1 (Young): 2,917 cells
[2025-10-03 13:21:26,448] [INFO ] Condition 2 (Old): 3,116 cells
[2025-10-03 13:21:26,451] [INFO ] Fitting density estimator for condition 1...
WARNING:2025-10-03 13:21:26,507:jax._src.xla_bridge:966: An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
[2025-10-03 13:22:13,343] [INFO ] Fitting density estimator for condition 2...
[2025-10-03 13:22:30,146] [WARNING ] The normalization is only effective if the density was trained with d_method="fractal".
[2025-10-03 13:22:31,295] [WARNING ] The normalization is only effective if the density was trained with d_method="fractal".
[2025-10-03 13:22:42,501] [INFO ] This run will have `run_id=0`.
Visualize Abundance Changes
Results are stored in adata.obs:
kompot_da_Young_to_Old_lfc: Log fold change (positive = enriched in Old)
kompot_da_Young_to_Old_lfc_direction: Categorical classification (up/down/unchanged)
[8]:
sc.pl.embedding(
    adata, "umap",
    color=[f"kompot_da_{CONDITIONS[0]}_to_{CONDITIONS[1]}_lfc_direction", f"kompot_da_{CONDITIONS[0]}_to_{CONDITIONS[1]}_lfc"],
    color_map="RdBu_r",
    vcenter=0,
    frameon=False,
)
Volcano Plot
Volcano plots for differential abundance show effect size (x-axis) vs. statistical significance (y-axis):
[9]:
kompot.plot.volcano_da(adata, color=CELL_TYPE_COLUMN)
[2025-10-03 13:22:43,131] [INFO ] Found DA run info for run_id=-1
[2025-10-03 13:22:43,132] [INFO ] Found lfc_key='kompot_da_Young_to_Old_lfc' from run info
[2025-10-03 13:22:43,132] [INFO ] Found ptp_key='kompot_da_Young_to_Old_neg_log10_lfc_ptp' from run info
[2025-10-03 13:22:43,133] [INFO ] Successfully inferred fields: {'lfc_key': 'kompot_da_Young_to_Old_lfc', 'ptp_key': 'kompot_da_Young_to_Old_neg_log10_lfc_ptp'}
[2025-10-03 13:22:43,134] [INFO ] Using inferred thresholds - lfc_threshold: 1.0, ptp_threshold: 0.05
Points above the horizontal line and outside the vertical lines pass both thresholds and are classified as significantly changed.
Adjust thresholds using ptp_threshold and lfc_threshold parameters. Use update_direction=True to update the classification in adata.obs.
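The thresholding rule described above can be sketched in plain Python. This is an assumption about how the two thresholds combine, not Kompot's source code: the helper `classify_direction` and the example values are hypothetical, the defaults mirror the `lfc_threshold: 1.0` and `ptp_threshold: 0.05` reported in the log, and the label `unchanged` follows the wording used earlier in this tutorial.

```python
import math

def classify_direction(lfc, neg_log10_ptp, lfc_threshold=1.0, ptp_threshold=0.05):
    """Label one cell as 'up', 'down', or 'unchanged' (hypothetical helper)."""
    # The volcano y-axis is -log10(ptp), so the cutoff is transformed the same way.
    significant = neg_log10_ptp > -math.log10(ptp_threshold)
    if significant and lfc > lfc_threshold:
        return "up"
    if significant and lfc < -lfc_threshold:
        return "down"
    return "unchanged"

# Made-up (lfc, -log10 ptp) pairs for four cells.
cells = [(2.3, 4.0), (-1.8, 2.5), (0.4, 5.0), (1.6, 0.7)]
labels = [classify_direction(lfc, score) for lfc, score in cells]
print(labels)
```

The third cell is significant but has a small effect, and the fourth has a large effect but is not significant, so under this rule neither is called.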
[10]:
# Visualize only significantly changed cells
kompot.plot.embedding(
    adata, "umap",
    color=[CELL_TYPE_COLUMN, f"kompot_da_{CONDITIONS[0]}_to_{CONDITIONS[1]}_lfc"],
    groups={f"kompot_da_{CONDITIONS[0]}_to_{CONDITIONS[1]}_lfc_direction": ["up", "down"]},
    frameon=False,
    wspace=.5,
)
[2025-11-18 14:50:16,345] [INFO ] Selected 1,090 cells out of 8,090 total cells.
The mgroups parameter of kompot.plot.embedding takes a list of group selections and plots each one separately, here cells with increasing and with decreasing abundance:
[11]:
kompot.plot.embedding(
    adata,
    "umap",
    color=CELL_TYPE_COLUMN,
    mgroups=[{f"kompot_da_{CONDITIONS[0]}_to_{CONDITIONS[1]}_lfc_direction": [d]} for d in ["up", "down"]],
    frameon=False,
    wspace=.5,
)
[2025-11-18 14:50:17,070] [INFO ] Selected 595 cells out of 8,090 total cells.
[2025-11-18 14:50:17,134] [INFO ] Selected 495 cells out of 8,090 total cells.
Summary by Cell Type
Aggregate abundance changes by cell type with direction_barplot:
[12]:
kompot.plot.direction_barplot(adata, category_column=CELL_TYPE_COLUMN)
[2025-10-03 13:22:45,749] [INFO ] Found DA run info for run_id=-1
[2025-10-03 13:22:45,749] [INFO ] Found direction_key='kompot_da_Young_to_Old_lfc_direction' from run info
[2025-10-03 13:22:45,750] [INFO ] Successfully inferred fields: {'direction_key': 'kompot_da_Young_to_Old_lfc_direction'}
[2025-10-03 13:22:45,750] [INFO ] Using DA run 0 for direction_barplot: comparing Young to Old
[2025-10-03 13:22:45,750] [INFO ] Creating direction barplot: comparing Young to Old
[2025-10-03 13:22:45,751] [INFO ] Using fields - category_column: 'highres_celltype', direction_column: 'kompot_da_Young_to_Old_lfc_direction'
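The aggregation behind such a summary can be approximated by counting direction labels within each cell type and converting the counts to percentages. The sketch below uses made-up labels in plain Python and is an assumption about the summary, not the implementation of `direction_barplot`.

```python
from collections import Counter, defaultdict

# Hypothetical (cell type, direction) pairs, one per cell.
cells = [
    ("HSC", "up"), ("HSC", "up"), ("HSC", "unchanged"), ("HSC", "up"),
    ("Naive CD8 T cell", "down"), ("Naive CD8 T cell", "down"),
    ("Monocyte", "unchanged"), ("Monocyte", "up"),
]

counts = defaultdict(Counter)
for cell_type, direction in cells:
    counts[cell_type][direction] += 1

# Percentage of cells per direction within each cell type.
summary = {
    cell_type: {d: 100 * n / sum(c.values()) for d, n in c.items()}
    for cell_type, c in counts.items()
}
for cell_type, pct in summary.items():
    print(cell_type, pct)
```

On the real data, the same table can be obtained with `pd.crosstab` on `CELL_TYPE_COLUMN` and the direction column in `adata.obs`, using `normalize="index"` as in the composition plot earlier.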
Differential Expression Analysis
Identify genes that change in expression between conditions.
See kompot.compute_differential_expression for full documentation.
[13]:
de_results = kompot.compute_differential_expression(
    adata,
    groupby=GROUPING_COLUMN,
    condition1=CONDITIONS[0],
    condition2=CONDITIONS[1],
    layer=LAYER_FOR_EXPRESSION,
    obsm_key=DIMENSIONALITY_REDUCTION,
    batch_size=0,  # set to, e.g., 100 to batch cells and genes for lower memory demand
)
[2025-10-03 13:22:46,599] [INFO ] Condition 1 (Young): 2,917 cells
[2025-10-03 13:22:46,599] [INFO ] Condition 2 (Old): 3,116 cells
[2025-10-03 13:22:46,600] [INFO ] Using 8090 of 8090 cells (100.0%)
[2025-10-03 13:22:46,940] [INFO ] Preparing null distribution with null_genes=2000, null_seed=42
[2025-10-03 13:22:48,822] [INFO ] Generated shuffled expression for 2000 null genes
[2025-10-03 13:23:16,888] [INFO ] Fitting expression estimator for condition 1...
[2025-10-03 13:23:46,013] [INFO ] Fitting expression estimator for condition 2...
[2025-10-03 13:24:38,148] [INFO ] Landmark storage skipped (store_landmarks=False). Compute with store_landmarks=True to enable landmark reuse.
[2025-10-03 13:25:48,824] [INFO ] Using 5,000 landmarks for Mahalanobis computation
[2025-10-03 13:26:26,665] [INFO ] Computing FDR statistics from null distribution
[2025-10-03 13:26:30,065] [INFO ] FDR analysis complete: 393/16285 genes significantly DE at FDR < 0.05
[2025-10-03 13:26:30,066] [INFO ] Mahalanobis distance threshold for FDR < 0.05: 23.1080
[2025-10-03 13:26:39,162] [INFO ] This run will have `run_id=0`.
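The log above mentions a null distribution built from shuffled genes and a Mahalanobis threshold for FDR < 0.05. As rough intuition only, the sketch below estimates a tail-area FDR from null scores. This is a simpler, swapped-in technique: Kompot reports a local FDR, which is computed differently. The helper `empirical_fdr` and all scores are made up.

```python
def empirical_fdr(threshold, observed, null):
    """Estimated fraction of false discoveries among observed scores >= threshold."""
    n_called = sum(score >= threshold for score in observed)
    if n_called == 0:
        return 0.0
    # Fraction of null scores passing the threshold, scaled to the number of tested genes.
    null_rate = sum(score >= threshold for score in null) / len(null)
    expected_false = null_rate * len(observed)
    return min(1.0, expected_false / n_called)

# Made-up scores: `observed` for real genes, `null` for shuffled genes.
observed = [3.0, 5.0, 8.0, 12.0, 25.0, 31.0, 40.0, 55.0, 60.0, 72.0]
null = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 7.0, 9.0, 26.0]

fdr_low = empirical_fdr(5.0, observed, null)
fdr_high = empirical_fdr(20.0, observed, null)
print(f"threshold 5: {fdr_low:.3f}, threshold 20: {fdr_high:.3f}")
```

Raising the threshold keeps fewer genes but lowers the estimated share of false calls, which is the trade-off a reported distance threshold summarizes.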
Resource Planning
For larger datasets or production workflows, you may want to optimize memory usage and computational resources. See the Resource Planning section in Tutorial 2 for more detailed guidance.
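Results Interpretation
Results are stored in adata.var:
kompot_de_Young_to_Old_mean_lfc: Average log fold change
kompot_de_Young_to_Old_mahalanobis: Mahalanobis distance, used as the significance score (higher = more significant)
kompot_de_Young_to_Old_mahalanobis_local_fdr: Local FDR estimated from the null distribution
kompot_de_Young_to_Old_is_de: Boolean flag for significant genes (FDR < 0.05)
The Mahalanobis distance accounts for covariance structure, so it can flag genes whose mean fold change is small. In the table below, Cd79a has a mean log fold change of about -0.004 but a Mahalanobis distance of about 43.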
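A toy illustration of how a distance over cell states can pick up what an average misses: if a gene goes up in some states and down in others, the mean cancels while the vector of per-state changes stays far from zero. The numbers below are made up, and the diagonal covariance is a simplifying assumption, not how Kompot models uncertainty.

```python
import math

# Hypothetical per-state log fold changes for one gene across six cell states.
lfc_by_state = [0.8, 0.7, 0.9, -0.8, -0.9, -0.7]
# Hypothetical variance of each estimate (diagonal covariance assumed).
variance_by_state = [0.04] * 6

mean_lfc = sum(lfc_by_state) / len(lfc_by_state)
# With a diagonal covariance, the Mahalanobis distance reduces to a variance-scaled norm.
distance = math.sqrt(sum(x * x / v for x, v in zip(lfc_by_state, variance_by_state)))
print(f"mean LFC = {mean_lfc:.3f}, distance = {distance:.2f}")
```

The mean log fold change is essentially zero, yet the distance is large because every state contributes regardless of sign.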
[14]:
# Top differentially expressed genes
adata.var.loc[
    :, adata.var.columns.str.contains("kompot_de")
].sort_values(
    f"kompot_de_{CONDITIONS[0]}_to_{CONDITIONS[1]}_mahalanobis", ascending=False
).head(20)
[14]:
| | kompot_de_Young_to_Old_mahalanobis | kompot_de_Young_to_Old_mean_lfc | kompot_de_Young_to_Old_mahalanobis_local_fdr | kompot_de_Young_to_Old_is_de |
|---|---|---|---|---|
| H2-Q7 | 71.517322 | 1.235066 | 0.014441 | True |
| Cd74 | 60.021451 | 0.352121 | 0.014441 | True |
| H2-Aa | 59.371135 | 0.473756 | 0.014441 | True |
| H2-Ab1 | 57.808982 | 0.511123 | 0.014441 | True |
| Igkc | 53.604111 | 0.054009 | 0.014441 | True |
| H2-Eb1 | 53.403829 | 0.411752 | 0.014441 | True |
| AW112010 | 53.386234 | 0.791367 | 0.014441 | True |
| S100a9 | 48.765184 | -0.170317 | 0.014441 | True |
| Ifitm3 | 47.585812 | 0.367545 | 0.014441 | True |
| S100a8 | 47.372815 | -0.165121 | 0.014441 | True |
| H2-Q6 | 45.991623 | 0.782279 | 0.014441 | True |
| Cd52 | 45.309906 | -0.127807 | 0.014441 | True |
| Aldh1a1 | 44.596577 | 0.490811 | 0.014441 | True |
| Ifitm1 | 44.320577 | 0.427663 | 0.014441 | True |
| Ifitm2 | 43.940652 | 0.285836 | 0.014441 | True |
| Ighm | 43.216410 | -0.173726 | 0.014441 | True |
| Cd79a | 43.107791 | -0.004329 | 0.014441 | True |
| Apoe | 42.935707 | -0.131211 | 0.014441 | True |
| Fos | 42.282846 | 0.531571 | 0.014441 | True |
| Gm47283 | 41.956500 | 0.627173 | 0.014441 | True |
Volcano Plot
Visualize effect size vs. significance for differential expression with the volcano_de plot:
[15]:
kompot.plot.volcano_de(adata)
[2025-10-03 13:26:40,415] [INFO ] Found DE run info for run_id=-1
[2025-10-03 13:26:40,416] [INFO ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2025-10-03 13:26:40,417] [INFO ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:26:40,417] [INFO ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:26:40,418] [INFO ] Using DE run 0: comparing Young to Old
[2025-10-03 13:26:40,431] [INFO ] Using data columns from var - lfc: 'kompot_de_Young_to_Old_mean_lfc', score: 'kompot_de_Young_to_Old_mahalanobis'
[2025-10-03 13:26:40,457] [INFO ] Highlighting 393 genes marked as DE (232 up, 161 down)
[2025-10-03 13:26:40,477] [INFO ] Labeling top 10 genes by score
Fold Change Heatmap
Visualize fold changes across top differentially expressed genes with a heatmap. This provides a complementary view to the volcano plot, showing the magnitude and direction of expression changes.
Note that the gene selection is based on Kompot’s Mahalanobis distance (statistical significance), but the fold changes displayed are simply the difference of mean expressions from the input expression layer (in this case logged_counts), not a Kompot-specific metric:
[17]:
# Selecting top 20 genes
genes = adata.var[f"kompot_de_{CONDITIONS[0]}_to_{CONDITIONS[1]}_mahalanobis"].sort_values(ascending=False).head(20).index
kompot.plot.heatmap(
    adata,
    genes=genes,
    groupby=CELL_TYPE_COLUMN,  # Aggregate expression by cell type
    exclude_groups="Plasma cell",  # Remove cell types with too little representation
    vmin="p1",  # Color scale minimum at 1st percentile (handles outliers)
    vmax="p99",  # Color scale maximum at 99th percentile
    fold_change_mode=True,  # Display fold changes instead of mean expression
)
[2025-11-29 18:24:53,333] [INFO ] Inferred condition_column='Age' from run information
[2025-11-29 18:24:53,334] [INFO ] Inferred condition1='Young' from run information
[2025-11-29 18:24:53,335] [INFO ] Inferred condition2='Old' from run information
[2025-11-29 18:24:53,336] [INFO ] Inferred layer='logged_counts' from run information
[2025-11-29 18:24:53,336] [INFO ] Creating fold change heatmap with 20 genes/features
[2025-11-29 18:24:53,338] [INFO ] Using expression data from layer: 'logged_counts'
[2025-11-29 18:24:53,452] [INFO ] Excluded 7 cells from groups: Plasma cell
[2025-11-29 18:24:53,458] [INFO ] Applying gene-wise z-scoring (standard_scale='var')
[2025-11-29 18:24:53,524] [WARNING ] standard_scale is ignored in fold_change_mode as z-scoring is not appropriate for fold changes
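As noted before the heatmap call, the fold changes shown in this mode are plain differences of group means. A small plain-Python sketch with made-up log-expression values for a single gene follows. The record layout and variable names are illustrative only and do not reflect Kompot internals.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cells: (cell type, condition, log-expression of one gene).
cells = [
    ("HSC", "Young", 1.0), ("HSC", "Young", 1.4),
    ("HSC", "Old", 2.0), ("HSC", "Old", 2.2),
    ("B cell", "Young", 3.0), ("B cell", "Old", 2.5), ("B cell", "Old", 2.7),
]

values = defaultdict(list)
for cell_type, condition, expr in cells:
    values[(cell_type, condition)].append(expr)

# Old minus Young, so a positive value means higher mean expression in Old.
fold_change = {
    cell_type: mean(values[(cell_type, "Old")]) - mean(values[(cell_type, "Young")])
    for cell_type in sorted({ct for ct, _, _ in cells})
}
print(fold_change)
```

Because the input layer is already log-transformed, a difference of means is a log fold change, which is why z-scoring is skipped in this mode.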
Functional Enrichment
Use StringDBReport to analyze gene sets with the STRING database:
Privacy Note: This sends gene lists to the STRING database API. If working with sensitive data, consider local alternatives.
[16]:
# Selecting top 20 genes
genes = adata.var[f"kompot_de_{CONDITIONS[0]}_to_{CONDITIONS[1]}_mahalanobis"].sort_values(ascending=False).head(20).index
# Create STRING report (10090 = Mus musculus, 9606 = Homo sapiens)
report = kompot.plot.StringDBReport(
    genes,
    species_id=10090,
    include_enrichment=True,
)
report
[16]:
Gene Set Report: 20 genes
Species: Mus musculus (Taxonomy ID: 10090)
StringDB Network
View interactive network in StringDB
Resource Links (20 genes)
Functional Enrichment Analysis
View interactive enrichment analysis on StringDB
Gene Ontology Processes (37 terms)
| term | description | signal | strength | fdr | number_of_genes | inputGenes |
|---|---|---|---|---|---|---|
| GO:0070488 | Neutrophil aggregation | 0.573343 | 2.865301 | 1.800000e-03 | 2 | [S100a8, S100a9] |
| GO:0002503 | Peptide antigen assembly with MHC class II protein complex | 0.517595 | 2.518514 | 6.920000e-05 | 3 | [H2-Ab1, H2-Aa, H2-Eb1] |
| GO:0019886 | Antigen processing and presentation of exogenous peptide antigen via MHC class II | 0.476370 | 2.388180 | 2.000000e-06 | 4 | [H2-Ab1, H2-Aa, Cd74, H2-Eb1] |
| GO:0018119 | Peptidyl-cysteine S-nitrosylation | 0.369195 | 2.497325 | 6.100000e-03 | 2 | [S100a8, S100a9] |
| GO:0035425 | Autocrine signaling | 0.342164 | 2.439333 | 7.100000e-03 | 2 | [S100a8, S100a9] |
| GO:0046597 | Negative regulation of viral entry into host cell | 0.290538 | 2.124939 | 1.160000e-05 | 4 | [Ifitm3, Ifitm2, Cd74, Ifitm1] |
| GO:0060337 | Type I interferon signaling pathway | 0.246167 | 2.087150 | 6.100000e-04 | 3 | [Ifitm3, Ifitm2, Ifitm1] |
| GO:0002579 | Positive regulation of antigen processing and presentation | 0.235977 | 2.196295 | 1.610000e-02 | 2 | [H2-Ab1, Cd74] |
| GO:0048002 | Antigen processing and presentation of peptide antigen | 0.230769 | 1.980695 | 1.480000e-07 | 6 | [H2-Ab1, H2-Aa, H2-Q7, Cd74, H2-Q6, H2-Eb1] |
| GO:0035455 | Response to interferon-alpha | 0.198291 | 1.974446 | 1.100000e-03 | 3 | [Ifitm3, Ifitm2, Ifitm1] |
| GO:0002523 | Leukocyte migration involved in inflammatory response | 0.137714 | 1.895265 | 4.830000e-02 | 2 | [S100a8, S100a9] |
| GO:0002474 | Antigen processing and presentation of peptide antigen via MHC class I | 0.137714 | 1.895265 | 4.830000e-02 | 2 | [H2-Q7, H2-Q6] |
| GO:0034341 | Response to interferon-gamma | 0.130841 | 1.710399 | 1.480000e-07 | 7 | [Ifitm3, H2-Ab1, H2-Aa, Ifitm2, H2-Q7, Cd74, Ifitm1] |
| GO:0045071 | Negative regulation of viral genome replication | 0.125120 | 1.747662 | 4.200000e-03 | 3 | [Ifitm3, Ifitm2, Ifitm1] |
| GO:0035456 | Response to interferon-beta | 0.108594 | 1.679665 | 6.100000e-03 | 3 | [Ifitm3, Ifitm2, Ifitm1] |
| GO:1990748 | Cellular detoxification | 0.096972 | 1.598130 | 6.100000e-04 | 4 | [S100a8, S100a9, Apoe, Aldh1a1] |
| GO:0009636 | Response to toxic substance | 0.059109 | 1.361965 | 3.400000e-04 | 5 | [Fos, S100a8, S100a9, Apoe, Aldh1a1] |
| GO:0050870 | Positive regulation of T cell activation | 0.045165 | 1.252518 | 8.300000e-03 | 4 | [H2-Ab1, H2-Aa, Cd74, H2-Eb1] |
| GO:0002250 | Adaptive immune response | 0.042766 | 1.217918 | 1.300000e-03 | 5 | [Cd79a, H2-Ab1, H2-Aa, Cd74, H2-Eb1] |
| GO:0045087 | Innate immune response | 0.035370 | 1.121734 | 9.980000e-07 | 9 | [Ifitm3, H2-Ab1, H2-Aa, S100a8, Ifitm2, H2-Q7, Cd74, Ifitm1, S100a9] |
Showing 20 of 37 enriched terms
KEGG Pathways (28 terms)
| term | description | signal | strength | fdr | number_of_genes | inputGenes |
|---|---|---|---|---|---|---|
| mmu05332 | Graft-versus-host disease | 0.835879 | 2.041393 | 5.950000e-08 | 5 | [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] |
| mmu05330 | Allograft rejection | 0.827236 | 2.032793 | 5.950000e-08 | 5 | [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] |
| mmu04612 | Antigen processing and presentation | 0.780488 | 1.944483 | 5.540000e-09 | 6 | [H2-Ab1, H2-Aa, H2-Q7, Cd74, H2-Q6, H2-Eb1] |
| mmu04940 | Type I diabetes mellitus | 0.771263 | 1.976935 | 5.990000e-08 | 5 | [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] |
| mmu05310 | Asthma | 0.726704 | 2.138303 | 1.940000e-05 | 3 | [H2-Ab1, H2-Aa, H2-Eb1] |
| mmu05320 | Autoimmune thyroid disease | 0.701568 | 1.914288 | 9.500000e-08 | 5 | [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] |
| mmu05416 | Viral myocarditis | 0.656329 | 1.871131 | 1.270000e-07 | 5 | [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] |
| mmu05140 | Leishmaniasis | 0.550054 | 1.823909 | 5.660000e-06 | 4 | [Fos, H2-Ab1, H2-Aa, H2-Eb1] |
| mmu04672 | Intestinal immune network for IgA production | 0.536462 | 1.905730 | 8.170000e-05 | 3 | [H2-Ab1, H2-Aa, H2-Eb1] |
| mmu05323 | Rheumatoid arthritis | 0.466192 | 1.719173 | 1.100000e-05 | 4 | [Fos, H2-Ab1, H2-Aa, H2-Eb1] |
| mmu04658 | Th1 and Th2 cell differentiation | 0.459220 | 1.708954 | 1.120000e-05 | 4 | [Fos, H2-Ab1, H2-Aa, H2-Eb1] |
| mmu05321 | Inflammatory bowel disease | 0.420401 | 1.740363 | 2.200000e-04 | 3 | [H2-Ab1, H2-Aa, H2-Eb1] |
| mmu04659 | Th17 cell differentiation | 0.407465 | 1.639131 | 1.940000e-05 | 4 | [Fos, H2-Ab1, H2-Aa, H2-Eb1] |
| mmu04514 | Cell adhesion molecules | 0.367600 | 1.544463 | 3.570000e-06 | 5 | [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] |
| mmu04662 | B cell receptor signaling pathway | 0.363811 | 1.649282 | 3.800000e-04 | 3 | [Cd79a, Fos, Ifitm1] |
| mmu04145 | Phagosome | 0.359086 | 1.530848 | 3.690000e-06 | 5 | [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] |
| mmu05166 | Human T-cell leukemia virus 1 infection | 0.325142 | 1.454056 | 5.430000e-07 | 6 | [Fos, H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] |
| mmu04657 | IL-17 signaling pathway | 0.318638 | 1.569124 | 6.100000e-04 | 3 | [Fos, S100a8, S100a9] |
| mmu04640 | Hematopoietic cell lineage | 0.316537 | 1.564271 | 6.100000e-04 | 3 | [H2-Ab1, H2-Aa, H2-Eb1] |
| mmu05322 | Systemic lupus erythematosus | 0.309591 | 1.550031 | 6.400000e-04 | 3 | [H2-Ab1, H2-Aa, H2-Eb1] |
Showing 20 of 28 enriched terms
Gene Ontology Functions (8 terms)
| term | description | signal | strength | fdr | number_of_genes | inputGenes |
|---|---|---|---|---|---|---|
| GO:0023026 | MHC class II protein complex binding | 0.949472 | 2.564271 | 8.500000e-07 | 4 | [H2-Ab1, H2-Aa, Cd74, H2-Eb1] |
| GO:0042605 | Peptide antigen binding | 0.652174 | 2.249001 | 1.920000e-07 | 5 | [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] |
| GO:0042608 | T cell receptor binding | 0.346516 | 2.027152 | 1.300000e-03 | 3 | [H2-Q7, H2-Q6, H2-Eb1] |
| GO:0042609 | CD4 receptor binding | 0.335093 | 2.263241 | 2.030000e-02 | 2 | [Cd74, H2-Eb1] |
| GO:0042287 | MHC protein binding | 0.231485 | 1.802511 | 4.500000e-03 | 3 | [H2-Q7, Cd74, H2-Q6] |
| GO:0016209 | Antioxidant activity | 0.152916 | 1.594235 | 1.610000e-02 | 3 | [S100a8, S100a9, Apoe] |
| GO:0042277 | Peptide binding | 0.114579 | 1.353736 | 3.030000e-06 | 7 | [H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1] |
| GO:0044877 | Protein-containing complex binding | 0.028761 | 0.739820 | 3.900000e-03 | 8 | [Fos, H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1] |
Gene Ontology Components (25 terms)
| term | description | signal | strength | fdr | number_of_genes | inputGenes |
|---|---|---|---|---|---|---|
| GO:0042613 | MHC class II protein complex | 0.847811 | 2.602060 | 1.240000e-07 | 4 | [H2-Ab1, H2-Aa, Cd74, H2-Eb1] |
| GO:0042611 | MHC protein complex | 0.653465 | 2.288065 | 4.110000e-10 | 6 | [H2-Ab1, H2-Aa, H2-Q7, Cd74, H2-Q6, H2-Eb1] |
| GO:0071556 | Integral component of lumenal side of endoplasmic reticulum membrane | 0.280364 | 2.138303 | 6.000000e-03 | 2 | [H2-Q7, H2-Q6] |
| GO:0032398 | MHC class Ib protein complex | 0.230887 | 2.000000 | 9.700000e-03 | 2 | [H2-Q7, H2-Q6] |
| GO:0042612 | MHC class I protein complex | 0.219137 | 1.962211 | 1.070000e-02 | 2 | [H2-Q7, H2-Q6] |
| GO:0033106 | cis-Golgi network membrane | 0.193032 | 1.880025 | 1.430000e-02 | 2 | [H2-Q7, H2-Q6] |
| GO:0005771 | Multivesicular body | 0.162256 | 1.655191 | 2.000000e-03 | 3 | [Cd79a, H2-Ab1, Cd74] |
| GO:0005770 | Late endosome | 0.119764 | 1.418143 | 2.640000e-07 | 7 | [Cd79a, Ifitm3, H2-Ab1, Ifitm2, Cd74, Apoe, H2-Eb1] |
| GO:0030670 | Phagocytic vesicle membrane | 0.115688 | 1.586548 | 4.410000e-02 | 2 | [H2-Q7, H2-Q6] |
| GO:0005765 | Lysosomal membrane | 0.107614 | 1.377064 | 5.980000e-06 | 6 | [Ifitm3, Ifitm2, H2-Q7, Ifitm1, H2-Q6, H2-Eb1] |
| GO:0031902 | Late endosome membrane | 0.088021 | 1.348252 | 1.090000e-02 | 3 | [Ifitm3, Ifitm2, H2-Eb1] |
| GO:0005764 | Lysosome | 0.082485 | 1.241287 | 5.820000e-08 | 9 | [Ifitm3, H2-Aa, Ifitm2, H2-Q7, Cd74, Ifitm1, Apoe, H2-Q6, H2-Eb1] |
| GO:0009897 | External side of plasma membrane | 0.080203 | 1.230992 | 2.930000e-07 | 8 | [Cd79a, H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1] |
| GO:0031901 | Early endosome membrane | 0.073535 | 1.263241 | 1.800000e-02 | 3 | [Ifitm3, H2-Q7, H2-Q6] |
| GO:0005769 | Early endosome | 0.071391 | 1.190134 | 5.330000e-05 | 6 | [Ifitm3, H2-Ab1, H2-Q7, Ifitm1, Apoe, H2-Q6] |
| GO:0098797 | Plasma membrane protein complex | 0.061461 | 1.116375 | 1.480000e-05 | 7 | [Cd79a, H2-Ab1, H2-Aa, H2-Q7, Cd74, H2-Q6, H2-Eb1] |
| GO:0010008 | Endosome membrane | 0.054324 | 1.076662 | 1.400000e-03 | 5 | [Ifitm3, Ifitm2, H2-Q7, H2-Q6, H2-Eb1] |
| GO:0005768 | Endosome | 0.053483 | 1.045319 | 1.240000e-07 | 10 | [Cd79a, Ifitm3, H2-Ab1, Ifitm2, H2-Q7, Cd74, Ifitm1, Apoe, H2-Q6, H2-Eb1] |
| GO:0009986 | Cell surface | 0.044503 | 0.966251 | 3.210000e-06 | 9 | [Cd79a, Ifitm3, H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1] |
| GO:0012505 | Endomembrane system | 0.013531 | 0.443299 | 2.800000e-03 | 11 | [Cd79a, Fos, Ifitm3, H2-Ab1, Ifitm2, H2-Q7, Cd74, Ifitm1, Apoe, H2-Q6, H2-Eb1] |
Showing 20 of 25 enriched terms
Reactome Pathways (7 terms)
| term | description | signal | strength | fdr | number_of_genes | inputGenes |
|---|---|---|---|---|---|---|
| MMU-6799990 | Metal sequestration by antimicrobial proteins | 0.728901 | 2.740363 | 0.003200 | 2 | [S100a8, S100a9] |
| MMU-5686938 | Regulation of TLR by endogenous ligand | 0.247582 | 2.041393 | 0.038600 | 2 | [S100a8, S100a9] |
| MMU-5668599 | RHO GTPases Activate NADPH Oxidases | 0.217832 | 1.962211 | 0.042900 | 2 | [S100a8, S100a9] |
| MMU-198933 | Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell | 0.142132 | 1.634853 | 0.000510 | 4 | [Ifitm3, Ifitm2, H2-Q7, Ifitm1] |
| MMU-168898 | Toll-like Receptor Cascades | 0.074034 | 1.363178 | 0.042900 | 3 | [Fos, S100a8, S100a9] |
| MMU-1280218 | Adaptive Immune System | 0.039783 | 1.049534 | 0.000240 | 7 | [Cd79a, Ifitm3, Ifitm2, H2-Q7, Cd74, Ifitm1, H2-Eb1] |
| MMU-168256 | Immune System | 0.024465 | 0.833220 | 0.000045 | 10 | [Cd79a, Fos, Ifitm3, S100a8, Ifitm2, H2-Q7, Cd74, Ifitm1, S100a9, H2-Eb1] |
[17]:
# Get enriched functional categories
report.get_functional_enrichment("Function").head(10)
[17]:
| | category | term | number_of_genes | number_of_genes_in_background | ncbiTaxonId | inputGenes | preferredNames | p_value | fdr | description | expected | strength | signal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 98 | Function | GO:0023026 | 4 | 12 | 10090 | [H2-Ab1, H2-Aa, Cd74, H2-Eb1] | [H2-Ab1, H2-Aa, Cd74, H2-Eb1] | 3.470000e-10 | 8.500000e-07 | MHC class II protein complex binding | 0.010909 | 2.564271 | 0.949472 |
| 97 | Function | GO:0042605 | 5 | 31 | 10090 | [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] | [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] | 3.910000e-11 | 1.920000e-07 | Peptide antigen binding | 0.028182 | 2.249001 | 0.652174 |
| 100 | Function | GO:0042608 | 3 | 31 | 10090 | [H2-Q7, H2-Q6, H2-Eb1] | [H2-Q7, H2-Q6, H2-Eb1] | 1.900000e-06 | 1.300000e-03 | T cell receptor binding | 0.028182 | 2.027152 | 0.346516 |
| 104 | Function | GO:0042609 | 2 | 12 | 10090 | [Cd74, H2-Eb1] | [Cd74, H2-Eb1] | 4.550000e-05 | 2.030000e-02 | CD4 receptor binding | 0.010909 | 2.263241 | 0.335093 |
| 102 | Function | GO:0042287 | 3 | 52 | 10090 | [H2-Q7, Cd74, H2-Q6] | [H2-Q7, Cd74, H2-Q6] | 8.250000e-06 | 4.500000e-03 | MHC protein binding | 0.047273 | 1.802511 | 0.231485 |
| 103 | Function | GO:0016209 | 3 | 84 | 10090 | [S100a8, S100a9, Apoe] | [S100a8, S100a9, Apoe] | 3.290000e-05 | 1.610000e-02 | Antioxidant activity | 0.076364 | 1.594235 | 0.152916 |
| 99 | Function | GO:0042277 | 7 | 341 | 10090 | [H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1] | [H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1] | 2.470000e-09 | 3.030000e-06 | Peptide binding | 0.310000 | 1.353736 | 0.114579 |
| 101 | Function | GO:0044877 | 8 | 1602 | 10090 | [Fos, H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6,... | [Fos, H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6,... | 6.410000e-06 | 3.900000e-03 | Protein-containing complex binding | 1.456364 | 0.739820 | 0.028761 |
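The `strength` column can be reproduced from two other columns of this table: the values match log10(observed / expected), with observed taken from `number_of_genes` and expected from the `expected` column. The check below uses the first two rows printed above. Reading `strength` this way is inferred from those numbers and is consistent with how STRING describes the measure, but it is not taken from Kompot's documentation.

```python
import math

# (term, number_of_genes, expected, reported strength) copied from the table above.
rows = [
    ("GO:0023026", 4, 0.010909, 2.564271),
    ("GO:0042605", 5, 0.028182, 2.249001),
]

computed = []
for term, observed, expected, reported in rows:
    strength = math.log10(observed / expected)
    computed.append(strength)
    print(f"{term}: computed {strength:.4f}, reported {reported:.4f}")
```

A strength of about 2.56 therefore means the term has roughly 10^2.56, or several hundred, times more hits in the gene list than expected by chance.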
Inspecting Run History with RunInfo
Kompot tracks the history of all differential analysis runs, including parameters, environment, and results. The RunInfo class provides easy access to this information:
See kompot.RunInfo for full documentation.
[18]:
# Get info about the most recent DE run
de_run = kompot.RunInfo(adata, analysis_type='de')
de_run
[18]:
Run 0 (DE Analysis)
Run Summary
| Parameter | Value |
|---|---|
| conditions | Young to Old |
| obsm_key | DM_EigenVectors |
| uses_sample_variance | False |
| layer | logged_counts |
| timestamp | 2025-11-18T14:51:29.630913 |
| Fields Created | 9 |
All Parameters
| Parameter | Value |
|---|---|
| auto_filtered | False |
| batch_size | 0 |
| compute_mahalanobis | True |
| condition1 | Young |
| condition2 | Old |
| copy | False |
| eps | 1e-08 |
| fdr_threshold | 0.05 |
| groupby | Age |
| inplace | True |
| jit_compile | False |
| landmarks | False |
| layer | logged_counts |
| ls_factor | 10.0 |
| max_memory_ratio | 0.8 |
| min_cells | 2 |
| n_landmarks | 5000 |
| null_genes | 2000 |
| null_seed | 42 |
| obsm_key | DM_EigenVectors |
| result_key | kompot_de |
| sigma | 1.0 |
| store_landmarks | False |
| store_posterior_covariance | False |
| use_sample_variance | False |
| used_landmarks | False |
Environment
| Parameter | Value |
|---|---|
| hostname | gizmok39 |
| pid | 8452 |
| platform | Linux-4.15.0-213-generic-x86_64-with-glibc2.27 |
| python_version | 3.12.10 |
| timestamp | 2025-11-18T14:51:29.631101 |
| username | dotto |
Fields Created by This Run
| Field Name | Location | Description | Status |
|---|---|---|---|
| LAYERS Fields | |||
| kompot_de_Old_imputed | layers | [imputed] Imputed expression for Old | Present |
| kompot_de_Young_imputed | layers | [imputed] Imputed expression for Young | Present |
| kompot_de_Young_to_Old_fold_change | layers | [fold_change] Log fold change for each cell and gene | Present |
| OBS Fields | |||
| kompot_de_Old_std | obs | [std] Posterior standard deviation of imputed expression for Old (same for all genes) | Present |
| kompot_de_Young_std | obs | [std] Posterior standard deviation of imputed expression for Young (same for all genes) | Present |
| VAR Fields | |||
| kompot_de_Young_to_Old_is_de | var | [is_de] Boolean indicator of differential expression at local FDR < 0.05 | Present |
| kompot_de_Young_to_Old_mahalanobis | var | [mahalanobis] Mahalanobis distances | Present |
| kompot_de_Young_to_Old_mahalanobis_local_fdr | var | [mahalanobis_local_fdr] Local FDR values using empirical null estimation similar to R's fdrtool | Present |
| kompot_de_Young_to_Old_mean_lfc | var | [mean_log_fold_change] Mean log fold change values | Present |
Saving Results
Optional: Cleanup Large Layers
Imputed expression layers can be large. Remove them if not needed for further analysis with the cleanup utility:
[18]:
kompot.cleanup(adata)
[2025-10-03 13:26:49,187] [INFO ] Cleaning up all 1 run(s)
[2025-10-03 13:26:49,210] [INFO ] Cleaned up 3 field(s) from run 0:
[2025-10-03 13:26:49,211] [INFO ] layers (3 field(s)):
[2025-10-03 13:26:49,211] [INFO ] - kompot_de_Young_imputed
[2025-10-03 13:26:49,211] [INFO ] - kompot_de_Old_imputed
[2025-10-03 13:26:49,212] [INFO ] - kompot_de_Young_to_Old_fold_change
[2025-10-03 13:26:49,212] [INFO ] Total: Cleaned up 3 field(s) across 1 run(s)
This removes:
kompot_de_Young_imputed
kompot_de_Old_imputed
kompot_de_Young_to_Old_fold_change
Statistical results in adata.var and adata.obs are preserved.
[19]:
adata.write_h5ad("../data/murine_bone_marrow_aging_processed.h5ad")
Biological Interpretation
Key Findings
Differential Abundance:
HSCs show increased abundance in Old mice
Naive CD8 T cells are predominantly Young
Consistent with age-related HSC expansion and T cell depletion
Differential Expression:
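MHC genes: class II (Cd74, H2-Aa, H2-Ab1, H2-Eb1) and class I (H2-Q7, H2-Q6) genes are upregulated in Old mice → enhanced antigen presentation
Antioxidant genes (S100a8, S100a9, Apoe): Slightly higher in Young (negative mean log fold change) → possibly better protection against oxidative stress in Young mice
Interferon-stimulated genes (Ifitm1, Ifitm2, Ifitm3): Higher in Old → age-related changes in immune response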
These patterns suggest aging leads to chronic immune activation (“inflammaging”) and altered stem cell dynamics.
Next Steps
Tutorial 2: Differential Expression Deep Dive - Advanced DE analysis options
Tutorial 3: Sample Variance Analysis - Account for biological replicates
For complete documentation, visit kompot.readthedocs.io