Getting Started with Kompot

This tutorial introduces Kompot’s core functionality for differential analysis in single-cell data. You’ll learn how to:

  • Perform differential abundance (DA) analysis to identify cell states that change between conditions

  • Conduct differential expression (DE) analysis to find genes with altered expression

  • Visualize and interpret results using Kompot’s plotting tools

What Makes Kompot Different?

Kompot uses Mahalanobis distance to detect expression differences while accounting for the covariance structure of gene expression. This approach is particularly powerful for:

  • Detecting subtle changes along continuous cell state trajectories

  • Identifying genes with complex, coordinated expression patterns

  • Analyzing data where discrete cell type labels are inadequate
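
To make the role of the covariance structure concrete, here is a minimal sketch of the general Mahalanobis distance, d = sqrt(δᵀ Σ⁻¹ δ), computed with NumPy on toy values. It is not Kompot's implementation: the covariance matrix, the difference vectors, and the helper name `mahalanobis` are all illustrative assumptions.

```python
import numpy as np

def mahalanobis(diff, cov):
    """Mahalanobis length of a difference vector under covariance `cov`."""
    # Solve cov @ x = diff instead of forming the explicit inverse.
    solved = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ solved))

# Toy covariance with two strongly correlated dimensions (made-up values).
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])

# Two difference vectors with the same Euclidean length.
along = np.array([1.0, 1.0])     # follows the correlation
against = np.array([1.0, -1.0])  # opposes the correlation

d_along = mahalanobis(along, cov)
d_against = mahalanobis(against, cov)
print(f"along: {d_along:.3f}, against: {d_against:.3f}")
```

Both vectors have the same Euclidean norm, yet the one that runs against the expected correlation receives a much larger distance. That is the sense in which a covariance-aware score can flag changes that a plain magnitude comparison would treat as equal.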

Dataset

We’ll analyze murine bone marrow cells comparing Young vs. Old mice to understand how aging affects hematopoietic stem cells and their derivatives.

[1]:
import anndata as ad
import matplotlib.pyplot as plt
import numpy as np
import palantir
import pandas as pd
import scanpy as sc
import seaborn as sns

import kompot

# Set plotting style
plt.rcParams["axes.spines.right"] = False
plt.rcParams["axes.spines.top"] = False
plt.rcParams["image.cmap"] = "Spectral_r"

Configuration

Define analysis parameters. Adapt these to your own data:

[2]:
DATA_PATH = "../data/murine_bone_marrow_aging.h5ad"
GROUPING_COLUMN = "Age"              # Condition column in adata.obs
CONDITIONS = ["Young", "Old"]        # First condition is reference
CELL_TYPE_COLUMN = "highres_celltype"  # Optional: for visualization only
DIMENSIONALITY_REDUCTION = "DM_EigenVectors"  # Cell state representation
LAYER_FOR_EXPRESSION = "logged_counts"        # Expression data layer

Load Data

The dataset will be downloaded automatically from Zenodo if not already present:

[3]:
import os
from pathlib import Path
import requests
from tqdm.auto import tqdm

Path(DATA_PATH).parent.mkdir(parents=True, exist_ok=True)

if not os.path.exists(DATA_PATH):
    print("Downloading dataset...")
    url = "https://zenodo.org/records/15587768/files/murine_bone_marrow_aging.h5ad?download=1"
    response = requests.get(url, stream=True)
    response.raise_for_status()  # Stop on HTTP errors instead of saving an error page
    total = int(response.headers.get("content-length", 0))

    with open(DATA_PATH, "wb") as file, tqdm(total=total, unit="B", unit_scale=True) as bar:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)
            bar.update(len(chunk))

adata = ad.read_h5ad(DATA_PATH)
adata
[3]:
AnnData object with n_obs × n_vars = 8090 × 16285
    obs: 'Compartment', 'Replicate', 'Age', 'Sample', 'Info', 'batch', 'doublet_score', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_hb', 'pct_counts_hb', 'S_score', 'G2M_score', 'phase', 'leiden', 'phenograph', 'highres_celltype', 'midres_celltype'
    var: 'gene_ids', 'feature_types', 'genome', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'hb', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'Age_colors', 'Compartment_colors', 'DMEigenValues', 'Info_colors', 'README', 'Replicate_colors', 'Sample_colors', 'batch_colors', 'draw_graph', 'highres_celltype_colors', 'hvg', 'leiden', 'leiden_colors', 'midres_celltype_colors', 'neighbors', 'pca', 'phase_colors', 'umap'
    obsm: 'AbCapture', 'DM_EigenVectors', 'HTO', 'X_draw_graph_fa', 'X_pca', 'X_pca_harmony', 'X_pca_noregression', 'X_umap'
    varm: 'PCs'
    layers: 'MAGIC_imputed_data', 'logged_counts', 'normalized_counts', 'raw_counts'
    obsp: 'DM_Kernel', 'connectivities', 'distances'
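
An interrupted download can leave a truncated file that later fails to load. If the hosting page publishes a checksum (Zenodo records typically list an MD5 per file), comparing it against the local file is a cheap safeguard. The sketch below uses only the standard library; the helper name `file_md5` is hypothetical and the demonstration runs on a throwaway file rather than the real dataset.

```python
import hashlib
import tempfile
from pathlib import Path

def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration on a temporary file. For the real dataset, compare
# file_md5(DATA_PATH) with the checksum shown on the download page.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"hello")
    demo_digest = file_md5(demo)

print(demo_digest)
```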

Data Exploration

Before analysis, examine the data structure and distribution:

[4]:
sc.pl.umap(adata, color=[CELL_TYPE_COLUMN, GROUPING_COLUMN], frameon=False, wspace=1.3)

If you have a clustering or cell-type annotation, the cell type composition gives a rough overview of the data and helps identify underrepresented cell types that could complicate differential expression analysis.

[5]:
crosstab = (
    pd.crosstab(
        adata.obs[CELL_TYPE_COLUMN], adata.obs[GROUPING_COLUMN], normalize="index"
    )
    * 100
)

# Plot the distribution
ax = crosstab.plot(kind="bar", stacked=False, figsize=(12, 8))
ax.grid(False)
plt.xlabel("Cell Type")
plt.ylabel("Percentage (%)")
plt.title("Cell Type Distribution by Condition")
plt.legend(title=GROUPING_COLUMN)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

# Print extremes (cell types with high bias toward one condition)
bias_threshold = 75  # Percentage threshold for considering a cell type biased
biased_types = crosstab[(crosstab >= bias_threshold).any(axis=1)]  # matches the ">=" in the message below

if not biased_types.empty:
    print(f"Cell types with >={bias_threshold}% bias toward one condition:")
    print(biased_types)
    print("\nThese cell types might show disproportionate changes between conditions.")
Cell types with >=75% bias toward one condition:
Age                     Mid        Old      Young
highres_celltype
HSC               22.257053  75.548589   2.194357
Naive CD8 T cell   8.000000   1.000000  91.000000

These cell types might show disproportionate changes between conditions.
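
Note that the percentages above are normalized per cell type across every value of the grouping column, so the Mid group contributes to each row even though only Young and Old are compared later. The following toy example, with made-up labels and counts, sketches how `normalize="index"` behaves and how the threshold filter selects rows.

```python
import pandas as pd

# Toy annotations; labels and counts are invented for illustration.
obs = pd.DataFrame({
    "cell_type": ["HSC"] * 4 + ["T cell"] * 4,
    "age": ["Old", "Old", "Old", "Young", "Young", "Young", "Young", "Old"],
})

# normalize="index" divides each row by its row total, so rows sum to 100%.
pct = pd.crosstab(obs["cell_type"], obs["age"], normalize="index") * 100

# Keep cell types where any single condition reaches the threshold.
threshold = 75
biased = pct[(pct >= threshold).any(axis=1)]
print(pct)
```

In this toy table both cell types sit exactly at 75% for one condition, so they are selected only because the comparison is inclusive.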

Diffusion Maps Preprocessing

Kompot requires a continuous representation of cell states. Palantir diffusion maps capture the geometry of differentiation trajectories while reducing noise:

[6]:
palantir.utils.run_diffusion_maps(adata, pca_key="X_pca_harmony", n_components=40);

Differential Abundance Analysis

Identify cell states that change in frequency between conditions.

See kompot.compute_differential_abundance for full documentation.

[7]:
da_results = kompot.compute_differential_abundance(
    adata,
    groupby=GROUPING_COLUMN,
    condition1=CONDITIONS[0],
    condition2=CONDITIONS[1],
    obsm_key=DIMENSIONALITY_REDUCTION,
)
[2025-10-03 13:21:26,447] [INFO    ] Condition 1 (Young): 2,917 cells
[2025-10-03 13:21:26,448] [INFO    ] Condition 2 (Old): 3,116 cells
[2025-10-03 13:21:26,451] [INFO    ] Fitting density estimator for condition 1...
WARNING:2025-10-03 13:21:26,507:jax._src.xla_bridge:966: An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
[2025-10-03 13:22:13,343] [INFO    ] Fitting density estimator for condition 2...
[2025-10-03 13:22:30,146] [WARNING ] The normalization is only effective if the density was trained with d_method="fractal".
[2025-10-03 13:22:31,295] [WARNING ] The normalization is only effective if the density was trained with d_method="fractal".
[2025-10-03 13:22:42,501] [INFO    ] This run will have `run_id=0`.

Visualize Abundance Changes

Results are stored in adata.obs:

  • kompot_da_Young_to_Old_lfc: Log fold change (positive = enriched in Old)

  • kompot_da_Young_to_Old_lfc_direction: Categorical classification (up/down/unchanged)

[8]:
sc.pl.embedding(
    adata, "umap",
    color=[f"kompot_da_{CONDITIONS[0]}_to_{CONDITIONS[1]}_lfc_direction", f"kompot_da_{CONDITIONS[0]}_to_{CONDITIONS[1]}_lfc"],
    color_map="RdBu_r",
    vcenter=0,
    frameon=False,
)

Volcano Plot

Volcano plots for differential abundance show effect size (x-axis) vs. statistical significance (y-axis):

[9]:
kompot.plot.volcano_da(adata, color=CELL_TYPE_COLUMN)
[2025-10-03 13:22:43,131] [INFO    ] Found DA run info for run_id=-1
[2025-10-03 13:22:43,132] [INFO    ] Found lfc_key='kompot_da_Young_to_Old_lfc' from run info
[2025-10-03 13:22:43,132] [INFO    ] Found ptp_key='kompot_da_Young_to_Old_neg_log10_lfc_ptp' from run info
[2025-10-03 13:22:43,133] [INFO    ] Successfully inferred fields: {'lfc_key': 'kompot_da_Young_to_Old_lfc', 'ptp_key': 'kompot_da_Young_to_Old_neg_log10_lfc_ptp'}
[2025-10-03 13:22:43,134] [INFO    ] Using inferred thresholds - lfc_threshold: 1.0, ptp_threshold: 0.05

Points above the horizontal line and outside the vertical lines are classified as significantly changed.

Adjust thresholds using ptp_threshold and lfc_threshold parameters. Use update_direction=True to update the classification in adata.obs.
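
The thresholding logic can be sketched as follows. This is an illustration under assumptions, not Kompot's code: it assumes the y-axis holds the negative log10 of the ptp value (as the key name in the log suggests), that the LFC threshold is applied symmetrically, and that comparisons are strict. The function name and the "unchanged" label are hypothetical.

```python
import math
import numpy as np

def classify_direction(lfc, neg_log10_ptp, lfc_threshold=1.0, ptp_threshold=0.05):
    """Label each cell 'up', 'down', or 'unchanged' from effect size and significance."""
    lfc = np.asarray(lfc, dtype=float)
    neg_log10_ptp = np.asarray(neg_log10_ptp, dtype=float)
    # Horizontal line of the volcano plot: -log10(0.05) is about 1.30.
    significant = neg_log10_ptp > -math.log10(ptp_threshold)
    labels = np.full(lfc.shape, "unchanged", dtype=object)
    labels[significant & (lfc > lfc_threshold)] = "up"     # right of the right vertical line
    labels[significant & (lfc < -lfc_threshold)] = "down"  # left of the left vertical line
    return labels

# Four toy cells: strong up, strong down, significant but small, large but not significant.
labels = classify_direction(
    lfc=[2.0, -1.5, 0.3, 3.0],
    neg_log10_ptp=[2.0, 3.0, 4.0, 0.7],
)
print(list(labels))
```

A cell must pass both cutoffs to be labeled, which is why the third and fourth toy cells stay unchanged.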

[10]:
# Visualize only significantly changed cells
kompot.plot.embedding(
    adata, "umap",
    color=[CELL_TYPE_COLUMN, f"kompot_da_{CONDITIONS[0]}_to_{CONDITIONS[1]}_lfc"],
    groups={f"kompot_da_{CONDITIONS[0]}_to_{CONDITIONS[1]}_lfc_direction": ["up", "down"]},
    frameon=False,
    wspace=.5,
)
[2025-11-18 14:50:16,345] [INFO    ] Selected 1,090 cells out of 8,090 total cells.

The mgroups parameter of kompot.plot.embedding plots cells with increasing and decreasing abundance in separate panels:

[11]:
kompot.plot.embedding(
    adata,
    "umap",
    color=CELL_TYPE_COLUMN,
    mgroups=[{f"kompot_da_{CONDITIONS[0]}_to_{CONDITIONS[1]}_lfc_direction":[d]} for d in ["up", "down"]],
    frameon=False,
    wspace=.5,
)
[2025-11-18 14:50:17,070] [INFO    ] Selected 595 cells out of 8,090 total cells.
[2025-11-18 14:50:17,134] [INFO    ] Selected 495 cells out of 8,090 total cells.

Summary by Cell Type

Aggregate abundance changes by cell type with the direction_barplot:

[12]:
kompot.plot.direction_barplot(adata, category_column=CELL_TYPE_COLUMN)
[2025-10-03 13:22:45,749] [INFO    ] Found DA run info for run_id=-1
[2025-10-03 13:22:45,749] [INFO    ] Found direction_key='kompot_da_Young_to_Old_lfc_direction' from run info
[2025-10-03 13:22:45,750] [INFO    ] Successfully inferred fields: {'direction_key': 'kompot_da_Young_to_Old_lfc_direction'}
[2025-10-03 13:22:45,750] [INFO    ] Using DA run 0 for direction_barplot: comparing Young to Old
[2025-10-03 13:22:45,750] [INFO    ] Creating direction barplot: comparing Young to Old
[2025-10-03 13:22:45,751] [INFO    ] Using fields - category_column: 'highres_celltype', direction_column: 'kompot_da_Young_to_Old_lfc_direction'

Differential Expression Analysis

Identify genes that change in expression between conditions.

See kompot.compute_differential_expression for full documentation.

[13]:
de_results = kompot.compute_differential_expression(
    adata,
    groupby=GROUPING_COLUMN,
    condition1=CONDITIONS[0],
    condition2=CONDITIONS[1],
    layer=LAYER_FOR_EXPRESSION,
    obsm_key=DIMENSIONALITY_REDUCTION,
    batch_size=0,  # set to, e.g., 100 to batch cells and genes for lower memory demand
)
[2025-10-03 13:22:46,599] [INFO    ] Condition 1 (Young): 2,917 cells
[2025-10-03 13:22:46,599] [INFO    ] Condition 2 (Old): 3,116 cells
[2025-10-03 13:22:46,600] [INFO    ] Using 8090 of 8090 cells (100.0%)
[2025-10-03 13:22:46,940] [INFO    ] Preparing null distribution with null_genes=2000, null_seed=42
[2025-10-03 13:22:48,822] [INFO    ] Generated shuffled expression for 2000 null genes
[2025-10-03 13:23:16,888] [INFO    ] Fitting expression estimator for condition 1...
[2025-10-03 13:23:46,013] [INFO    ] Fitting expression estimator for condition 2...
[2025-10-03 13:24:38,148] [INFO    ] Landmark storage skipped (store_landmarks=False). Compute with store_landmarks=True to enable landmark reuse.
[2025-10-03 13:25:48,824] [INFO    ] Using 5,000 landmarks for Mahalanobis computation
[2025-10-03 13:26:26,665] [INFO    ] Computing FDR statistics from null distribution
[2025-10-03 13:26:30,065] [INFO    ] FDR analysis complete: 393/16285 genes significantly DE at FDR < 0.05
[2025-10-03 13:26:30,066] [INFO    ] Mahalanobis distance threshold for FDR < 0.05: 23.1080
[2025-10-03 13:26:39,162] [INFO    ] This run will have `run_id=0`.

Resource Planning

For larger datasets or production workflows, you may want to optimize memory usage and computational resources. See the Resource Planning section in Tutorial 2 for more detailed guidance.

Results Interpretation

Results are stored in adata.var:

  • kompot_de_Young_to_Old_mean_lfc: Average log fold change

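  • kompot_de_Young_to_Old_mahalanobis: Mahalanobis distance, used as the significance score (higher = stronger evidence of differential expression)

  • kompot_de_Young_to_Old_mahalanobis_local_fdr: Local FDR estimated from the null distribution

  • kompot_de_Young_to_Old_is_de: Boolean flag for significant genes (local FDR < 0.05)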

The Mahalanobis distance accounts for covariance structure and is more sensitive than simple fold change.
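
The significance flag depends on calibrating scores against a null distribution, which the log above reports as 2000 null genes with shuffled expression. The sketch below conveys only the general idea with a simple tail-area FDR estimate on toy scores. Kompot reports a local FDR, which is a different estimator, so the function here (a hypothetical `empirical_fdr_threshold`) should not be read as its method.

```python
import numpy as np

def empirical_fdr_threshold(observed, null, fdr=0.05):
    """Smallest observed score whose estimated tail FDR is below `fdr`, else None."""
    observed = np.sort(np.asarray(observed, dtype=float))
    null = np.asarray(null, dtype=float)
    for cutoff in observed:
        n_called = np.sum(observed >= cutoff)
        # Expected false calls: null exceedance rate scaled to the number of tested genes.
        expected_false = np.mean(null >= cutoff) * observed.size
        if expected_false / n_called < fdr:
            return float(cutoff)
    return None

# Toy scores: 90 genes that behave like the null plus 10 clearly shifted genes.
null_scores = np.arange(100) / 10
observed_scores = np.concatenate([np.arange(90) / 10, np.arange(20, 30)])

threshold = empirical_fdr_threshold(observed_scores, null_scores)
n_significant = int(np.sum(observed_scores >= threshold))
print(threshold, n_significant)
```

Only the ten shifted genes exceed the cutoff, because below it the observed scores are indistinguishable from the null scores.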

[14]:
# Top differentially expressed genes
adata.var.loc[
    :, adata.var.columns.str.contains("kompot_de")
].sort_values(
    f"kompot_de_{CONDITIONS[0]}_to_{CONDITIONS[1]}_mahalanobis", ascending=False
).head(20)
[14]:
          kompot_de_Young_to_Old_mahalanobis  kompot_de_Young_to_Old_mean_lfc  kompot_de_Young_to_Old_mahalanobis_local_fdr  kompot_de_Young_to_Old_is_de
H2-Q7     71.517322                           1.235066                         0.014441                                      True
Cd74      60.021451                           0.352121                         0.014441                                      True
H2-Aa     59.371135                           0.473756                         0.014441                                      True
H2-Ab1    57.808982                           0.511123                         0.014441                                      True
Igkc      53.604111                           0.054009                         0.014441                                      True
H2-Eb1    53.403829                           0.411752                         0.014441                                      True
AW112010  53.386234                           0.791367                         0.014441                                      True
S100a9    48.765184                           -0.170317                        0.014441                                      True
Ifitm3    47.585812                           0.367545                         0.014441                                      True
S100a8    47.372815                           -0.165121                        0.014441                                      True
H2-Q6     45.991623                           0.782279                         0.014441                                      True
Cd52      45.309906                           -0.127807                        0.014441                                      True
Aldh1a1   44.596577                           0.490811                         0.014441                                      True
Ifitm1    44.320577                           0.427663                         0.014441                                      True
Ifitm2    43.940652                           0.285836                         0.014441                                      True
Ighm      43.216410                           -0.173726                        0.014441                                      True
Cd79a     43.107791                           -0.004329                        0.014441                                      True
Apoe      42.935707                           -0.131211                        0.014441                                      True
Fos       42.282846                           0.531571                         0.014441                                      True
Gm47283   41.956500                           0.627173                         0.014441                                      True

Volcano Plot

Visualize effect size vs. significance for differential expression with the volcano_de plot:

[15]:
kompot.plot.volcano_de(adata)
[2025-10-03 13:26:40,415] [INFO    ] Found DE run info for run_id=-1
[2025-10-03 13:26:40,416] [INFO    ] Found mean_lfc_key='kompot_de_Young_to_Old_mean_lfc' from run info
[2025-10-03 13:26:40,417] [INFO    ] Found mahalanobis_key='kompot_de_Young_to_Old_mahalanobis' from run info
[2025-10-03 13:26:40,417] [INFO    ] Successfully inferred fields: {'mean_lfc_key': 'kompot_de_Young_to_Old_mean_lfc', 'mahalanobis_key': 'kompot_de_Young_to_Old_mahalanobis'}
[2025-10-03 13:26:40,418] [INFO    ] Using DE run 0: comparing Young to Old
[2025-10-03 13:26:40,431] [INFO    ] Using data columns from var - lfc: 'kompot_de_Young_to_Old_mean_lfc', score: 'kompot_de_Young_to_Old_mahalanobis'
[2025-10-03 13:26:40,457] [INFO    ] Highlighting 393 genes marked as DE (232 up, 161 down)
[2025-10-03 13:26:40,477] [INFO    ] Labeling top 10 genes by score

Fold Change Heatmap

Visualize fold changes across top differentially expressed genes with a heatmap. This provides a complementary view to the volcano plot, showing the magnitude and direction of expression changes.

Note that the gene selection is based on Kompot’s Mahalanobis distance (statistical significance), but the fold changes displayed are simply the difference of mean expressions from the input expression layer (in this case logged_counts), not a Kompot-specific metric:

[17]:
# Selecting top 20 genes
genes = adata.var[f"kompot_de_{CONDITIONS[0]}_to_{CONDITIONS[1]}_mahalanobis"].sort_values(ascending=False).head(20).index

kompot.plot.heatmap(
    adata,
    genes=genes,
    groupby=CELL_TYPE_COLUMN,        # Aggregate expression by cell type
    exclude_groups="Plasma cell",    # Remove cell types with too little representation
    vmin="p1",                       # Color scale minimum at 1st percentile (handles outliers)
    vmax="p99",                      # Color scale maximum at 99th percentile
    fold_change_mode=True,           # Display fold changes instead of mean expression
)
[2025-11-29 18:24:53,333] [INFO    ] Inferred condition_column='Age' from run information
[2025-11-29 18:24:53,334] [INFO    ] Inferred condition1='Young' from run information
[2025-11-29 18:24:53,335] [INFO    ] Inferred condition2='Old' from run information
[2025-11-29 18:24:53,336] [INFO    ] Inferred layer='logged_counts' from run information
[2025-11-29 18:24:53,336] [INFO    ] Creating fold change heatmap with 20 genes/features
[2025-11-29 18:24:53,338] [INFO    ] Using expression data from layer: 'logged_counts'
[2025-11-29 18:24:53,452] [INFO    ] Excluded 7 cells from groups: Plasma cell
[2025-11-29 18:24:53,458] [INFO    ] Applying gene-wise z-scoring (standard_scale='var')
[2025-11-29 18:24:53,524] [WARNING ] standard_scale is ignored in fold_change_mode as z-scoring is not appropriate for fold changes
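
As a rough illustration of what "difference of mean expressions" means, the sketch below computes a per-cell-type fold change on a toy log-expression table with pandas. The gene names, values, and column names are invented, and the exact aggregation inside kompot.plot.heatmap is not reproduced here.

```python
import pandas as pd

# Toy log-expression matrix: 6 cells x 2 genes (made-up values).
expr = pd.DataFrame({
    "GeneA": [1.0, 1.2, 0.8, 2.0, 2.2, 1.8],
    "GeneB": [3.0, 3.0, 3.0, 2.5, 2.5, 2.5],
})
meta = pd.DataFrame({
    "cell_type": ["HSC"] * 6,
    "condition": ["Young"] * 3 + ["Old"] * 3,
})

# Mean log expression per (cell type, condition), then Old minus Young.
means = expr.groupby([meta["cell_type"], meta["condition"]]).mean()
fold_change = means.xs("Old", level="condition") - means.xs("Young", level="condition")
print(fold_change)
```

Because the input is already log-transformed, the difference of means is a log fold change in whatever base the layer uses; positive values indicate higher mean expression in the second condition.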

Functional Enrichment

Use StringDBReport to analyze gene sets with the STRING database:

Privacy Note: This sends gene lists to the STRING database API. If working with sensitive data, consider local alternatives.

[16]:
# Selecting top 20 genes
genes = adata.var[f"kompot_de_{CONDITIONS[0]}_to_{CONDITIONS[1]}_mahalanobis"].sort_values(ascending=False).head(20).index

# Create STRING report (10090 = Mus musculus, 9606 = Homo sapiens)
report = kompot.plot.StringDBReport(
    genes,
    species_id=10090,
    include_enrichment=True
)
report
[16]:

Gene Set Report: 20 genes

Species: Mus musculus (Taxonomy ID: 10090)

StringDB Network

View interactive network in StringDB

Resource Links (20 genes)
Gene      Resource Links
H2-Q7     STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Cd74      STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
H2-Aa     STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
H2-Ab1    STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Igkc      STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
H2-Eb1    STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
AW112010  STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
S100a9    STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Ifitm3    STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
S100a8    STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
H2-Q6     STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Cd52      STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Aldh1a1   STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Ifitm1    STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Ifitm2    STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Ighm      STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Cd79a     STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Apoe      STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Fos       STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene
Gm47283   STRING DB | BioGRID | Reactome | GeneCards | UniProt | MGI | NCBI Gene

Functional Enrichment Analysis

View interactive enrichment analysis on StringDB

Gene Ontology Processes (37 terms)
term description signal strength fdr number_of_genes inputGenes
GO:0070488 Neutrophil aggregation 0.573343 2.865301 1.800000e-03 2 [S100a8, S100a9]
GO:0002503 Peptide antigen assembly with MHC class II protein complex 0.517595 2.518514 6.920000e-05 3 [H2-Ab1, H2-Aa, H2-Eb1]
GO:0019886 Antigen processing and presentation of exogenous peptide antigen via MHC class II 0.476370 2.388180 2.000000e-06 4 [H2-Ab1, H2-Aa, Cd74, H2-Eb1]
GO:0018119 Peptidyl-cysteine S-nitrosylation 0.369195 2.497325 6.100000e-03 2 [S100a8, S100a9]
GO:0035425 Autocrine signaling 0.342164 2.439333 7.100000e-03 2 [S100a8, S100a9]
GO:0046597 Negative regulation of viral entry into host cell 0.290538 2.124939 1.160000e-05 4 [Ifitm3, Ifitm2, Cd74, Ifitm1]
GO:0060337 Type I interferon signaling pathway 0.246167 2.087150 6.100000e-04 3 [Ifitm3, Ifitm2, Ifitm1]
GO:0002579 Positive regulation of antigen processing and presentation 0.235977 2.196295 1.610000e-02 2 [H2-Ab1, Cd74]
GO:0048002 Antigen processing and presentation of peptide antigen 0.230769 1.980695 1.480000e-07 6 [H2-Ab1, H2-Aa, H2-Q7, Cd74, H2-Q6, H2-Eb1]
GO:0035455 Response to interferon-alpha 0.198291 1.974446 1.100000e-03 3 [Ifitm3, Ifitm2, Ifitm1]
GO:0002523 Leukocyte migration involved in inflammatory response 0.137714 1.895265 4.830000e-02 2 [S100a8, S100a9]
GO:0002474 Antigen processing and presentation of peptide antigen via MHC class I 0.137714 1.895265 4.830000e-02 2 [H2-Q7, H2-Q6]
GO:0034341 Response to interferon-gamma 0.130841 1.710399 1.480000e-07 7 [Ifitm3, H2-Ab1, H2-Aa, Ifitm2, H2-Q7, Cd74, Ifitm1]
GO:0045071 Negative regulation of viral genome replication 0.125120 1.747662 4.200000e-03 3 [Ifitm3, Ifitm2, Ifitm1]
GO:0035456 Response to interferon-beta 0.108594 1.679665 6.100000e-03 3 [Ifitm3, Ifitm2, Ifitm1]
GO:1990748 Cellular detoxification 0.096972 1.598130 6.100000e-04 4 [S100a8, S100a9, Apoe, Aldh1a1]
GO:0009636 Response to toxic substance 0.059109 1.361965 3.400000e-04 5 [Fos, S100a8, S100a9, Apoe, Aldh1a1]
GO:0050870 Positive regulation of T cell activation 0.045165 1.252518 8.300000e-03 4 [H2-Ab1, H2-Aa, Cd74, H2-Eb1]
GO:0002250 Adaptive immune response 0.042766 1.217918 1.300000e-03 5 [Cd79a, H2-Ab1, H2-Aa, Cd74, H2-Eb1]
GO:0045087 Innate immune response 0.035370 1.121734 9.980000e-07 9 [Ifitm3, H2-Ab1, H2-Aa, S100a8, Ifitm2, H2-Q7, Cd74, Ifitm1, S100a9]

Showing 20 of 37 enriched terms

KEGG Pathways (28 terms)
term description signal strength fdr number_of_genes inputGenes
mmu05332 Graft-versus-host disease 0.835879 2.041393 5.950000e-08 5 [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1]
mmu05330 Allograft rejection 0.827236 2.032793 5.950000e-08 5 [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1]
mmu04612 Antigen processing and presentation 0.780488 1.944483 5.540000e-09 6 [H2-Ab1, H2-Aa, H2-Q7, Cd74, H2-Q6, H2-Eb1]
mmu04940 Type I diabetes mellitus 0.771263 1.976935 5.990000e-08 5 [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1]
mmu05310 Asthma 0.726704 2.138303 1.940000e-05 3 [H2-Ab1, H2-Aa, H2-Eb1]
mmu05320 Autoimmune thyroid disease 0.701568 1.914288 9.500000e-08 5 [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1]
mmu05416 Viral myocarditis 0.656329 1.871131 1.270000e-07 5 [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1]
mmu05140 Leishmaniasis 0.550054 1.823909 5.660000e-06 4 [Fos, H2-Ab1, H2-Aa, H2-Eb1]
mmu04672 Intestinal immune network for IgA production 0.536462 1.905730 8.170000e-05 3 [H2-Ab1, H2-Aa, H2-Eb1]
mmu05323 Rheumatoid arthritis 0.466192 1.719173 1.100000e-05 4 [Fos, H2-Ab1, H2-Aa, H2-Eb1]
mmu04658 Th1 and Th2 cell differentiation 0.459220 1.708954 1.120000e-05 4 [Fos, H2-Ab1, H2-Aa, H2-Eb1]
mmu05321 Inflammatory bowel disease 0.420401 1.740363 2.200000e-04 3 [H2-Ab1, H2-Aa, H2-Eb1]
mmu04659 Th17 cell differentiation 0.407465 1.639131 1.940000e-05 4 [Fos, H2-Ab1, H2-Aa, H2-Eb1]
mmu04514 Cell adhesion molecules 0.367600 1.544463 3.570000e-06 5 [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1]
mmu04662 B cell receptor signaling pathway 0.363811 1.649282 3.800000e-04 3 [Cd79a, Fos, Ifitm1]
mmu04145 Phagosome 0.359086 1.530848 3.690000e-06 5 [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1]
mmu05166 Human T-cell leukemia virus 1 infection 0.325142 1.454056 5.430000e-07 6 [Fos, H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1]
mmu04657 IL-17 signaling pathway 0.318638 1.569124 6.100000e-04 3 [Fos, S100a8, S100a9]
mmu04640 Hematopoietic cell lineage 0.316537 1.564271 6.100000e-04 3 [H2-Ab1, H2-Aa, H2-Eb1]
mmu05322 Systemic lupus erythematosus 0.309591 1.550031 6.400000e-04 3 [H2-Ab1, H2-Aa, H2-Eb1]

Showing 20 of 28 enriched terms

Gene Ontology Functions (8 terms)
term description signal strength fdr number_of_genes inputGenes
GO:0023026 MHC class II protein complex binding 0.949472 2.564271 8.500000e-07 4 [H2-Ab1, H2-Aa, Cd74, H2-Eb1]
GO:0042605 Peptide antigen binding 0.652174 2.249001 1.920000e-07 5 [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1]
GO:0042608 T cell receptor binding 0.346516 2.027152 1.300000e-03 3 [H2-Q7, H2-Q6, H2-Eb1]
GO:0042609 CD4 receptor binding 0.335093 2.263241 2.030000e-02 2 [Cd74, H2-Eb1]
GO:0042287 MHC protein binding 0.231485 1.802511 4.500000e-03 3 [H2-Q7, Cd74, H2-Q6]
GO:0016209 Antioxidant activity 0.152916 1.594235 1.610000e-02 3 [S100a8, S100a9, Apoe]
GO:0042277 Peptide binding 0.114579 1.353736 3.030000e-06 7 [H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1]
GO:0044877 Protein-containing complex binding 0.028761 0.739820 3.900000e-03 8 [Fos, H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1]

Gene Ontology Components (25 terms)
term description signal strength fdr number_of_genes inputGenes
GO:0042613 MHC class II protein complex 0.847811 2.602060 1.240000e-07 4 [H2-Ab1, H2-Aa, Cd74, H2-Eb1]
GO:0042611 MHC protein complex 0.653465 2.288065 4.110000e-10 6 [H2-Ab1, H2-Aa, H2-Q7, Cd74, H2-Q6, H2-Eb1]
GO:0071556 Integral component of lumenal side of endoplasmic reticulum membrane 0.280364 2.138303 6.000000e-03 2 [H2-Q7, H2-Q6]
GO:0032398 MHC class Ib protein complex 0.230887 2.000000 9.700000e-03 2 [H2-Q7, H2-Q6]
GO:0042612 MHC class I protein complex 0.219137 1.962211 1.070000e-02 2 [H2-Q7, H2-Q6]
GO:0033106 cis-Golgi network membrane 0.193032 1.880025 1.430000e-02 2 [H2-Q7, H2-Q6]
GO:0005771 Multivesicular body 0.162256 1.655191 2.000000e-03 3 [Cd79a, H2-Ab1, Cd74]
GO:0005770 Late endosome 0.119764 1.418143 2.640000e-07 7 [Cd79a, Ifitm3, H2-Ab1, Ifitm2, Cd74, Apoe, H2-Eb1]
GO:0030670 Phagocytic vesicle membrane 0.115688 1.586548 4.410000e-02 2 [H2-Q7, H2-Q6]
GO:0005765 Lysosomal membrane 0.107614 1.377064 5.980000e-06 6 [Ifitm3, Ifitm2, H2-Q7, Ifitm1, H2-Q6, H2-Eb1]
GO:0031902 Late endosome membrane 0.088021 1.348252 1.090000e-02 3 [Ifitm3, Ifitm2, H2-Eb1]
GO:0005764 Lysosome 0.082485 1.241287 5.820000e-08 9 [Ifitm3, H2-Aa, Ifitm2, H2-Q7, Cd74, Ifitm1, Apoe, H2-Q6, H2-Eb1]
GO:0009897 External side of plasma membrane 0.080203 1.230992 2.930000e-07 8 [Cd79a, H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1]
GO:0031901 Early endosome membrane 0.073535 1.263241 1.800000e-02 3 [Ifitm3, H2-Q7, H2-Q6]
GO:0005769 Early endosome 0.071391 1.190134 5.330000e-05 6 [Ifitm3, H2-Ab1, H2-Q7, Ifitm1, Apoe, H2-Q6]
GO:0098797 Plasma membrane protein complex 0.061461 1.116375 1.480000e-05 7 [Cd79a, H2-Ab1, H2-Aa, H2-Q7, Cd74, H2-Q6, H2-Eb1]
GO:0010008 Endosome membrane 0.054324 1.076662 1.400000e-03 5 [Ifitm3, Ifitm2, H2-Q7, H2-Q6, H2-Eb1]
GO:0005768 Endosome 0.053483 1.045319 1.240000e-07 10 [Cd79a, Ifitm3, H2-Ab1, Ifitm2, H2-Q7, Cd74, Ifitm1, Apoe, H2-Q6, H2-Eb1]
GO:0009986 Cell surface 0.044503 0.966251 3.210000e-06 9 [Cd79a, Ifitm3, H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1]
GO:0012505 Endomembrane system 0.013531 0.443299 2.800000e-03 11 [Cd79a, Fos, Ifitm3, H2-Ab1, Ifitm2, H2-Q7, Cd74, Ifitm1, Apoe, H2-Q6, H2-Eb1]

Showing 20 of 25 enriched terms

Reactome Pathways (7 terms)
term description signal strength fdr number_of_genes inputGenes
MMU-6799990 Metal sequestration by antimicrobial proteins 0.728901 2.740363 0.003200 2 [S100a8, S100a9]
MMU-5686938 Regulation of TLR by endogenous ligand 0.247582 2.041393 0.038600 2 [S100a8, S100a9]
MMU-5668599 RHO GTPases Activate NADPH Oxidases 0.217832 1.962211 0.042900 2 [S100a8, S100a9]
MMU-198933 Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell 0.142132 1.634853 0.000510 4 [Ifitm3, Ifitm2, H2-Q7, Ifitm1]
MMU-168898 Toll-like Receptor Cascades 0.074034 1.363178 0.042900 3 [Fos, S100a8, S100a9]
MMU-1280218 Adaptive Immune System 0.039783 1.049534 0.000240 7 [Cd79a, Ifitm3, Ifitm2, H2-Q7, Cd74, Ifitm1, H2-Eb1]
MMU-168256 Immune System 0.024465 0.833220 0.000045 10 [Cd79a, Fos, Ifitm3, S100a8, Ifitm2, H2-Q7, Cd74, Ifitm1, S100a9, H2-Eb1]
[17]:
# Get enriched functional categories
report.get_functional_enrichment("Function").head(10)
[17]:
category term number_of_genes number_of_genes_in_background ncbiTaxonId inputGenes preferredNames p_value fdr description expected strength signal
98 Function GO:0023026 4 12 10090 [H2-Ab1, H2-Aa, Cd74, H2-Eb1] [H2-Ab1, H2-Aa, Cd74, H2-Eb1] 3.470000e-10 8.500000e-07 MHC class II protein complex binding 0.010909 2.564271 0.949472
97 Function GO:0042605 5 31 10090 [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] [H2-Ab1, H2-Aa, H2-Q7, H2-Q6, H2-Eb1] 3.910000e-11 1.920000e-07 Peptide antigen binding 0.028182 2.249001 0.652174
100 Function GO:0042608 3 31 10090 [H2-Q7, H2-Q6, H2-Eb1] [H2-Q7, H2-Q6, H2-Eb1] 1.900000e-06 1.300000e-03 T cell receptor binding 0.028182 2.027152 0.346516
104 Function GO:0042609 2 12 10090 [Cd74, H2-Eb1] [Cd74, H2-Eb1] 4.550000e-05 2.030000e-02 CD4 receptor binding 0.010909 2.263241 0.335093
102 Function GO:0042287 3 52 10090 [H2-Q7, Cd74, H2-Q6] [H2-Q7, Cd74, H2-Q6] 8.250000e-06 4.500000e-03 MHC protein binding 0.047273 1.802511 0.231485
103 Function GO:0016209 3 84 10090 [S100a8, S100a9, Apoe] [S100a8, S100a9, Apoe] 3.290000e-05 1.610000e-02 Antioxidant activity 0.076364 1.594235 0.152916
99 Function GO:0042277 7 341 10090 [H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1] [H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6, H2-Eb1] 2.470000e-09 3.030000e-06 Peptide binding 0.310000 1.353736 0.114579
101 Function GO:0044877 8 1602 10090 [Fos, H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6,... [Fos, H2-Ab1, H2-Aa, H2-Q7, Cd74, Apoe, H2-Q6,... 6.410000e-06 3.900000e-03 Protein-containing complex binding 1.456364 0.739820 0.028761
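
The strength column appears to be the log10 ratio of observed to expected gene counts, and the values in the table above are consistent with that reading. The short check below recomputes it for the first two rows using only numbers copied from the table; the helper name is hypothetical.

```python
import math

def enrichment_strength(observed, expected):
    """log10 ratio of observed to expected gene counts for a term."""
    return math.log10(observed / expected)

# Values copied from the first two rows of the table above.
rows = {
    "GO:0023026": {"observed": 4, "expected": 0.010909, "reported": 2.564271},
    "GO:0042605": {"observed": 5, "expected": 0.028182, "reported": 2.249001},
}

computed = {
    term: enrichment_strength(row["observed"], row["expected"])
    for term, row in rows.items()
}
for term, value in computed.items():
    print(term, round(value, 3), rows[term]["reported"])
```

A strength near 2.5 therefore means the term contains a few hundred times more of the input genes than expected by chance, which helps when comparing terms with very different background sizes.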

Inspecting Run History with RunInfo

Kompot tracks the history of all differential analysis runs, including parameters, environment, and results. The RunInfo class provides easy access to this information:

See kompot.RunInfo for full documentation.

[18]:
# Get info about the most recent DE run
de_run = kompot.RunInfo(adata, analysis_type='de')
de_run
[18]:

Run 0 (DE Analysis)

Run Summary
Analysis: DE  |  Run ID: 0  |  Timestamp: 2025-11-18T14:51:29.630913  |  Conditions: Young to Old
All Fields Present

Parameter                   Value
conditions                  Young to Old
obsm_key                    DM_EigenVectors
uses_sample_variance        False
layer                       logged_counts
timestamp                   2025-11-18T14:51:29.630913
Fields Created              9

All Parameters
Parameter                   Value
auto_filtered               False
batch_size                  0
compute_mahalanobis         True
condition1                  Young
condition2                  Old
copy                        False
eps                         1e-08
fdr_threshold               0.05
groupby                     Age
inplace                     True
jit_compile                 False
landmarks                   False
layer                       logged_counts
ls_factor                   10.0
max_memory_ratio            0.8
min_cells                   2
n_landmarks                 5000
null_genes                  2000
null_seed                   42
obsm_key                    DM_EigenVectors
result_key                  kompot_de
sigma                       1.0
store_landmarks             False
store_posterior_covariance  False
use_sample_variance         False
used_landmarks              False

Environment
Parameter                   Value
hostname                    gizmok39
pid                         8452
platform                    Linux-4.15.0-213-generic-x86_64-with-glibc2.27
python_version              3.12.10
timestamp                   2025-11-18T14:51:29.631101
username                    dotto

Fields Created by This Run
Total Fields: 9  |  Present: 9  |  Missing: 0  |  Overwritten: 0
Field Name                                    Location  Description  Status
LAYERS Fields
kompot_de_Old_imputed                         layers    [imputed] Imputed expression for Old  Present
kompot_de_Young_imputed                       layers    [imputed] Imputed expression for Young  Present
kompot_de_Young_to_Old_fold_change            layers    [fold_change] Log fold change for each cell and gene  Present
OBS Fields
kompot_de_Old_std                             obs       [std] Posterior standard deviation of imputed expression for Old (same for all genes)  Present
kompot_de_Young_std                           obs       [std] Posterior standard deviation of imputed expression for Young (same for all genes)  Present
VAR Fields
kompot_de_Young_to_Old_is_de                  var       [is_de] Boolean indicator of differential expression at local FDR < 0.05  Present
kompot_de_Young_to_Old_mahalanobis            var       [mahalanobis] Mahalanobis distances  Present
kompot_de_Young_to_Old_mahalanobis_local_fdr  var       [mahalanobis_local_fdr] Local FDR values using empirical null estimation similar to R's fdrtool  Present
kompot_de_Young_to_Old_mean_lfc               var       [mean_log_fold_change] Mean log fold change values  Present

Saving Results

Optional: Cleanup Large Layers

Imputed expression layers can be large. Remove them if not needed for further analysis with the cleanup utility:

[18]:
kompot.cleanup(adata)
[2025-10-03 13:26:49,187] [INFO    ] Cleaning up all 1 run(s)
[2025-10-03 13:26:49,210] [INFO    ] Cleaned up 3 field(s) from run 0:
[2025-10-03 13:26:49,211] [INFO    ]   layers (3 field(s)):
[2025-10-03 13:26:49,211] [INFO    ]     - kompot_de_Young_imputed
[2025-10-03 13:26:49,211] [INFO    ]     - kompot_de_Old_imputed
[2025-10-03 13:26:49,212] [INFO    ]     - kompot_de_Young_to_Old_fold_change
[2025-10-03 13:26:49,212] [INFO    ] Total: Cleaned up 3 field(s) across 1 run(s)

This removes:

  • kompot_de_Young_imputed

  • kompot_de_Old_imputed

  • kompot_de_Young_to_Old_fold_change

Statistical results in adata.var and adata.obs are preserved.
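
For a sense of scale, the sketch below estimates the memory taken by the three removed layers. It assumes each layer is a dense float32 matrix with the dataset's full shape, which this tutorial does not state; sparse storage or a different dtype would change the numbers.

```python
def dense_layer_bytes(n_obs, n_vars, bytes_per_value):
    """Memory needed for one dense n_obs x n_vars matrix."""
    return n_obs * n_vars * bytes_per_value

n_obs, n_vars = 8090, 16285  # shape reported when the data was loaded
per_layer = dense_layer_bytes(n_obs, n_vars, bytes_per_value=4)  # assumes float32
total = 3 * per_layer        # three layers are removed by the cleanup
print(f"{per_layer / 1e6:.0f} MB per layer, {total / 1e9:.2f} GB for three")
```

Under these assumptions the cleanup frees roughly 1.6 GB before writing the file, and twice that if the layers were stored as float64.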

[19]:
adata.write_h5ad("../data/murine_bone_marrow_aging_processed.h5ad")

Biological Interpretation

Key Findings

Differential Abundance:

  • HSCs show increased abundance in Old mice

  • Naive CD8 T cells come predominantly from Young mice

  • Consistent with age-related HSC expansion and T cell depletion

Differential Expression:

  • MHC genes (class II: Cd74, H2-Aa, H2-Ab1, H2-Eb1; class I: H2-Q7, H2-Q6): Upregulated in Old mice → enhanced antigen presentation

  • Antioxidant genes (S100a8, S100a9, Apoe): Higher in Young mice (negative mean LFC) → reduced oxidative stress

  • Interferon-stimulated genes (Ifitm1, Ifitm2, Ifitm3): Upregulated in Old mice → age-related changes in the interferon response

These patterns suggest aging leads to chronic immune activation (“inflammaging”) and altered stem cell dynamics.

Next Steps

For complete documentation, visit kompot.readthedocs.io