Command-Line Interface (CLI)¶

The kompot CLI provides command-line access to differential expression (DE) and differential abundance (DA) analysis for pipeline integration and workflow automation.

Installation¶

The CLI is installed automatically with kompot:

pip install kompot
# or
mamba install -c bioconda kompot

Verify installation:

kompot --version
kompot --help

Overview¶

The CLI provides three main commands:

kompot dm - Compute diffusion maps (preprocessing with Palantir)
kompot de - Differential expression analysis
kompot da - Differential abundance analysis

All commands support:

Direct CLI arguments for common parameters
YAML/JSON config files for advanced parameters
Reading/writing .h5ad and .zarr AnnData formats

Quick Start¶

Complete Workflow¶

# 1. Compute diffusion maps (preprocessing)
kompot dm input.h5ad -o input_with_dm.h5ad \
  --pca-key X_pca \
  --n-components 10

# 2. Run differential expression
kompot de input_with_dm.h5ad -o de_results.h5ad \
  --groupby condition \
  --condition1 control \
  --condition2 treatment \
  --obsm-key DM_EigenVectors

# 3. Run differential abundance
kompot da input_with_dm.h5ad -o da_results.h5ad \
  --groupby condition \
  --condition1 control \
  --condition2 treatment \
  --obsm-key DM_EigenVectors

Diffusion Maps (Preprocessing)¶

kompot dm input.h5ad -o output.h5ad \
  --pca-key X_pca \
  --n-components 10 \
  --knn 30

Differential Expression (Basic)¶

kompot de input.h5ad -o output.h5ad \
  --groupby condition \
  --condition1 control \
  --condition2 treatment \
  --obsm-key X_pca \
  --layer logged_counts

Differential Abundance (Basic)¶

kompot da input.h5ad -o output.h5ad \
  --groupby condition \
  --condition1 control \
  --condition2 treatment \
  --obsm-key X_pca

Using Config Files¶

For complex analyses with many parameters, use config files:

# Get template (copy from installed package)
python -c "from pathlib import Path; import shutil; \
import kompot; \
src = Path(kompot.__file__).parent / 'cli' / 'templates' / 'de_config_minimal.yaml'; \
shutil.copy(src, 'my_de_config.yaml')"

# Edit config file
nano my_de_config.yaml

# Run analysis
kompot de input.h5ad -o output.h5ad -c my_de_config.yaml

CLI arguments override config file values:

kompot de input.h5ad -o output.h5ad \
  -c my_config.yaml \
  --batch-size 50  # Overrides batch_size in config
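
The precedence rule above can be pictured as a dictionary merge. The following is a minimal sketch and not kompot's actual implementation: `merge_settings` is a hypothetical helper, and it assumes that CLI options the user did not pass arrive as `None`.

```python
def merge_settings(config_values, cli_values):
    """Return the config values, overridden by every CLI value that was set."""
    merged = dict(config_values)
    for key, value in cli_values.items():
        if value is not None:  # assumption: unset CLI options are None
            merged[key] = value
    return merged


config_values = {"batch_size": 100, "n_landmarks": 5000}  # e.g. from my_config.yaml
cli_values = {"batch_size": 50, "n_landmarks": None}      # e.g. from --batch-size 50
print(merge_settings(config_values, cli_values))
# {'batch_size': 50, 'n_landmarks': 5000}
```

Only `batch_size` changes, because it is the only option given on the command line; every other key keeps its config-file value.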

Diffusion Maps Command¶

The dm command computes diffusion maps with Palantir. Diffusion maps provide the continuous representation of cell states that differential analysis needs.

Basic Usage¶

kompot dm INPUT -o OUTPUT [OPTIONS]

Prerequisites¶

Requires Palantir: pip install palantir or pip install kompot[recommended]
Input AnnData must contain PCA coordinates in adata.obsm

Common Options¶

--pca-key KEY           # PCA coordinates in adata.obsm (default: X_pca)
--n-components N        # Number of diffusion components (default: 10)
--knn N                 # Number of nearest neighbors (default: 30)
--alpha FLOAT           # Diffusion alpha parameter (default: 0)

Output¶

Results are stored in:

adata.obsm['DM_EigenVectors'] - Diffusion map coordinates (n_cells × n_components)
adata.uns['DM_EigenValues'] - Eigenvalues of diffusion operator

Example: Complete Preprocessing¶

# Starting with raw AnnData (assuming PCA already computed)
kompot dm bone_marrow.h5ad -o bone_marrow_dm.h5ad \
  --pca-key X_pca \
  --n-components 10 \
  --knn 30

# Then run differential analysis
kompot de bone_marrow_dm.h5ad -o results.h5ad \
  --groupby Age \
  --condition1 Young \
  --condition2 Old \
  --obsm-key DM_EigenVectors

Why Diffusion Maps?¶

Diffusion maps capture continuous cell state transitions better than PCA alone:

They preserve the geometry of differentiation trajectories
They reduce noise while maintaining biological structure
Euclidean distance in diffusion space better reflects biological similarity
kompot’s covariance kernel operates on distances in the cell-state representation

See the Palantir documentation for details.

Differential Expression Command¶

Basic Usage¶

kompot de INPUT -o OUTPUT [OPTIONS]
kompot de INPUT -t TABLE_OUTPUT [OPTIONS]

At least one output must be specified: -o/--output for full AnnData or -t/--table-output for CSV/TSV table.

Required Parameters¶

Either via CLI or config file:

--groupby COLUMN - Column in adata.obs with condition labels
--condition1 LABEL - Reference condition label
--condition2 LABEL - Comparison condition label

Output Options¶

-o, --output FILE         # Output AnnData file (.h5ad or .zarr)
-t, --table-output FILE   # Output DE results as table (.csv or .tsv)

The --table-output option exports only the kompot-produced columns from adata.var (gene-level statistics such as Mahalanobis distance, log fold change, and FDR). This is useful for downstream analysis or integration with other tools.
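
As one way to use such a table downstream, the sketch below filters rows by a numeric column with the standard `csv` module. It rests on several assumptions: `rows_at_or_below` is a hypothetical helper, the sample rows are made up, the `gene` header for the index column is a guess, and the FDR column name follows the pattern listed under Output Format.

```python
import csv
import io


def rows_at_or_below(table_text, column, cutoff, delimiter=","):
    """Return the rows whose numeric value in `column` is <= cutoff."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    return [row for row in reader if float(row[column]) <= cutoff]


# Made-up stand-in for a file written with `-t de_table.csv`;
# the "gene" header is an assumption about the exported index column.
sample_table = """gene,kompot_de_control_to_treatment_mahalanobis_local_fdr
GeneA,0.01
GeneB,0.40
GeneC,0.05
"""

fdr_column = "kompot_de_control_to_treatment_mahalanobis_local_fdr"
hits = rows_at_or_below(sample_table, fdr_column, cutoff=0.05)
print([row["gene"] for row in hits])
# ['GeneA', 'GeneC']
```

For a real export, pass the file contents instead of `sample_table`, and use `delimiter="\t"` for a .tsv file.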

Common Options¶

--obsm-key KEY            # Cell state representation (default: DM_EigenVectors)
--layer LAYER             # Expression data layer (default: None, use X)
--result-key KEY          # Storage key (default: kompot_de)
--n-landmarks N           # Number of landmarks (default: 5000)
--sample-col COLUMN       # Sample ID column for replicates
--batch-size N            # Cells per batch (default: 100)
--fdr-threshold FLOAT     # FDR threshold (default: 0.05)
--null-genes N            # Null genes for FDR (default: 2000)

Boolean Flags¶

--no-progress             # Disable progress bars
--store-landmarks         # Store landmarks for reuse
--store-additional-stats  # Store extra statistics
--use-empirical-variance  # Estimate per-gene noise from GP residuals
--overwrite               # Overwrite without warning

Compute Options¶

--use-gpu                 # Use GPU acceleration (requires CUDA-enabled JAX)
--threads N               # Number of threads for JAX/NumPy/Dask (default: all cores)

Advanced Options¶

For advanced parameters (gene filtering, cell filtering, GP kernel parameters, memory management, etc.), see the configuration file templates:

kompot/cli/templates/de_config_template.yaml - Complete template with all parameters
kompot/cli/templates/de_config_minimal.yaml - Minimal template with common parameters

Example: Complete Analysis¶

kompot de bone_marrow.h5ad -o results.h5ad \
  --groupby Age \
  --condition1 Young \
  --condition2 Old \
  --obsm-key DM_EigenVectors \
  --layer logged_counts \
  --sample-col Sample \
  --n-landmarks 5000 \
  --batch-size 100 \
  --fdr-threshold 0.05 \
  --null-genes 2000 \
  --store-additional-stats

Differential Abundance Command¶

Basic Usage¶

kompot da INPUT -o OUTPUT [OPTIONS]
kompot da INPUT -t TABLE_OUTPUT [OPTIONS]

At least one output must be specified: -o/--output for full AnnData or -t/--table-output for CSV/TSV table.

Required Parameters¶

Either via CLI or config file:

--groupby COLUMN - Column in adata.obs with condition labels
--condition1 LABEL - Reference condition label
--condition2 LABEL - Comparison condition label

Output Options¶

-o, --output FILE         # Output AnnData file (.h5ad or .zarr)
-t, --table-output FILE   # Output DA results as table (.csv or .tsv)

The --table-output option exports only the kompot-produced columns from adata.obs (cell-level statistics like log fold change, z-scores, PTP values, etc.). This is useful for downstream analysis or integration with other tools.

Common Options¶

--obsm-key KEY                    # Cell state representation (default: X_pca)
--result-key KEY                  # Storage key (default: kompot_da)
--n-landmarks N                   # Number of landmarks (default: None, all points)
--sample-col COLUMN               # Sample ID column for replicates
--batch-size N                    # Cells per batch (default: None)
--log-fold-change-threshold FLOAT # LFC threshold (default: 1.0)
--ptp-threshold FLOAT             # PTP threshold (default: 0.05)
--ls-factor FLOAT                 # Length scale factor (default: 10.0)

Boolean Flags¶

--store-landmarks         # Store landmarks for reuse
--overwrite               # Overwrite without warning
--no-progress             # Disable progress bars

Compute Options¶

--use-gpu                 # Use GPU acceleration (requires CUDA-enabled JAX)
--threads N               # Number of threads for JAX/NumPy/Dask (default: all cores)

Example: Complete Analysis¶

kompot da bone_marrow.h5ad -o results.h5ad \
  --groupby Age \
  --condition1 Young \
  --condition2 Old \
  --obsm-key DM_EigenVectors \
  --sample-col Sample \
  --n-landmarks 3000 \
  --log-fold-change-threshold 1.0 \
  --ptp-threshold 0.05

Configuration Files¶

Mapping to the Python API¶

The Python API groups parameters into Settings dataclasses (GPSettings, FDRSettings, etc.). CLI config files use flat keys — the CLI maps them to the correct Settings automatically. The table below shows how config keys correspond to Python Settings:

Config key	Python equivalent	Description
`sigma`, `ls`, `ls_factor`, `n_landmarks`, `batch_size`, `eps`, `jit_compile`, `random_state`, `use_empirical_variance`	`gp=GPSettings(...)`	GP model parameters
`null_genes`, `null_seed`, `fdr_threshold`	`fdr=FDRSettings(...)`	FDR / null distribution (DE)
`log_fold_change_threshold`, `ptp_threshold`	`threshold=DAThresholdSettings(...)`	Significance thresholds (DA)
`cell_filter`, `groups`, `min_cells`, `min_percentage`, `check_representation`	`filter=FilterSettings(...)`	Cell / group filtering (DE)
`result_key`, `overwrite`, `store_landmarks`, `store_arrays_on_disk`, `disk_storage_dir`, `max_memory_ratio`, `store_posterior_covariance`, `store_additional_stats`	`storage=StorageSettings(...)`	Result storage
`copy`, `inplace`, `progress`, `compute_mahalanobis`, `allow_single_condition_variance`	`output=OutputSettings(...)`	Output control

For example, this config file:

sigma: 0.5
n_landmarks: 3000
fdr_threshold: 0.01

is equivalent to:

kompot.de(adata, ...,
    gp=GPSettings(sigma=0.5, n_landmarks=3000),
    fdr=FDRSettings(threshold=0.01))
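
To make the grouping concrete, the sketch below sorts flat config keys into the argument groups from the table. It is an illustration only, not kompot's own mapping code: `SETTINGS_GROUPS` is transcribed from the table, `group_flat_config` is a hypothetical helper, and keys the table does not list (such as `groupby`) are placed under an invented `ungrouped` label. It also keeps the config key names as-is, whereas the real Settings fields may be named differently (the example above maps `fdr_threshold` to `threshold`).

```python
# Config keys per Python API argument, transcribed from the table above.
SETTINGS_GROUPS = {
    "gp": ["sigma", "ls", "ls_factor", "n_landmarks", "batch_size", "eps",
           "jit_compile", "random_state", "use_empirical_variance"],
    "fdr": ["null_genes", "null_seed", "fdr_threshold"],
    "threshold": ["log_fold_change_threshold", "ptp_threshold"],
    "filter": ["cell_filter", "groups", "min_cells", "min_percentage",
               "check_representation"],
    "storage": ["result_key", "overwrite", "store_landmarks",
                "store_arrays_on_disk", "disk_storage_dir", "max_memory_ratio",
                "store_posterior_covariance", "store_additional_stats"],
    "output": ["copy", "inplace", "progress", "compute_mahalanobis",
               "allow_single_condition_variance"],
}


def group_flat_config(flat_config):
    """Sort flat config keys into the argument group each one belongs to."""
    key_to_group = {key: group
                    for group, keys in SETTINGS_GROUPS.items()
                    for key in keys}
    grouped = {}
    for key, value in flat_config.items():
        group = key_to_group.get(key, "ungrouped")  # e.g. groupby, condition1
        grouped.setdefault(group, {})[key] = value
    return grouped


flat = {"sigma": 0.5, "n_landmarks": 3000, "fdr_threshold": 0.01,
        "groupby": "condition"}
print(group_flat_config(flat))
# {'gp': {'sigma': 0.5, 'n_landmarks': 3000}, 'fdr': {'fdr_threshold': 0.01}, 'ungrouped': {'groupby': 'condition'}}
```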

YAML Format¶

Config files use standard YAML syntax with flat keys:

# Required parameters
groupby: "condition"
condition1: "control"
condition2: "treatment"

# Common parameters
obsm_key: "X_pca"
layer: "logged_counts"
result_key: "kompot_de"

# Sample variance
sample_col: "sample_id"

# GP parameters (→ GPSettings)
sigma: 1.0
ls_factor: 10.0
batch_size: 100
n_landmarks: 5000

# FDR parameters (→ FDRSettings)
fdr_threshold: 0.05
null_genes: 2000

# Filtering (→ FilterSettings)
genes: ["Gene1", "Gene2", "Gene3"]
cell_filter: {batch: "batch1"}

JSON Format¶

JSON is also supported:

{
  "groupby": "condition",
  "condition1": "control",
  "condition2": "treatment",
  "obsm_key": "X_pca",
  "batch_size": 100,
  "fdr_threshold": 0.05
}
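
Because JSON is accepted, a pipeline can generate config files programmatically with the standard library alone. The following is a small sketch under that assumption; `write_json_config` is a hypothetical helper, and the temporary directory is used only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path


def write_json_config(path, **parameters):
    """Write flat config keys to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(parameters, indent=2) + "\n")
    return path


config_path = write_json_config(
    Path(tempfile.mkdtemp()) / "de_config.json",
    groupby="condition",
    condition1="control",
    condition2="treatment",
    obsm_key="X_pca",
    batch_size=100,
    fdr_threshold=0.05,
)
print(config_path.read_text())
```

The resulting file can then be passed to the CLI with `-c`, in the same way as a YAML config.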

Config Templates¶

Kompot provides ready-to-use templates:

Minimal templates (commonly used parameters only):

kompot/cli/templates/dm_config_minimal.yaml
kompot/cli/templates/de_config_minimal.yaml
kompot/cli/templates/da_config_minimal.yaml

Complete templates (all available parameters with documentation):

kompot/cli/templates/dm_config_template.yaml
kompot/cli/templates/de_config_template.yaml
kompot/cli/templates/da_config_template.yaml

Pipeline Integration¶

Nextflow Example¶

process KOMPOT_DE {
    input:
    path adata
    path config

    output:
    path "results.h5ad"

    script:
    """
    kompot de ${adata} -o results.h5ad -c ${config}
    """
}

Snakemake Example¶

rule kompot_de:
    input:
        adata = "data/{sample}.h5ad",
        config = "configs/de_config.yaml"
    output:
        results = "results/{sample}_de.h5ad"
    shell:
        "kompot de {input.adata} -o {output.results} -c {input.config}"

Shell Script Example¶

#!/bin/bash
# Process multiple samples with complete workflow

for sample in sample1 sample2 sample3; do
    echo "Processing ${sample}..."

    # Step 1: Compute diffusion maps
    kompot dm \
        data/${sample}.h5ad \
        -o temp/${sample}_dm.h5ad \
        --pca-key X_pca \
        --n-components 10

    # Step 2: Differential expression
    kompot de \
        temp/${sample}_dm.h5ad \
        -o results/${sample}_de.h5ad \
        --groupby condition \
        --condition1 control \
        --condition2 treatment \
        --obsm-key DM_EigenVectors \
        --batch-size 100

    if [ $? -eq 0 ]; then
        echo "${sample} completed successfully"
        rm temp/${sample}_dm.h5ad  # Cleanup intermediate file
    else
        echo "${sample} failed" >&2
        exit 1
    fi
done
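
When the batch loop is driven from Python rather than bash, building each command as an argument list avoids shell quoting problems. The sketch below only constructs and prints the commands; `build_de_command` is a hypothetical helper, and it assumes every option maps to a flag by replacing underscores with hyphens, which holds for the flags documented on this page.

```python
import shlex


def build_de_command(input_path, output_path, **options):
    """Build the argument list for one `kompot de` call from keyword options."""
    command = ["kompot", "de", str(input_path), "-o", str(output_path)]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:  # boolean flag such as --no-progress
            command.append(flag)
        elif value is not None and value is not False:
            command.extend([flag, str(value)])
    return command


for sample in ["sample1", "sample2"]:
    command = build_de_command(
        f"temp/{sample}_dm.h5ad",
        f"results/{sample}_de.h5ad",
        groupby="condition",
        condition1="control",
        condition2="treatment",
        obsm_key="DM_EigenVectors",
        batch_size=100,
        no_progress=True,
    )
    print(shlex.join(command))
    # To execute instead of print: subprocess.run(command, check=True)
```

With `check=True`, `subprocess.run` raises `CalledProcessError` on a non-zero exit code, which plays the role of the `$?` test in the shell version.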

Output Format¶

Differential Expression Output¶

Results stored in:

adata.var["kompot_de_{cond1}_to_{cond2}_mahalanobis"] - Significance scores
adata.var["kompot_de_{cond1}_to_{cond2}_mean_lfc"] - Mean log fold change
adata.var["kompot_de_{cond1}_to_{cond2}_is_de"] - Boolean significance flag
adata.var["kompot_de_{cond1}_to_{cond2}_mahalanobis_local_fdr"] - Local FDR
adata.uns["kompot_de"] - Run metadata and parameters

Differential Abundance Output¶

Results stored in:

adata.obs["kompot_da_{cond1}_to_{cond2}_lfc"] - Log fold change per cell
adata.obs["kompot_da_{cond1}_to_{cond2}_lfc_zscore"] - Z-scores
adata.obs["kompot_da_{cond1}_to_{cond2}_neg_log10_lfc_ptp"] - Negative log10 of the PTP values
adata.obs["kompot_da_{cond1}_to_{cond2}_lfc_direction"] - Direction (up/down/neutral)
adata.uns["kompot_da"] - Run metadata and parameters
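
The column names above follow one pattern, so downstream scripts can build them instead of hard-coding them. This sketch assumes that the condition labels are inserted verbatim and that the prefix equals the `--result-key` value; `result_column` is a hypothetical helper, not part of kompot.

```python
def result_column(result_key, condition1, condition2, suffix):
    """Build a result column name following the pattern listed above."""
    return f"{result_key}_{condition1}_to_{condition2}_{suffix}"


print(result_column("kompot_de", "Young", "Old", "mean_lfc"))
# kompot_de_Young_to_Old_mean_lfc
print(result_column("kompot_da", "Young", "Old", "lfc_zscore"))
# kompot_da_Young_to_Old_lfc_zscore
```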

Logging and Verbosity¶

Control logging output:

# Standard logging (INFO level)
kompot de input.h5ad -o output.h5ad --groupby condition ...

# Verbose logging (DEBUG level)
kompot -v de input.h5ad -o output.h5ad --groupby condition ...

# Redirect logs
kompot de input.h5ad -o output.h5ad ... 2> analysis.log

Error Handling¶

The CLI uses the following exit codes:

0 - Success
1 - General error (missing files, invalid parameters, analysis failure)
130 - Interrupted by user (Ctrl+C)

Check exit codes in scripts:

kompot de input.h5ad -o output.h5ad ...
if [ $? -ne 0 ]; then
    echo "Analysis failed" >&2
    exit 1
fi
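
The same check can be made from Python with `subprocess`. In this sketch, `EXIT_MEANINGS` restates the codes listed above, `describe_exit` and `run_and_describe` are hypothetical helpers, and a stand-in Python command that exits with code 130 takes the place of a real kompot call so that the example runs anywhere.

```python
import subprocess
import sys

# Exit codes as documented above.
EXIT_MEANINGS = {
    0: "success",
    1: "general error",
    130: "interrupted by user",
}


def describe_exit(returncode):
    """Translate an exit code into the meaning documented for the CLI."""
    return EXIT_MEANINGS.get(returncode, f"unexpected exit code {returncode}")


def run_and_describe(command):
    """Run a command and return its exit code together with a description."""
    completed = subprocess.run(command)
    return completed.returncode, describe_exit(completed.returncode)


# Stand-in command; replace it with e.g.
# ["kompot", "de", "input.h5ad", "-o", "output.h5ad", "-c", "config.yaml"]
stand_in = [sys.executable, "-c", "import sys; sys.exit(130)"]
code, meaning = run_and_describe(stand_in)
print(code, meaning)
# 130 interrupted by user
```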

Performance Tips¶

Memory Management¶

For large datasets:

# Reduce batch size
kompot de input.h5ad -o output.h5ad ... --batch-size 50

# Use fewer landmarks
kompot de input.h5ad -o output.h5ad ... --n-landmarks 3000

# Enable disk storage (requires config file)
# In config.yaml:
#   store_arrays_on_disk: true
#   disk_storage_dir: "/tmp/kompot_cache"

Speed Optimization¶

# Reduce null genes for faster FDR estimation
kompot de input.h5ad -o output.h5ad ... --null-genes 1000

# Use fewer landmarks
kompot da input.h5ad -o output.h5ad ... --n-landmarks 2000

# Disable progress bars in scripts
kompot de input.h5ad -o output.h5ad ... --no-progress

Troubleshooting¶

Common Issues¶

Missing required parameters:

Error: Missing required parameters: groupby, condition1, condition2

Solution: Provide via CLI args or config file

File not found:

Error: AnnData file not found: input.h5ad

Solution: Check file path and ensure it exists

Invalid condition label:

Error: Condition 'X' not found in column 'condition'

Solution: Check condition labels in your data

Memory error:

MemoryError or JAX out of memory

Solution: Reduce --batch-size and --n-landmarks

Getting Help¶

# General help
kompot --help

# Command-specific help
kompot de --help
kompot da --help
kompot dm --help

# Check version
kompot --version

Comparison with Python API¶

Feature	CLI	Python API
Basic analysis	✅ Simple	✅ Simple
Advanced parameters	⚠️ Requires config file	✅ Direct access
Pipeline integration	✅ Easy	⚠️ Requires scripting
Interactive exploration	❌ Not suitable	✅ Excellent
Visualization	❌ Requires separate step	✅ Integrated
Debugging	⚠️ Limited	✅ Full access
Documentation	✅ Built-in help	✅ Comprehensive

Recommendation:

Use CLI for: automated pipelines, batch processing, workflow integration
Use Python API for: interactive analysis, visualization, parameter exploration, custom workflows
