Command-Line Interface (CLI)
============================

The kompot CLI provides command-line access to differential expression (DE) and differential abundance (DA) analysis for pipeline integration and workflow automation.

Installation
------------

The CLI is installed automatically with kompot:

.. code-block:: bash

    pip install kompot
    # or
    mamba install -c bioconda kompot

Verify installation:

.. code-block:: bash

    kompot --version
    kompot --help

Overview
--------

The CLI provides three main commands:

- ``kompot dm`` - Compute diffusion maps (preprocessing with Palantir)
- ``kompot de`` - Differential expression analysis
- ``kompot da`` - Differential abundance analysis

All commands support:

- Direct CLI arguments for common parameters
- YAML/JSON config files for advanced parameters
- Reading/writing ``.h5ad`` and ``.zarr`` AnnData formats

Quick Start
-----------

Complete Workflow
^^^^^^^^^^^^^^^^^

.. code-block:: bash

    # 1. Compute diffusion maps (preprocessing)
    kompot dm input.h5ad -o input_with_dm.h5ad \
        --pca-key X_pca \
        --n-components 10

    # 2. Run differential expression
    kompot de input_with_dm.h5ad -o de_results.h5ad \
        --groupby condition \
        --condition1 control \
        --condition2 treatment \
        --obsm-key DM_EigenVectors

    # 3. Run differential abundance
    kompot da input_with_dm.h5ad -o da_results.h5ad \
        --groupby condition \
        --condition1 control \
        --condition2 treatment \
        --obsm-key DM_EigenVectors

Diffusion Maps (Preprocessing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    kompot dm input.h5ad -o output.h5ad \
        --pca-key X_pca \
        --n-components 10 \
        --knn 30

Differential Expression (Basic)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    kompot de input.h5ad -o output.h5ad \
        --groupby condition \
        --condition1 control \
        --condition2 treatment \
        --obsm-key X_pca \
        --layer logged_counts

Differential Abundance (Basic)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    kompot da input.h5ad -o output.h5ad \
        --groupby condition \
        --condition1 control \
        --condition2 treatment \
        --obsm-key X_pca

Using Config Files
^^^^^^^^^^^^^^^^^^

For complex analyses with many parameters, use config files:

.. code-block:: bash

    # Get template (copy from installed package)
    python -c "from pathlib import Path; import shutil; \
               import kompot; \
               src = Path(kompot.__file__).parent / 'cli' / 'templates' / 'de_config_minimal.yaml'; \
               shutil.copy(src, 'my_de_config.yaml')"

    # Edit config file
    nano my_de_config.yaml

    # Run analysis
    kompot de input.h5ad -o output.h5ad -c my_de_config.yaml

CLI arguments override config file values:

.. code-block:: bash

    kompot de input.h5ad -o output.h5ad \
        -c my_config.yaml \
        --batch-size 50  # Overrides batch_size in config

Diffusion Maps Command
----------------------

The ``dm`` command computes diffusion maps using Palantir, which provides a continuous representation of cell states needed for differential analysis.

Basic Usage
^^^^^^^^^^^

.. code-block:: bash

    kompot dm INPUT -o OUTPUT [OPTIONS]

Prerequisites
^^^^^^^^^^^^^

- Requires Palantir: ``pip install palantir`` or ``pip install kompot[recommended]``
- Input AnnData must contain PCA coordinates in ``adata.obsm``

Common Options
^^^^^^^^^^^^^^

.. code-block:: text

    --pca-key KEY         # PCA coordinates in adata.obsm (default: X_pca)
    --n-components N      # Number of diffusion components (default: 10)
    --knn N               # Number of nearest neighbors (default: 30)
    --alpha FLOAT         # Diffusion alpha parameter (default: 0)

Output
^^^^^^

Results are stored in:

- ``adata.obsm['DM_EigenVectors']`` - Diffusion map coordinates (n_cells × n_components)
- ``adata.uns['DM_EigenValues']`` - Eigenvalues of diffusion operator

Example: Complete Preprocessing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    # Starting with raw AnnData (assuming PCA already computed)
    kompot dm bone_marrow.h5ad -o bone_marrow_dm.h5ad \
        --pca-key X_pca \
        --n-components 10 \
        --knn 30

    # Then run differential analysis
    kompot de bone_marrow_dm.h5ad -o results.h5ad \
        --groupby Age \
        --condition1 Young \
        --condition2 Old \
        --obsm-key DM_EigenVectors

Why Diffusion Maps?
^^^^^^^^^^^^^^^^^^^

Diffusion maps capture continuous cell state transitions better than PCA alone:

- Preserves the geometry of differentiation trajectories
- Reduces noise while maintaining biological structure
- Euclidean distance in this representation better represents biological similarity
- Distance in cell-state representation is used by kompot's covariance kernel

See the `Palantir documentation `_ for details.

Differential Expression Command
-------------------------------

Basic Usage
^^^^^^^^^^^

.. code-block:: bash

    kompot de INPUT -o OUTPUT [OPTIONS]
    kompot de INPUT -t TABLE_OUTPUT [OPTIONS]

At least one output must be specified: ``-o/--output`` for full AnnData or ``-t/--table-output`` for CSV/TSV table.

Required Parameters
^^^^^^^^^^^^^^^^^^^

Either via CLI or config file:

- ``--groupby COLUMN`` - Column in ``adata.obs`` with condition labels
- ``--condition1 LABEL`` - Reference condition label
- ``--condition2 LABEL`` - Comparison condition label

Output Options
^^^^^^^^^^^^^^

.. code-block:: text

    -o, --output FILE         # Output AnnData file (.h5ad or .zarr)
    -t, --table-output FILE   # Output DE results as table (.csv or .tsv)

The ``--table-output`` option exports only the kompot-produced columns from ``adata.var`` (gene-level statistics like mahalanobis distance, log fold change, FDR, etc.). This is useful for downstream analysis or integration with other tools.

Common Options
^^^^^^^^^^^^^^

.. code-block:: text

    --obsm-key KEY            # Cell state representation (default: DM_EigenVectors)
    --layer LAYER             # Expression data layer (default: None, use X)
    --result-key KEY          # Storage key (default: kompot_de)
    --n-landmarks N           # Number of landmarks (default: 5000)
    --sample-col COLUMN       # Sample ID column for replicates
    --batch-size N            # Cells per batch (default: 100)
    --fdr-threshold FLOAT     # FDR threshold (default: 0.05)
    --null-genes N            # Null genes for FDR (default: 2000)

Boolean Flags
^^^^^^^^^^^^^

.. code-block:: text

    --no-progress             # Disable progress bars
    --store-landmarks         # Store landmarks for reuse
    --store-additional-stats  # Store extra statistics
    --overwrite               # Overwrite without warning

Compute Options
^^^^^^^^^^^^^^^

.. code-block:: text

    --use-gpu                 # Use GPU acceleration (requires CUDA-enabled JAX)
    --threads N               # Number of threads for JAX/NumPy/Dask (default: all cores)

Advanced Options
^^^^^^^^^^^^^^^^

For advanced parameters (gene filtering, cell filtering, GP kernel parameters, memory management, etc.), see the configuration file templates:

- ``kompot/cli/templates/de_config_template.yaml`` - Complete template with all parameters
- ``kompot/cli/templates/de_config_minimal.yaml`` - Minimal template with common parameters

Example: Complete Analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    kompot de bone_marrow.h5ad -o results.h5ad \
        --groupby Age \
        --condition1 Young \
        --condition2 Old \
        --obsm-key DM_EigenVectors \
        --layer logged_counts \
        --sample-col Sample \
        --n-landmarks 5000 \
        --batch-size 100 \
        --fdr-threshold 0.05 \
        --null-genes 2000 \
        --store-additional-stats

Differential Abundance Command
------------------------------

Basic Usage
^^^^^^^^^^^

.. code-block:: bash

    kompot da INPUT -o OUTPUT [OPTIONS]
    kompot da INPUT -t TABLE_OUTPUT [OPTIONS]

At least one output must be specified: ``-o/--output`` for full AnnData or ``-t/--table-output`` for CSV/TSV table.

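When commands are generated by a script, the at-least-one-output rule can be checked before a job is launched. The following is a minimal sketch and not part of kompot: ``build_kompot_command`` is a hypothetical helper that assembles an argument list from the flags documented on this page and fails early if neither output is given. It assumes that keyword names map to long flags by replacing underscores with hyphens.

.. code-block:: python

    import shlex


    def build_kompot_command(subcommand, input_path, output=None,
                             table_output=None, **options):
        """Assemble a ``kompot de``/``kompot da`` call as an argument list.

        Hypothetical helper for illustration; it only builds the command
        and does not run it.
        """
        if subcommand not in ("de", "da"):
            raise ValueError(f"unsupported subcommand: {subcommand!r}")
        # Mirror the documented rule: -o and/or -t must be present.
        if output is None and table_output is None:
            raise ValueError("specify output (-o) and/or table_output (-t)")

        cmd = ["kompot", subcommand, str(input_path)]
        if output is not None:
            cmd += ["-o", str(output)]
        if table_output is not None:
            cmd += ["-t", str(table_output)]

        for name, value in options.items():
            # Assumption: obsm_key -> --obsm-key, sample_col -> --sample-col, ...
            flag = "--" + name.replace("_", "-")
            if value is True:
                cmd.append(flag)  # boolean flag such as --overwrite
            elif value is not None and value is not False:
                cmd += [flag, str(value)]
        return cmd


    cmd = build_kompot_command(
        "da", "input.h5ad",
        table_output="da_results.tsv",
        groupby="condition",
        condition1="control",
        condition2="treatment",
        obsm_key="DM_EigenVectors",
    )
    print(shlex.join(cmd))

The returned list can be passed to ``subprocess.run(cmd, check=True)``, which raises on a non-zero exit code.
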
Required Parameters
^^^^^^^^^^^^^^^^^^^

Either via CLI or config file:

- ``--groupby COLUMN`` - Column in ``adata.obs`` with condition labels
- ``--condition1 LABEL`` - Reference condition label
- ``--condition2 LABEL`` - Comparison condition label

Output Options
^^^^^^^^^^^^^^

.. code-block:: text

    -o, --output FILE         # Output AnnData file (.h5ad or .zarr)
    -t, --table-output FILE   # Output DA results as table (.csv or .tsv)

The ``--table-output`` option exports only the kompot-produced columns from ``adata.obs`` (cell-level statistics like log fold change, z-scores, PTP values, etc.). This is useful for downstream analysis or integration with other tools.

Common Options
^^^^^^^^^^^^^^

.. code-block:: text

    --obsm-key KEY                      # Cell state representation (default: X_pca)
    --result-key KEY                    # Storage key (default: kompot_da)
    --n-landmarks N                     # Number of landmarks (default: None, all points)
    --sample-col COLUMN                 # Sample ID column for replicates
    --batch-size N                      # Cells per batch (default: None)
    --log-fold-change-threshold FLOAT   # LFC threshold (default: 1.0)
    --ptp-threshold FLOAT               # PTP threshold (default: 0.05)
    --ls-factor FLOAT                   # Length scale factor (default: 10.0)

Boolean Flags
^^^^^^^^^^^^^

.. code-block:: text

    --store-landmarks         # Store landmarks for reuse
    --overwrite               # Overwrite without warning

Compute Options
^^^^^^^^^^^^^^^

.. code-block:: text

    --use-gpu                 # Use GPU acceleration (requires CUDA-enabled JAX)
    --threads N               # Number of threads for JAX/NumPy/Dask (default: all cores)

Example: Complete Analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    kompot da bone_marrow.h5ad -o results.h5ad \
        --groupby Age \
        --condition1 Young \
        --condition2 Old \
        --obsm-key DM_EigenVectors \
        --sample-col Sample \
        --n-landmarks 3000 \
        --log-fold-change-threshold 1.0 \
        --ptp-threshold 0.05

Configuration Files
-------------------

YAML Format
^^^^^^^^^^^

Config files use standard YAML syntax:

.. code-block:: yaml

    # Required parameters
    groupby: "condition"
    condition1: "control"
    condition2: "treatment"

    # Common parameters
    obsm_key: "X_pca"
    layer: "logged_counts"
    result_key: "kompot_de"

    # Sample variance
    sample_col: "sample_id"

    # Performance
    batch_size: 100
    n_landmarks: 5000

    # Significance
    fdr_threshold: 0.05
    null_genes: 2000

    # Advanced parameters
    genes: ["Gene1", "Gene2", "Gene3"]  # Analyze specific genes
    cell_filter: {batch: "batch1"}  # Exclude batch1 cells

    # GP parameters
    sigma: 1.0
    ls_factor: 10.0

JSON Format
^^^^^^^^^^^

JSON is also supported:

.. code-block:: json

    {
      "groupby": "condition",
      "condition1": "control",
      "condition2": "treatment",
      "obsm_key": "X_pca",
      "batch_size": 100,
      "fdr_threshold": 0.05
    }

Config Templates
^^^^^^^^^^^^^^^^

Kompot provides ready-to-use templates:

**Minimal templates** (commonly used parameters only):

- ``kompot/cli/templates/dm_config_minimal.yaml``
- ``kompot/cli/templates/de_config_minimal.yaml``
- ``kompot/cli/templates/da_config_minimal.yaml``

**Complete templates** (all available parameters with documentation):

- ``kompot/cli/templates/dm_config_template.yaml``
- ``kompot/cli/templates/de_config_template.yaml``
- ``kompot/cli/templates/da_config_template.yaml``

Pipeline Integration
--------------------

Nextflow Example
^^^^^^^^^^^^^^^^

.. code-block:: groovy

    process KOMPOT_DE {
        input:
        path adata
        path config

        output:
        path "results.h5ad"

        script:
        """
        kompot de ${adata} -o results.h5ad -c ${config}
        """
    }

Snakemake Example
^^^^^^^^^^^^^^^^^

.. code-block:: python

    rule kompot_de:
        input:
            adata = "data/{sample}.h5ad",
            config = "configs/de_config.yaml"
        output:
            results = "results/{sample}_de.h5ad"
        shell:
            "kompot de {input.adata} -o {output.results} -c {input.config}"

Shell Script Example
^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    #!/bin/bash
    # Process multiple samples with complete workflow

    for sample in sample1 sample2 sample3; do
        echo "Processing ${sample}..."

        # Step 1: Compute diffusion maps (stop if this step fails)
        kompot dm \
            data/${sample}.h5ad \
            -o temp/${sample}_dm.h5ad \
            --pca-key X_pca \
            --n-components 10 || { echo "${sample}: diffusion maps failed" >&2; exit 1; }

        # Step 2: Differential expression
        kompot de \
            temp/${sample}_dm.h5ad \
            -o results/${sample}_de.h5ad \
            --groupby condition \
            --condition1 control \
            --condition2 treatment \
            --obsm-key DM_EigenVectors \
            --batch-size 100

        if [ $? -eq 0 ]; then
            echo "${sample} completed successfully"
            rm temp/${sample}_dm.h5ad  # Cleanup intermediate file
        else
            echo "${sample} failed" >&2
            exit 1
        fi
    done

Output Format
-------------

Differential Expression Output
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Results stored in:

- ``adata.var["kompot_de_{cond1}_to_{cond2}_mahalanobis"]`` - Significance scores
- ``adata.var["kompot_de_{cond1}_to_{cond2}_mean_lfc"]`` - Mean log fold change
- ``adata.var["kompot_de_{cond1}_to_{cond2}_is_de"]`` - Boolean significance flag
- ``adata.var["kompot_de_{cond1}_to_{cond2}_mahalanobis_local_fdr"]`` - Local FDR
- ``adata.uns["kompot_de"]`` - Run metadata and parameters

Differential Abundance Output
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Results stored in:

- ``adata.obs["kompot_da_{cond1}_to_{cond2}_lfc"]`` - Log fold change per cell
- ``adata.obs["kompot_da_{cond1}_to_{cond2}_lfc_zscore"]`` - Z-scores
- ``adata.obs["kompot_da_{cond1}_to_{cond2}_neg_log10_lfc_ptp"]`` - -log10 PTP values
- ``adata.obs["kompot_da_{cond1}_to_{cond2}_lfc_direction"]`` - Direction (up/down/neutral)
- ``adata.uns["kompot_da"]`` - Run metadata and parameters

Logging and Verbosity
---------------------

Control logging output:

.. code-block:: bash

    # Standard logging (INFO level)
    kompot de input.h5ad -o output.h5ad --groupby condition ...

    # Verbose logging (DEBUG level)
    kompot -v de input.h5ad -o output.h5ad --groupby condition ...

    # Redirect logs
    kompot de input.h5ad -o output.h5ad ... 2> analysis.log

Error Handling
--------------

The CLI exits with different codes:

- ``0`` - Success
- ``1`` - General error (missing files, invalid parameters, analysis failure)
- ``130`` - Interrupted by user (Ctrl+C)

Check exit codes in scripts:

.. code-block:: bash

    kompot de input.h5ad -o output.h5ad ...
    if [ $? -ne 0 ]; then
        echo "Analysis failed" >&2
        exit 1
    fi

Performance Tips
----------------

Memory Management
^^^^^^^^^^^^^^^^^

For large datasets:

.. code-block:: bash

    # Reduce batch size
    kompot de input.h5ad -o output.h5ad ... --batch-size 50

    # Use fewer landmarks
    kompot de input.h5ad -o output.h5ad ... --n-landmarks 3000

    # Enable disk storage (requires config file)
    # In config.yaml:
    #   store_arrays_on_disk: true
    #   disk_storage_dir: "/tmp/kompot_cache"

Speed Optimization
^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    # Reduce null genes for faster FDR estimation
    kompot de input.h5ad -o output.h5ad ... --null-genes 1000

    # Use fewer landmarks
    kompot da input.h5ad -o output.h5ad ... --n-landmarks 2000

    # Disable progress bars in scripts
    kompot de input.h5ad -o output.h5ad ... --no-progress

Troubleshooting
---------------

Common Issues
^^^^^^^^^^^^^

**Missing required parameters:**

.. code-block:: text

    Error: Missing required parameters: groupby, condition1, condition2

*Solution:* Provide via CLI args or config file

**File not found:**

.. code-block:: text

    Error: AnnData file not found: input.h5ad

*Solution:* Check file path and ensure it exists

**Invalid condition label:**

.. code-block:: text

    Error: Condition 'X' not found in column 'condition'

*Solution:* Check condition labels in your data

**Memory error:**

.. code-block:: text

    MemoryError or JAX out of memory

*Solution:* Reduce ``--batch-size`` and ``--n-landmarks``

Getting Help
^^^^^^^^^^^^

.. code-block:: bash

    # General help
    kompot --help

    # Command-specific help
    kompot de --help
    kompot da --help
    kompot dm --help

    # Check version
    kompot --version

Comparison with Python API
--------------------------

.. list-table::
    :header-rows: 1
    :widths: 25 15 15

    * - Feature
      - CLI
      - Python API
    * - Basic analysis
      - ✅ Simple
      - ✅ Simple
    * - Advanced parameters
      - ⚠️ Requires config file
      - ✅ Direct access
    * - Pipeline integration
      - ✅ Easy
      - ⚠️ Requires scripting
    * - Interactive exploration
      - ❌ Not suitable
      - ✅ Excellent
    * - Visualization
      - ❌ Requires separate step
      - ✅ Integrated
    * - Debugging
      - ⚠️ Limited
      - ✅ Full access
    * - Documentation
      - ✅ Built-in help
      - ✅ Comprehensive

**Recommendation:**

- Use **CLI** for: automated pipelines, batch processing, workflow integration
- Use **Python API** for: interactive analysis, visualization, parameter exploration, custom workflows

See Also
--------

- :doc:`Python API Documentation `
- :doc:`Getting Started Tutorial `
- :doc:`Sample Variance Guide `
- `GitHub Repository `_