AnnData Interface

The AnnData interface provides high-level functions that work directly with AnnData objects, handling data flow, parameter management, and result storage automatically.

See the Getting Started and Advanced Differential Expression tutorials for worked examples.

Differential Expression

import kompot

# Minimal call
kompot.de(adata, "condition", "Young", "Old")

# Customise GP noise and FDR threshold
kompot.de(
    adata, "condition", "Young", "Old",
    gp=kompot.GPSettings(sigma=0.5),
    fdr=kompot.FDRSettings(threshold=0.01),
)

# Filter to specific cell types
kompot.de(
    adata, "condition", "Young", "Old",
    filter=kompot.FilterSettings(
        groups="cell_type",
        cell_filter={"cell_type": ["T_cell", "B_cell"]},
    ),
)

# Sample variance for biological replicates (limit to top genes)
kompot.de(
    adata, "condition", "Young", "Old",
    sample_col="donor_id",
    genes=top_genes,  # e.g. top 200 from a previous run
    fdr=kompot.FDRSettings(null_genes=0),
)

kompot.de(adata, groupby: str, condition1: str, condition2: str, obsm_key: str = 'DM_EigenVectors', layer=None, genes=None, sample_col=None, gp: GPSettings | None = None, fdr: FDRSettings | None = None, filter: FilterSettings | None = None, storage: StorageSettings | None = None, output: OutputSettings | None = None, model: ModelSettings | None = None, dry_run: bool = False, **function_kwargs) -> Dict[str, ndarray] | Any

Run differential expression analysis on an AnnData object.

The most common call is just:

kompot.de(adata, "condition", "Young", "Old")

Advanced options are available through the settings dataclasses (GPSettings, FDRSettings, FilterSettings, StorageSettings, OutputSettings). Any field left at its default is equivalent to omitting it entirely. Extra **function_kwargs are forwarded to mellon’s FunctionEstimator.

Parameters:
  • adata (AnnData) – AnnData object containing cells from both conditions.

  • groupby (str) – Column in adata.obs with condition labels.

  • condition1 (str) – Label in groupby identifying the first condition.

  • condition2 (str) – Label in groupby identifying the second condition.

  • obsm_key (str) – Key in adata.obsm for cell-state coordinates.

  • layer (str, optional) – Layer with expression data. None means adata.X.

  • genes (list of str, optional) – Subset of genes to analyse.

  • sample_col (str, optional) – Column with biological-replicate labels.

  • gp (GPSettings, optional) – GP model parameters (sigma, ls, n_landmarks, etc.).

  • fdr (FDRSettings, optional) – FDR / null-distribution parameters.

  • filter (FilterSettings, optional) – Cell filtering and group-subsetting.

  • storage (StorageSettings, optional) – Where and how results are stored.

  • output (OutputSettings, optional) – Return-value and progress-bar control.

  • model (ModelSettings, optional) – Pre-fitted models or predictors to inject. When provided, fitting is skipped for the corresponding components. See ModelSettings.

  • dry_run (bool, optional) – If True, estimate resource requirements and print a report instead of running the analysis. Returns a ResourcePlan.

  • **function_kwargs – Forwarded to FunctionEstimator.

Returns:

Return value depends on copy and return_full_results in OutputSettings.

Return type:

Union[Dict[str, np.ndarray], AnnData, Tuple[Dict[str, np.ndarray], AnnData]]

Differential Abundance

# Minimal call
kompot.da(adata, "condition", "Young", "Old")

# Adjust significance thresholds
kompot.da(
    adata, "condition", "Young", "Old",
    threshold=kompot.DAThresholdSettings(ptp_threshold=0.01),
)

kompot.da(adata, groupby: str, condition1: str, condition2: str, obsm_key: str = 'DM_EigenVectors', sample_col=None, gp: GPSettings | None = None, threshold: DAThresholdSettings | None = None, storage: StorageSettings | None = None, output: OutputSettings | None = None, model: ModelSettings | None = None, **density_kwargs) -> Dict[str, ndarray] | Any

Run differential abundance analysis on an AnnData object.

The most common call is just:

kompot.da(adata, "condition", "Young", "Old")

Advanced options are available through the settings dataclasses (GPSettings, DAThresholdSettings, StorageSettings, OutputSettings). Any field left at its default is equivalent to omitting it entirely. Extra **density_kwargs are forwarded to mellon’s DensityEstimator.

Parameters:
  • adata (AnnData) – AnnData object containing cells from both conditions.

  • groupby (str) – Column in adata.obs with condition labels.

  • condition1 (str) – Label in groupby identifying the first condition.

  • condition2 (str) – Label in groupby identifying the second condition.

  • obsm_key (str) – Key in adata.obsm for cell-state coordinates.

  • sample_col (str, optional) – Column with biological-replicate labels.

  • gp (GPSettings, optional) – GP model parameters (ls_factor, n_landmarks, landmarks, batch_size, jit_compile, random_state).

  • threshold (DAThresholdSettings, optional) – Significance thresholds for abundance changes.

  • storage (StorageSettings, optional) – Where and how results are stored.

  • output (OutputSettings, optional) – Return-value control.

  • model (ModelSettings, optional) – Pre-fitted models or predictors to inject. For DA, only density_predictor1/2 and variance_predictor1/2 are used. See ModelSettings.

  • **density_kwargs – Forwarded to DensityEstimator.

Returns:

Return value depends on copy and return_full_results in OutputSettings.

Return type:

Union[Dict[str, np.ndarray], AnnData, Tuple[Dict[str, np.ndarray], AnnData]]

Expression Smoothing

kompot.smooth_expression(adata, groupby: str | None = None, condition: str | None = None, obsm_key: str = 'DM_EigenVectors', layer: str | None = None, genes: List[str] | None = None, sample_col: str | None = None, gp: GPSettings | None = None, storage: StorageSettings | None = None, output: OutputSettings | None = None, model: ModelSettings | None = None, **function_kwargs) -> Dict[str, Any] | Any | None

Smooth gene expression for a single condition using GP regression.

Fits an ExpressionModel on the selected cells and evaluates it on all cells in adata. This means every cell gets a smoothed value and uncertainty estimate, even if it was not part of the training condition. Stores the smoothed values, posterior standard deviations, and (optionally) empirical and sample variance layers in adata.

The most common call is just:

kompot.smooth_expression(adata, "condition", "Young")

Advanced options are available through the settings dataclasses (GPSettings, StorageSettings, OutputSettings). Any field left at its default is equivalent to omitting it entirely. Extra **function_kwargs are forwarded to mellon’s FunctionEstimator.

Parameters:
  • adata (AnnData) – AnnData object.

  • groupby (str, optional) – Column in adata.obs identifying conditions. Required when condition is specified.

  • condition (str, optional) – Which group in groupby to smooth. If None and groupby is None, all cells are used.

  • obsm_key (str) – Key in adata.obsm for cell-state coordinates.

  • layer (str, optional) – Layer to use as expression input. None means adata.X.

  • genes (list of str, optional) – Subset of genes to smooth. None means all genes.

  • sample_col (str, optional) – Column in adata.obs with biological-replicate labels.

  • gp (GPSettings, optional) – GP model parameters (sigma, ls, n_landmarks, etc.).

  • storage (StorageSettings, optional) – Output storage parameters (result_key, overwrite).

  • output (OutputSettings, optional) – Return behavior (copy, inplace, return_full_results, progress).

  • model (ModelSettings, optional) – Pre-fitted ExpressionModel to inject via model1. When provided, skips internal fitting.

  • **function_kwargs – Forwarded to mellon.FunctionEstimator.

Returns:

None when results are stored in-place. If return_full_results is True, a dictionary with keys "model", "table", and "field_names".

Return type:

None or dict

Settings

Each settings dataclass groups related parameters. Any field left at its default is equivalent to omitting it — you only override what you need.
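
As a rough illustration of "override only what you need", the sketch below uses a hypothetical stand-in dataclass. GPSettingsSketch and overridden_fields are not part of kompot; the stand-in mirrors only three of the documented GPSettings fields, and the helper lists which fields differ from their defaults.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class GPSettingsSketch:
    """Hypothetical stand-in mirroring a few documented GPSettings fields."""
    sigma: float = 1.0
    ls: Optional[float] = None
    ls_factor: float = 10.0


def overridden_fields(settings: Any) -> Dict[str, Any]:
    """Return only the fields whose value differs from the dataclass default."""
    return {
        f.name: getattr(settings, f.name)
        for f in fields(settings)
        if getattr(settings, f.name) != f.default
    }


print(overridden_fields(GPSettingsSketch()))           # {}
print(overridden_fields(GPSettingsSketch(sigma=0.5)))  # {'sigma': 0.5}
```

An instance built with no arguments compares equal to one that spells out the defaults, which is the sense in which a default field is the same as an omitted one.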

GPSettings

Controls the Gaussian Process model.

class kompot.GPSettings(sigma: float = 1.0, ls: float | None = None, ls_factor: float = 10.0, n_landmarks: int | None = 5000, landmarks: ndarray | None = None, use_empirical_variance: bool = False, batch_size: int | None = 100, eps: float = 1e-08, jit_compile: bool = False, random_state: int | None = None)

Gaussian-process parameters for the expression model.

Parameters:
  • sigma (float) – Noise level for the GP.

  • ls (float, optional) – Length scale. If None, estimated automatically using ls_factor.

  • ls_factor (float) – Multiplier applied to the automatically inferred length scale.

  • n_landmarks (int, optional) – Number of landmarks for the Nystrom approximation.

  • landmarks (np.ndarray, optional) – Pre-computed landmark coordinates.

  • use_empirical_variance (bool) – Estimate per-gene heteroscedastic noise from GP residuals.

  • batch_size (int, optional) – Number of cells processed at once during prediction.

  • eps (float) – Small constant for numerical stability.

  • jit_compile (bool) – Use JAX JIT compilation.

  • random_state (int, optional) – Random seed for landmark selection.

batch_size: int | None = 100
eps: float = 1e-08
jit_compile: bool = False
landmarks: ndarray | None = None
ls: float | None = None
ls_factor: float = 10.0
n_landmarks: int | None = 5000
random_state: int | None = None
sigma: float = 1.0
use_empirical_variance: bool = False
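
To make batch_size concrete, here is a minimal sketch of how cells could be split into prediction batches. The helper batch_slices is hypothetical, and treating batch_size=None as "one pass over all cells" is an assumption rather than documented kompot behaviour.

```python
from typing import Iterator, Optional, Tuple


def batch_slices(n_cells: int, batch_size: Optional[int]) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that cover n_cells in order."""
    if batch_size is None:
        # Assumption: no batch size means a single pass over every cell.
        yield (0, n_cells)
        return
    for start in range(0, n_cells, batch_size):
        yield (start, min(start + batch_size, n_cells))


print(list(batch_slices(250, 100)))  # [(0, 100), (100, 200), (200, 250)]
```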

FDRSettings

Controls false-discovery-rate estimation (DE only).

class kompot.FDRSettings(null_genes: int | List[int] | str | None = 'auto', null_seed: int | None = 42, threshold: float = 0.05, null_mahalanobis: ndarray | None = None, null_expression: Tuple[ndarray, ndarray] | None = None, combine_with_internal: bool = False)

False-discovery-rate and null-distribution parameters.

Parameters:
  • null_genes (int, List[int], None, or "auto") –

    Controls null-distribution calibration for FDR.

    • "auto" (default) — generates 2 000 null genes by column shuffling when sample_col is not set, 0 otherwise.

    • int — number of null genes to auto-generate. Not compatible with pre-fitted predictors in ModelSettings (raises ValueError).

    • List[int] — explicit column indices of pre-baked null features already present in the data. Use this when injecting pre-fitted predictors via ModelSettings, since the predictors were trained on a fixed set of features and cannot cover newly generated columns. The null features are used to calibrate the FDR null distribution and are then stripped from all output (table and layers contain only the real genes).

    • 0 or None — disable FDR estimation.

  • null_seed (int, optional) – Random seed for null-gene sampling.

  • threshold (float) – FDR threshold for the is_de boolean column.

  • null_mahalanobis (np.ndarray, optional) – Pre-computed null Mahalanobis distances. When provided, these are used directly as the null distribution for FDR estimation, bypassing internal null gene generation. Mutually exclusive with null_expression.

  • null_expression (tuple of (np.ndarray, np.ndarray), optional) – External null expression data as (expr1, expr2). These columns are appended to the expression matrices and fitted through the same GP model, then their Mahalanobis distances form the null distribution. Mutually exclusive with null_mahalanobis.

  • combine_with_internal (bool) – If True, concatenate external null distances with internally generated null distances. If False (default), the external null replaces the internal one entirely.

combine_with_internal: bool = False
null_expression: Tuple[ndarray, ndarray] | None = None
null_genes: int | List[int] | str | None = 'auto'
null_mahalanobis: ndarray | None = None
null_seed: int | None = 42
threshold: float = 0.05
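
The accepted forms of null_genes can be summarised in a small decision function. This is only a sketch of the rules listed above: describe_null_genes is a hypothetical helper, not a kompot function, and it ignores the interaction with ModelSettings.

```python
from typing import List, Optional, Union

NullGenes = Union[int, List[int], str, None]


def describe_null_genes(null_genes: NullGenes, sample_col: Optional[str]) -> str:
    """Summarise the documented meaning of a null_genes value."""
    if null_genes == "auto":
        # Documented default: 2000 shuffled null genes unless sample_col is set.
        null_genes = 2000 if sample_col is None else 0
    if null_genes is None or null_genes == 0:
        return "FDR estimation disabled"
    if isinstance(null_genes, int):
        return f"generate {null_genes} null genes"
    if isinstance(null_genes, list):
        return f"use {len(null_genes)} pre-baked null columns"
    raise ValueError(f"unsupported null_genes value: {null_genes!r}")


print(describe_null_genes("auto", None))        # generate 2000 null genes
print(describe_null_genes("auto", "donor_id"))  # FDR estimation disabled
```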

DAThresholdSettings

Significance thresholds for differential abundance.

class kompot.DAThresholdSettings(lfc_threshold: float = 1.0, ptp_threshold: float = 0.05)

Significance thresholds for differential abundance.

Parameters:
  • lfc_threshold (float) – Log fold change threshold for significance classification.

  • ptp_threshold (float) – Posterior tail probability threshold for significance.

lfc_threshold: float = 1.0
ptp_threshold: float = 0.05
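
One plausible way the two thresholds combine is sketched below. The rule itself is an assumption made for illustration (a cell counts as changed only when the absolute log fold change exceeds lfc_threshold and the posterior tail probability is below ptp_threshold); classify_abundance is a hypothetical helper and may not match kompot's internal logic.

```python
def classify_abundance(lfc: float, ptp: float,
                       lfc_threshold: float = 1.0,
                       ptp_threshold: float = 0.05) -> str:
    """Label one cell as 'up', 'down', or 'neutral' under the assumed rule."""
    significant = abs(lfc) > lfc_threshold and ptp < ptp_threshold
    if not significant:
        return "neutral"
    return "up" if lfc > 0 else "down"


print(classify_abundance(1.5, 0.01))                      # up
print(classify_abundance(1.5, 0.03, ptp_threshold=0.01))  # neutral
```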

FilterSettings

Controls cell filtering and group subsetting (DE only).

class kompot.FilterSettings(cell_filter: str | List[str] | Dict[str, Any] | List[Dict[str, Any]] | None = None, groups: str | Dict[str, Any] | List[Dict[str, Any]] | Series | ndarray | List[ndarray] | None = None, min_cells: int = 2, min_percentage: float | None = None, check_representation: bool | None = None)

Cell-filtering and group-subsetting parameters.

Parameters:
  • cell_filter (optional) – Specification for cells to include (boolean column name, dict of column->values, etc.).

  • groups (optional) – Column or specification for per-group analyses.

  • min_cells (int) – Minimum cells per condition within a group.

  • min_percentage (float, optional) – Minimum percentage of cells per condition within a group.

  • check_representation (bool, optional) – None warns, True auto-filters, False skips.

cell_filter: str | List[str] | Dict[str, Any] | List[Dict[str, Any]] | None = None
check_representation: bool | None = None
groups: str | Dict[str, Any] | List[Dict[str, Any]] | Series | ndarray | List[ndarray] | None = None
min_cells: int = 2
min_percentage: float | None = None
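
The dict form of cell_filter (column->values) can be pictured as a boolean mask over cells. The sketch below works on plain dictionaries instead of adata.obs; cell_filter_mask is a hypothetical helper, and combining several columns with a logical AND is an assumption.

```python
from typing import Any, Dict, List


def cell_filter_mask(obs_rows: List[Dict[str, Any]],
                     cell_filter: Dict[str, List[Any]]) -> List[bool]:
    """Keep a cell only if every filtered column holds one of the allowed values."""
    return [
        all(row[col] in allowed for col, allowed in cell_filter.items())
        for row in obs_rows
    ]


# Each dict stands in for one row of adata.obs.
obs_rows = [
    {"cell_type": "T_cell", "condition": "Young"},
    {"cell_type": "B_cell", "condition": "Old"},
    {"cell_type": "NK_cell", "condition": "Old"},
]
mask = cell_filter_mask(obs_rows, {"cell_type": ["T_cell", "B_cell"]})
print(mask)  # [True, True, False]
```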

StorageSettings

Controls where and how results are stored in the AnnData object.

class kompot.StorageSettings(result_key: str | None = None, overwrite: bool | None = None, store_landmarks: bool = False, store_posterior_covariance: bool = False, store_additional_stats: bool = False, store_arrays_on_disk: bool | None = None, disk_storage_dir: str | None = None, max_memory_ratio: float = 0.8)

Output-storage and memory-management parameters.

Parameters:
  • result_key (str) – Key prefix used in adata.var, adata.layers, adata.uns. Defaults to "kompot_de" for DE and "kompot_da" for DA.

  • overwrite (bool, optional) – None (default) warns, True silently overwrites, False raises on conflict.

  • store_landmarks (bool) – Persist landmarks in adata.uns for future reuse.

  • store_posterior_covariance (bool) – Store the (n_cells x n_cells) posterior covariance in adata.obsp.

  • store_additional_stats (bool) – Store extra columns (p-values, tail FDR, PTP, z-scores).

  • store_arrays_on_disk (bool, optional) – Use disk-backed arrays for large intermediate matrices.

  • disk_storage_dir (str, optional) – Directory for disk-backed arrays.

  • max_memory_ratio (float) – Fraction of RAM before triggering disk storage.

disk_storage_dir: str | None = None
max_memory_ratio: float = 0.8
overwrite: bool | None = None
result_key: str | None = None
store_additional_stats: bool = False
store_arrays_on_disk: bool | None = None
store_landmarks: bool = False
store_posterior_covariance: bool = False
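
The three-state overwrite flag can be sketched as follows. check_overwrite is a hypothetical helper, and the choice of ValueError for the "raises on conflict" case is an assumption, since the exception type is not stated here.

```python
import warnings
from typing import Iterable, Optional, Set


def check_overwrite(existing: Iterable[str], new: Iterable[str],
                    overwrite: Optional[bool]) -> Set[str]:
    """Return conflicting keys, applying the None/True/False behaviour."""
    conflicts = set(existing) & set(new)
    if conflicts and overwrite is False:
        # False: refuse to touch existing results.
        raise ValueError(f"Refusing to overwrite: {sorted(conflicts)}")
    if conflicts and overwrite is None:
        # None (default): proceed, but tell the user.
        warnings.warn(f"Overwriting existing fields: {sorted(conflicts)}")
    # True: overwrite silently.
    return conflicts


print(check_overwrite(["kompot_de_is_de"], ["kompot_de_is_de"], True))  # {'kompot_de_is_de'}
```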

OutputSettings

Controls return values and runtime behaviour.

class kompot.OutputSettings(copy: bool = False, inplace: bool = True, return_full_results: bool = False, return_null_data: bool = False, compute_mahalanobis: bool = True, allow_single_condition_variance: bool = False, progress: bool = True)

Control what de() / da() returns and how it behaves.

Parameters:
  • copy (bool) – Return a copy of the AnnData instead of modifying in place.

  • inplace (bool) – Write results into the AnnData object.

  • return_full_results (bool) – Return the full results dict (model, table, landmarks, …). When True, result_dict["null"] includes the full null gene expression matrices, fold changes, and imputations alongside the lightweight metadata.

  • return_null_data (bool) – Return the results dict with lightweight null-distribution metadata (gene indices, names, seed, Mahalanobis distances) without the full expression matrices. When return_full_results is also True, the null data additionally includes the full expression matrices.

  • compute_mahalanobis (bool) – Compute per-gene Mahalanobis distances (DE only).

  • allow_single_condition_variance (bool) – Allow sample-variance estimation when only one condition has multiple samples.

  • progress (bool) – Show progress bars.

allow_single_condition_variance: bool = False
compute_mahalanobis: bool = True
copy: bool = False
inplace: bool = True
progress: bool = True
return_full_results: bool = False
return_null_data: bool = False

ModelSettings

Inject pre-fitted models or predictors to skip internal fitting.

class kompot.ModelSettings(model1: Any | None = None, model2: Any | None = None, function_predictor1: Any | None = None, function_predictor2: Any | None = None, obs_variance_predictor1: Any | None = None, obs_variance_predictor2: Any | None = None, variance_predictor1: Any | None = None, variance_predictor2: Any | None = None, density_predictor1: Any | None = None, density_predictor2: Any | None = None)

Pre-fitted models or predictors to inject into de() or da().

When provided, these skip internal fitting for the corresponding component. model1/model2 take precedence over individual predictors.

When using pre-fitted predictors with FDR, null features must be included in the data before fitting (the predictors cannot cover columns that are added later). Pass their column indices via FDRSettings(null_genes=[...]). Passing null_genes=int with pre-fitted predictors raises ValueError.

Parameters:
  • model1 (ExpressionModel, optional) – Full pre-fitted ExpressionModel for condition1 (DE only). Takes precedence over individual predictors.

  • model2 (ExpressionModel, optional) – Full pre-fitted ExpressionModel for condition2 (DE only). Takes precedence over individual predictors.

  • function_predictor1 (callable, optional) – Pre-fitted mellon Predictor for condition1 (DE only).

  • function_predictor2 (callable, optional) – Pre-fitted mellon Predictor for condition2 (DE only).

  • obs_variance_predictor1 (callable, optional) – Pre-fitted empirical variance predictor for condition1 (DE only).

  • obs_variance_predictor2 (callable, optional) – Pre-fitted empirical variance predictor for condition2 (DE only).

  • variance_predictor1 (callable, optional) – Pre-fitted sample variance predictor for condition1. Signature: (X, diag=True/False) -> array.

  • variance_predictor2 (callable, optional) – Pre-fitted sample variance predictor for condition2. Signature: (X, diag=True/False) -> array.

  • density_predictor1 (callable, optional) – Pre-fitted density predictor for condition1 (DA only).

  • density_predictor2 (callable, optional) – Pre-fitted density predictor for condition2 (DA only).

density_predictor1: Any | None = None
density_predictor2: Any | None = None
function_predictor1: Any | None = None
function_predictor2: Any | None = None
model1: Any | None = None
model2: Any | None = None
obs_variance_predictor1: Any | None = None
obs_variance_predictor2: Any | None = None
variance_predictor1: Any | None = None
variance_predictor2: Any | None = None

When using pre-fitted predictors with FDR estimation, null features must be included in the data before fitting the predictors — they need to go through the same GP smoothing pipeline as the real features. Pass their column indices via FDRSettings(null_genes=[...]). The null features calibrate the FDR null distribution and are then stripped from all output: the result table and adata layers contain only the real genes.

# Assume predictors were trained on 100 real + 200 null features
null_indices = list(range(100, 300))

kompot.de(
    adata, "condition", "WT", "KO",
    model=kompot.ModelSettings(
        function_predictor1=predictor_wt,
        function_predictor2=predictor_ko,
    ),
    fdr=kompot.FDRSettings(null_genes=null_indices),
)

Resource Estimation

Before running resource-intensive differential expression analyses, pass dry_run=True to estimate memory and disk requirements without running the actual computation.

plan = kompot.de(
    adata,
    groupby="age",
    condition1="Young",
    condition2="Old",
    sample_col="donor_id",
    dry_run=True,
)
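
For a feel of the numbers involved, the back-of-the-envelope sketch below sizes an (n_cells x n_cells) dense matrix such as the posterior covariance and compares it against a max_memory_ratio budget. Both helpers are hypothetical, the 8-byte float64 entry size is an assumption, and this is not how the dry-run report itself is computed.

```python
def dense_matrix_gib(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Size of a dense matrix in GiB, assuming 8-byte (float64) entries by default."""
    return n_rows * n_cols * bytes_per_value / 2**30


def exceeds_memory_budget(required_gib: float, total_ram_gib: float,
                          max_memory_ratio: float = 0.8) -> bool:
    """True if the estimate exceeds the allowed fraction of RAM."""
    return required_gib > max_memory_ratio * total_ram_gib


n_cells = 50_000
cov_gib = dense_matrix_gib(n_cells, n_cells)
print(f"{cov_gib:.1f} GiB")                  # 18.6 GiB
print(exceeds_memory_budget(cov_gib, 16.0))  # True
print(exceeds_memory_budget(cov_gib, 64.0))  # False
```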

Run Tracking

class kompot.anndata.utils.RunInfo(adata, run_id: int | None = None, analysis_type: str | None = None)

Bases: object

Class for accessing run information for differential analysis or smoothing.

Provides access to run history, parameters, and result fields.

Attributes:
  • adata (AnnData) – AnnData object containing the run history.

  • run_id (int) – Requested run ID (may be negative for relative indexing).

  • adjusted_run_id (int) – Actual run ID after adjusting for negative indexing.

  • analysis_type (str) – Type of analysis: 'de', 'da', or 'smooth'.

  • storage_key (str) – Key for accessing the analysis data in adata.uns.

  • run_info (dict) – Dictionary with all information about the run.

  • field_names (dict) – Dictionary with field names used in this run.

  • params (dict) – The parameters used for this analysis.

  • environment (dict) – Information about the environment where the analysis was run.

  • overwritten_fields (list) – List of fields that were overwritten by newer runs.

  • missing_fields (list) – List of fields that are missing/deleted from the AnnData object.

__init__(adata, run_id: int | None = None, analysis_type: str | None = None)

Initialize a RunInfo object.

Parameters:
  • adata (AnnData) – AnnData object containing run history

  • run_id (int, optional) – Run ID to retrieve. Negative indices count from the end. If None, uses the most recent run (-1).

  • analysis_type (str, optional) – Type of analysis: 'de', 'da', or 'smooth'. If None, attempts to detect from adata.uns.
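
The relationship between a requested run_id and the resolved adjusted_run_id follows ordinary Python negative indexing. The sketch below is a hypothetical re-implementation for illustration only; raising IndexError for out-of-range IDs is an assumption.

```python
from typing import Optional


def adjust_run_id(run_id: Optional[int], n_runs: int) -> int:
    """Map a possibly negative or missing run ID to a non-negative index."""
    if run_id is None:
        run_id = -1  # documented default: the most recent run
    adjusted = run_id + n_runs if run_id < 0 else run_id
    if not 0 <= adjusted < n_runs:
        raise IndexError(f"run_id {run_id} out of range for {n_runs} runs")
    return adjusted


print(adjust_run_id(None, 3))  # 2
print(adjust_run_id(-3, 3))    # 0
```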

call_args() -> Dict[str, Any]

Build kwargs that reproduce this run via da() / de().

The returned dict contains top-level arguments (groupby, condition1, …) and Settings objects (gp, fdr, …). All values are mutable — edit them before passing to de() or da():

kwargs = run.call_args()
kwargs["fdr"].threshold = 0.01   # tighten FDR
kompot.de(adata, **kwargs)

Returns:

Ready for kompot.de(adata, **result) or kompot.da(adata, **result).

Return type:

dict

compare_with(other_run_id: int) -> RunComparison

Compare this run with another run.

Parameters:

other_run_id (int) – Run ID to compare with

Returns:

Object containing comparison results with nice display methods

Return type:

RunComparison

get_data() -> Dict[str, Any]

Get all data related to this run.

Returns:

Dictionary with all run data

Return type:

Dict[str, Any]

get_summary() -> Dict[str, Any]

Get a summary of this run with key information.

Returns:

Dictionary with run summary

Return type:

Dict[str, Any]

to_settings() -> Dict[str, Any]

Reconstruct Settings dataclass objects from stored parameters.

Returns:

{"gp": GPSettings(…), "fdr": FDRSettings(…), …} — only Settings that were recorded for this run.

Return type:

dict

Examples

>>> run = kompot.RunInfo(adata, run_id=0, analysis_type="de")
>>> settings = run.to_settings()
>>> settings["gp"].sigma
1.0

Reproducing and Editing Runs

Every run stores its parameters as nested Settings objects. Use call_args() to get a kwargs dict that reproduces the run — then edit it before re-running:

run = kompot.RunInfo(adata, run_id=0, analysis_type="de")

# Reproduce exactly
kwargs = run.call_args()
kompot.de(adata, **kwargs)

# Or tweak parameters first
kwargs = run.call_args()
kwargs["fdr"].threshold = 0.01          # tighten FDR
kwargs["condition2"] = "Mid"            # different comparison
kwargs["gp"].n_landmarks = 3000         # fewer landmarks
kompot.de(adata, **kwargs)
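
Because the Settings objects inside the kwargs dict are mutable, editing them changes the object itself. If a baseline and a variant are needed side by side, one option is to deep-copy the dict before editing. The sketch below uses a plain dict and a hypothetical FDRSettingsSketch dataclass as stand-ins for what call_args() returns.

```python
import copy
from dataclasses import dataclass


@dataclass
class FDRSettingsSketch:
    """Hypothetical stand-in for FDRSettings with a single documented field."""
    threshold: float = 0.05


# Stand-in for the dict returned by run.call_args().
baseline = {"groupby": "condition", "condition1": "Young",
            "condition2": "Old", "fdr": FDRSettingsSketch()}

# Deep-copy so that edits to nested Settings objects leave the baseline intact.
variant = copy.deepcopy(baseline)
variant["fdr"].threshold = 0.01
variant["condition2"] = "Mid"

print(baseline["fdr"].threshold, variant["fdr"].threshold)  # 0.05 0.01
```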

You can also inspect the Settings objects directly:

settings = run.to_settings()
print(settings["gp"])       # GPSettings(sigma=1.0, ls_factor=10.0, ...)
print(settings["fdr"])      # FDRSettings(null_genes=2000, threshold=0.05, ...)

Cleanup

kompot.cleanup(adata: AnnData, run_ids: int | List[int] | None = None, analysis_type: str = 'de', keep_layers: bool | List[str] | None = None, keep_var_fields: bool | List[str] | None = True, keep_obs_fields: bool | List[str] | None = True, keep_obsp_fields: bool | List[str] | None = None, keep_varm_fields: bool | List[str] | None = None, inplace: bool = True) -> AnnData | None

Remove large data (layers, obsp, varm) from differential analysis results.

This function helps reduce AnnData object size by removing large arrays like smoothed expression layers, fold change layers, and posterior covariance matrices while retaining the statistical results in var/obs columns.

Parameters:
  • adata (AnnData) – AnnData object with differential analysis results

  • run_ids (int, list of int, or None, optional) – Run ID(s) to clean up. Negative indices count from the end.

    • If None (default): Cleans up ALL runs

    • If int: Cleans up single run

    • If list: Cleans up specified runs

  • analysis_type (str, default 'de') – Type of analysis: 'de' for differential expression, 'da' for differential abundance, or 'smooth' for expression smoothing

  • keep_layers (bool or list of str, optional) –

    • If None (default): Remove all layers from specified run(s)

    • If False: Remove all layers from specified run(s)

    • If True: Keep all layers from specified run(s)

    • If list: Keep only the specified layer types

  • keep_var_fields (bool or list of str, optional) –

    • If True (default): Keep all var fields from specified run(s)

    • If False: Remove all var fields from specified run(s)

    • If list: Keep only the specified var field types

  • keep_obs_fields (bool or list of str, optional) –

    • If True (default): Keep all obs fields from specified run(s)

    • If False: Remove all obs fields from specified run(s)

    • If list: Keep only the specified obs field types

  • keep_obsp_fields (bool or list of str, optional) –

    • If None (default): Remove all obsp fields from specified run(s)

    • If False: Remove all obsp fields from specified run(s)

    • If True: Keep all obsp fields from specified run(s)

    • If list: Keep only the specified obsp field types

  • keep_varm_fields (bool or list of str, optional) –

    • If None (default): Remove all varm fields from specified run(s)

    • If False: Remove all varm fields from specified run(s)

    • If True: Keep all varm fields from specified run(s)

    • If list: Keep only the specified varm field types

  • inplace (bool, default True) – If True, modify adata in place. If False, return a copy.

Returns:

If inplace=False, returns modified copy. If inplace=True, returns None.

Return type:

AnnData or None

Notes

Layer field types:

  • 'smoothed': Smoothed expression for each condition

  • 'fold_change': Log fold change for each cell and gene

  • 'fold_change_zscores': Z-scores of log fold changes

  • 'std_with_sample_var': Posterior standard deviations with sample variance

Var field types:

  • 'mean_log_fold_change': Mean log fold change values

  • 'mahalanobis': Mahalanobis distances

  • 'ptp': Posterior tail probability

  • 'mahalanobis_pvalue': P-values from empirical null

  • 'mahalanobis_local_fdr': Local FDR values

  • 'mahalanobis_tail_fdr': Tail-based FDR values

  • 'is_de': Boolean indicator of differential expression

  • 'weighted_mean_log_fold_change': Weighted mean log fold change

Obs field types:

  • 'std': Posterior standard deviations

Obsp field types:

  • 'covariance': Posterior covariance matrices for fold changes

Varm field types:

  • 'mean_log_fold_change': Mean log fold change per group

  • 'mahalanobis': Mahalanobis distances per group

  • 'weighted_mean_log_fold_change': Weighted mean log fold change per group

Examples

>>> cleanup(adata)  # Remove all layers from all runs
>>> cleanup(adata, run_ids=0)  # Remove layers from specific run
>>> cleanup(adata, run_ids=[0, 2, 5])  # Multiple runs
>>> cleanup(adata, keep_layers=['fold_change'])  # Keep only fold change
>>> # Remove all layers and obsp covariance matrices
>>> cleanup(adata, keep_layers=False, keep_obsp_fields=False)
>>> # Keep only essential statistical fields from run 0
>>> cleanup(
...     adata,
...     run_ids=0,
...     keep_layers=False,
...     keep_var_fields=['mahalanobis', 'mahalanobis_local_fdr', 'is_de', 'mean_log_fold_change'],
...     keep_obs_fields=False,
... )

Notes

  • By default, cleans up ALL runs to maximize space savings

  • By default, keeps all statistical results (var/obs fields) but removes layers

  • Large data typically in: layers (smoothed, fold_change), obsp (covariance)

  • This does NOT modify the run history - deleted fields are marked as missing

  • Use RunInfo to check which fields are present vs deleted
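
The keep_* arguments all share the same pattern, which the sketch below spells out for a list of field types. field_types_to_remove is a hypothetical helper written from the rules above, not the library's own code.

```python
from typing import List, Optional, Union

Keep = Union[bool, List[str], None]


def field_types_to_remove(present: List[str], keep: Keep) -> List[str]:
    """Apply the keep_* rules to a list of field types and return what gets removed."""
    if keep is True:
        return []                      # keep everything
    if keep is None or keep is False:
        return list(present)           # remove everything
    return [name for name in present if name not in keep]  # keep only listed types


layers = ["smoothed", "fold_change", "fold_change_zscores"]
print(field_types_to_remove(layers, None))             # ['smoothed', 'fold_change', 'fold_change_zscores']
print(field_types_to_remove(layers, ["fold_change"]))  # ['smoothed', 'fold_change_zscores']
```

Only the default differs between arguments: keep_layers, keep_obsp_fields, and keep_varm_fields default to None (remove), while keep_var_fields and keep_obs_fields default to True (keep).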

kompot.get_field_status(adata: AnnData, run_id: int | None = None, analysis_type: str = 'de') -> Dict[str, Dict[str, Dict[str, bool]]]

Get the status of all fields from a differential analysis run.

Shows which fields are present vs missing/deleted.

Parameters:
  • adata (AnnData) – AnnData object with differential analysis results

  • run_id (int, optional) – Run ID to check. If None, uses most recent run.

  • analysis_type (str, default 'de') – Type of analysis: 'de', 'da', or 'smooth'

Returns:

Nested dictionary with structure: {location: {field_type: {field_name: is_present}}}

Return type:

dict

Examples

>>> status = get_field_status(adata)
>>> print(status['layers']['smoothed'])
{'result_A_smoothed': True, 'result_B_smoothed': False}
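
Given the nested {location: {field_type: {field_name: is_present}}} shape, listing everything that has been deleted is a short traversal. The sketch below runs on a hand-written dictionary; missing_fields is a hypothetical helper and the field names are illustrative only.

```python
from typing import Dict, List, Tuple

Status = Dict[str, Dict[str, Dict[str, bool]]]


def missing_fields(status: Status) -> List[Tuple[str, str, str]]:
    """List (location, field_type, field_name) for every field marked absent."""
    return [
        (location, field_type, field_name)
        for location, by_type in status.items()
        for field_type, by_name in by_type.items()
        for field_name, is_present in by_name.items()
        if not is_present
    ]


# Hand-written status in the documented shape (field names are illustrative).
status = {
    "layers": {"smoothed": {"result_A_smoothed": True, "result_B_smoothed": False}},
    "var": {"is_de": {"result_is_de": True}},
}
print(missing_fields(status))  # [('layers', 'smoothed', 'result_B_smoothed')]
```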

Representation Analysis

kompot.check_underrepresentation(adata: AnnData, groupby: str, groups: str | dict | list | ndarray, conditions: List[str] | None = None, min_cells: int = 30, min_percentage: float | None = None, warn: bool = True, print_summary: bool = False) -> Dict[str, Any]

Check if any condition is underrepresented in any group.

Parameters:
  • adata (AnnData) – AnnData object containing cells/observations

  • groupby (str) – Column in adata.obs defining conditions to check

  • groups (str, dict, list, np.ndarray) – Groups to check for representation, either:

    • str: Column name in adata.obs defining groups

    • dict: Mapping from group names to boolean masks or indices

    • list, np.ndarray: Boolean mask or indices for a single group

  • conditions (List[str], optional) – List of condition values to check, by default None (uses all values in groupby column)

  • min_cells (int, optional) – Minimum number of cells required for each condition in each group, by default 30

  • min_percentage (float, optional) – Minimum percentage of cells for each condition in each group, by default None

  • warn (bool, optional) – Whether to emit warnings for underrepresentation, by default True

  • print_summary (bool, optional) – Whether to print a summary of underrepresentation results, by default False

Returns:

Dictionary with underrepresentation data, containing:

  • __underrepresentation_data: Dict mapping groups to underrepresented conditions

  • group_key: List of group names (if groups was a string column name)

  • Other metadata depending on groups type

Return type:

Dict[str, Any]
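
The min_cells check amounts to counting cells per (group, condition) pair and flagging the pairs that fall short. The sketch below does this on plain tuples; underrepresented is a hypothetical helper that mimics only the min_cells rule, not min_percentage or the full return structure.

```python
from collections import Counter
from typing import Dict, List, Tuple


def underrepresented(cells: List[Tuple[str, str]], conditions: List[str],
                     min_cells: int = 30) -> Dict[str, List[str]]:
    """Map each group to the conditions with fewer than min_cells cells in it."""
    counts = Counter(cells)  # keys are (group, condition) pairs
    groups = sorted({group for group, _ in cells})
    result: Dict[str, List[str]] = {}
    for group in groups:
        low = [c for c in conditions if counts[(group, c)] < min_cells]
        if low:
            result[group] = low
    return result


# One (group, condition) tuple per cell.
cells = (
    [("T_cell", "Young")] * 40 + [("T_cell", "Old")] * 35
    + [("B_cell", "Young")] * 50 + [("B_cell", "Old")] * 5
)
print(underrepresented(cells, ["Young", "Old"]))  # {'B_cell': ['Old']}
```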