
Identifies differentially expressed SNPs between cell populations by comparing read depths and alternative allele frequencies at the single-cell level. For each SNP, the function computes group averages, log2 fold-changes, and expressing-cell counts, optionally with Wilcoxon rank-sum p-values, and can process SNPs in parallel to reduce run time on large datasets.

Arguments

ident.1

Character. Primary cell identity to analyze.

ident.2

Character, optional. Secondary cell identity to compare against. If NULL, compares against all other cells.

donor_type

Character, optional. Donor type to restrict analysis to ("Donor" or "Recipient"). If NULL, uses all cells regardless of donor type.

use_normalized

Logical. Whether to use normalized depth counts (TRUE) or raw counts (FALSE).

min_expr_cells

Integer. Minimum number of expressing cells required in each group.

min_alt_frac

Numeric between 0 and 1. Minimum alternative allele fraction to consider a cell as expressing.

logfc.threshold

Numeric. Minimum absolute log2 fold-change required to report a SNP.

calc_p

Logical. Whether to calculate p-values (Wilcoxon test). Set to FALSE to save computation time.

p.adjust.method

Character. Method for p-value adjustment, passed to p.adjust(). Default: "BH" (Benjamini-Hochberg).

return_all

Logical. Whether to return all SNPs or only significant ones.

pseudocount

Numeric. Value added to expression values before log transformation.

min.p

Numeric. Minimum p-value to report. Smaller p-values are reported as min.p, which prevents numerical underflow.

debug

Logical. Whether to print debugging information during analysis.

n_cores

Integer, optional. Number of CPU cores to use for parallel processing. If NULL, defaults to parallel::detectCores() - 1.

use_parallel

Logical. Whether to use parallel processing.

chunk_size

Integer. Number of SNPs to process in each batch during parallel execution. Larger values may improve performance but require more memory.

max_ram_gb

Numeric. Upper limit, in gigabytes, on estimated RAM usage during parallel processing. If the estimated usage for the requested chunk_size would exceed this limit, the function automatically reduces chunk_size.
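The interaction between n_cores, chunk_size, and max_ram_gb can be illustrated with a small sketch. This is not the package's code: the helper name resolve_parallel_settings and its memory model (each worker holding a few dense double-precision copies of its chunk) are assumptions made only for illustration. It applies the documented n_cores default and halves chunk_size until the assumed estimate fits under max_ram_gb.

```r
# Hypothetical helper (not part of the package): resolve the worker count and
# shrink chunk_size until an ASSUMED memory estimate fits under max_ram_gb.
resolve_parallel_settings <- function(n_snps, n_cells, n_cores = NULL,
                                      chunk_size = 1000, max_ram_gb = 8,
                                      bytes_per_value = 8, copies = 4) {
  if (is.null(n_cores)) {
    detected <- parallel::detectCores()
    if (is.na(detected)) detected <- 2L  # detectCores() can return NA
    # Documented default: detectCores() - 1, kept at a minimum of 1
    n_cores <- max(1L, detected - 1L)
  }
  # Assumed model: each worker holds `copies` dense double-precision
  # matrices of size chunk_size x n_cells at the same time
  est_gb <- function(cs) {
    cs * n_cells * bytes_per_value * copies * n_cores / 1024^3
  }
  chunk_size <- min(chunk_size, n_snps)
  # Halve the chunk until the estimate fits (or a chunk is a single SNP)
  while (chunk_size > 1 && est_gb(chunk_size) > max_ram_gb) {
    chunk_size <- chunk_size %/% 2
  }
  list(n_cores = n_cores, chunk_size = chunk_size,
       est_gb = est_gb(chunk_size))
}

settings <- resolve_parallel_settings(n_snps = 5000, n_cells = 1e5,
                                      n_cores = 4, chunk_size = 1000,
                                      max_ram_gb = 2)
settings$chunk_size  # 125 under this assumed memory model
```

Halving is just one simple policy; the real function may size chunks differently.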

Value

List containing:

results

Data frame of differentially expressed SNPs with metrics including log2FC, expression values, cell counts, and significance statistics.

summary

List with analysis overview, including counts of significant SNPs, up/downregulated SNPs, and parameter settings used.

Details

The function calculates differential expression by comparing the average expression of each SNP between two groups. The average is taken over all cells in the group, that is, the summed expression is divided by the group's total number of cells. For each SNP, a cell counts as expressing only if it has a positive read depth and an alternative allele fraction of at least min_alt_frac.
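A minimal sketch of these per-SNP metrics follows, assuming depth and alternative allele fractions are available as SNP x cell matrices and that the pseudocount is added to the group means before taking log2. The function snp_group_stats and its column names are hypothetical and do not describe the package's internal data structures or output columns.

```r
# Hypothetical inputs (not the package's internal data structures):
# `depth` and `alt_frac` are SNP x cell matrices; grp1/grp2 are column indices.
snp_group_stats <- function(depth, alt_frac, grp1, grp2,
                            min_alt_frac = 0.1, pseudocount = 1) {
  # A cell "expresses" a SNP if it has reads there and enough alt allele
  expressing <- depth > 0 & alt_frac >= min_alt_frac
  # Depth is kept only in expressing cells; all other cells contribute 0
  expr <- depth * expressing
  # Average over ALL cells in each group, i.e. divide by the group size
  mean1 <- rowSums(expr[, grp1, drop = FALSE]) / length(grp1)
  mean2 <- rowSums(expr[, grp2, drop = FALSE]) / length(grp2)
  data.frame(
    snp      = rownames(depth),
    mean.1   = unname(mean1),
    mean.2   = unname(mean2),
    n_expr.1 = unname(rowSums(expressing[, grp1, drop = FALSE])),
    n_expr.2 = unname(rowSums(expressing[, grp2, drop = FALSE])),
    # Assumption: the pseudocount is added to the group means before log2
    log2FC   = unname(log2((mean1 + pseudocount) / (mean2 + pseudocount)))
  )
}

depth <- matrix(c(10, 0, 4, 2,
                   0, 5, 0, 6), nrow = 2, byrow = TRUE,
                dimnames = list(c("snpA", "snpB"), paste0("cell", 1:4)))
alt_frac <- matrix(c(0.5, 0.0, 0.05, 0.4,
                     0.0, 0.3, 0.00, 0.6), nrow = 2, byrow = TRUE)
de <- snp_group_stats(depth, alt_frac, grp1 = 1:2, grp2 = 3:4)
de
```

In this toy data, snpA has reads in cell3 but an alternative allele fraction below min_alt_frac, so that cell does not count as expressing and contributes zero to the group 2 average.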

Statistical testing is performed using the Wilcoxon rank-sum test when calc_p = TRUE. Multiple-testing correction is then applied using the specified p.adjust.method.
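The testing step can be sketched with base R's stats::wilcox.test() and stats::p.adjust(). The helper snp_wilcox_pvalues is hypothetical. It assumes the test compares per-cell expression values of the two groups, that min.p acts as a floor on p-values, and that SNPs whose values are all tied (undefined statistic) are reported with p = 1. The package may handle these cases differently.

```r
# Hypothetical helper: per-SNP Wilcoxon rank-sum test between two groups of
# cells, floored at min.p, then corrected for multiple testing.
snp_wilcox_pvalues <- function(expr, grp1, grp2,
                               p.adjust.method = "BH", min.p = 1e-300) {
  p <- vapply(seq_len(nrow(expr)), function(i) {
    tryCatch(
      # exact = FALSE: normal approximation, which tolerates tied values
      stats::wilcox.test(expr[i, grp1], expr[i, grp2],
                         exact = FALSE)$p.value,
      error = function(e) NA_real_
    )
  }, numeric(1))
  # If every value is tied the statistic is undefined (NaN); report p = 1
  p[is.na(p)] <- 1
  # Floor tiny p-values, as the min.p argument describes
  p <- pmax(p, min.p)
  data.frame(snp = rownames(expr), p_val = p,
             p_val_adj = stats::p.adjust(p, method = p.adjust.method))
}

expr <- rbind(snp1 = c(10, 11, 12, 13, 0, 1, 2, 3),  # higher in group 1
              snp2 = rep(0, 8))                        # no signal
res <- snp_wilcox_pvalues(expr, grp1 = 1:4, grp2 = 5:8)
res
```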

The parallel implementation splits the SNPs into batches of chunk_size and distributes them across multiple CPU cores, which can substantially reduce run time on large datasets.
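The chunking pattern can be sketched with the base parallel package alone. The Note lists foreach and doParallel as dependencies, so this is a swapped-in technique, not the package's implementation, and the helper process_snps_in_chunks and its chunk_fun argument are hypothetical. SNP indices are split into consecutive batches, each batch is processed by a worker, and the per-batch data frames are bound back together in order.

```r
# Hypothetical sketch using only the base 'parallel' package (the function
# itself is documented to use foreach/doParallel, which is not shown here).
process_snps_in_chunks <- function(n_snps, chunk_fun, chunk_size = 500,
                                   n_cores = 1) {
  idx <- seq_len(n_snps)
  # Consecutive SNP indices share a chunk: 1..chunk_size, chunk_size+1.., ...
  chunks <- split(idx, ceiling(idx / chunk_size))
  if (n_cores > 1 && .Platform$OS.type != "windows") {
    # Forked workers inherit the current R session's objects
    parts <- parallel::mclapply(chunks, chunk_fun, mc.cores = n_cores)
  } else if (n_cores > 1) {
    # Socket cluster for Windows; chunk_fun must carry the data it needs
    cl <- parallel::makeCluster(n_cores)
    on.exit(parallel::stopCluster(cl), add = TRUE)
    parts <- parallel::parLapply(cl, chunks, chunk_fun)
  } else {
    parts <- lapply(chunks, chunk_fun)  # serial fallback
  }
  # Combine per-chunk data frames in the original SNP order
  do.call(rbind, unname(parts))
}

out <- process_snps_in_chunks(10, function(i) data.frame(snp = i, sq = i^2),
                              chunk_size = 3)
nrow(out)
```

Forking avoids copying data to workers explicitly, while socket clusters require the data to be exported or captured in chunk_fun's environment. Objects defined in the global environment are not shipped automatically.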

Note

  • Requires the packages 'parallel', 'foreach', and 'doParallel' for parallel processing

  • Project identity must be set before using this function via setProjectIdentity()

  • For non-transplant datasets, donor_type filtering is automatically disabled
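Before calling the function, a caller could check whether the packages listed above are installed and fall back to use_parallel = FALSE if they are not. The helper can_run_parallel is hypothetical. It relies only on base R's requireNamespace().

```r
# Hypothetical pre-flight check: report whether the packages listed in the
# Note are installed, so a caller could fall back to serial processing.
can_run_parallel <- function(pkgs = c("parallel", "foreach", "doParallel")) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (!all(ok)) {
    message("Parallel processing unavailable; missing: ",
            paste(pkgs[!ok], collapse = ", "))
  }
  all(ok)
}

# Could be passed on as findDESNPs(..., use_parallel = use_par)
use_par <- can_run_parallel()
```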

See also

  • setProjectIdentity for setting the cell identity to use

  • findSNPsByGroup for group-level SNP analysis

Examples


if (FALSE) { # \dontrun{
# Assumes proj is an existing, initialized variantCell project object

# Choose the cell identity used to define ident.1 and ident.2
proj$setProjectIdentity('cell_type')

# Basic usage comparing T cells vs other cells, donor cells only
results <- proj$findDESNPs(
  ident.1 = "T_cells",
  ident.2 = NULL,
  donor_type = "Donor",
  min_expr_cells = 5,
  logfc.threshold = 0.25
)

# Without p-value calculation for faster processing
fast_results <- proj$findDESNPs(
  ident.1 = "CD4",
  ident.2 = "CD8",
  calc_p = FALSE,
  n_cores = 8
)

# Access results
head(results$results)
results$summary
} # }