
Identifies SNPs that are exclusively or predominantly present in one cell group compared to another. This function analyzes alternative allele frequencies between groups using aggregated data to detect group-specific genetic variants.

Arguments

ident.1

Character. Primary group identity to analyze.

ident.2

Character, optional. Secondary group identity to compare against. If NULL, compares against all other groups combined.

aggregated_data

List. Output of aggregateByGroup(), containing the required matrices and metadata.

min_depth

Integer. Minimum total read depth a group must have at a SNP for that SNP to be considered.

min_alt_frac

Numeric between 0 and 1. Minimum alternative allele fraction required in a group for a SNP to be considered present.

max_alt_frac_other

Numeric between 0 and 1. Maximum alternative allele fraction allowed in the other group for a SNP to be considered absent there.

return_all

Logical. Whether to return all results regardless of significance.

Value

List containing:

results

Data frame of group-specific SNPs with metrics including genomic position, gene annotation, depth metrics, allele frequencies, and presence classification.

summary

List with analysis overview including counts of SNPs present in each group and parameters used for filtering.

Details

The function identifies SNPs that are present in one group but absent in another by applying thresholds to alternative allele frequencies. For each SNP, a presence score is calculated that quantifies the strength of evidence for group-specific presence, considering both the frequency difference and the read depth.

A SNP is considered "present" in a group when its alternative allele frequency exceeds min_alt_frac and the read depth exceeds min_depth. It is considered "absent" in the other group when its alternative allele frequency is below max_alt_frac_other and the read depth exceeds min_depth.
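The presence and absence rules above can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: `classify_presence` and its argument names are hypothetical, the default thresholds are borrowed from the example further down, and the comparisons are strict because the text says "exceeds" and "below".

```r
# Hypothetical helper (not part of the package): flags SNPs that are
# "present" in group 1 and "absent" in group 2 under the stated thresholds.
classify_presence <- function(alt_frac_1, depth_1, alt_frac_2, depth_2,
                              min_depth = 20,
                              min_alt_frac = 0.25,
                              max_alt_frac_other = 0.05) {
  # Present: alternative allele fraction and depth both exceed their minimums
  present_1 <- alt_frac_1 > min_alt_frac & depth_1 > min_depth
  # Absent: alternative allele fraction is below the ceiling, with enough depth
  absent_2 <- alt_frac_2 < max_alt_frac_other & depth_2 > min_depth
  present_1 & absent_2
}

# Three made-up SNPs: only the first passes every threshold
specific <- classify_presence(
  alt_frac_1 = c(0.40, 0.30, 0.10),
  depth_1    = c(50, 15, 60),
  alt_frac_2 = c(0.01, 0.00, 0.00),
  depth_2    = c(40, 40, 40)
)
print(specific)
```

The second SNP fails on depth in group 1 and the third on alternative allele fraction, so only the first would count as group-specific in this sketch.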

The presence score is calculated as:

score = (alt_frac_present - alt_frac_absent) * (depth / min_depth) * (1 - alt_frac_absent / min_alt_frac)
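As a worked example, the formula can be evaluated directly in R. The function name `presence_score` is hypothetical and the input values are invented for illustration. The text does not say which group's depth is used, so `depth` is simply passed as a single number here.

```r
# Hypothetical helper (not part of the package): evaluates the presence
# score formula exactly as written in the Details section.
presence_score <- function(alt_frac_present, alt_frac_absent,
                           depth, min_depth, min_alt_frac) {
  (alt_frac_present - alt_frac_absent) *
    (depth / min_depth) *
    (1 - alt_frac_absent / min_alt_frac)
}

# Invented values: 40% alt fraction where present, 1% where absent,
# depth of 50 reads against min_depth = 20 and min_alt_frac = 0.25
score <- presence_score(
  alt_frac_present = 0.40,
  alt_frac_absent  = 0.01,
  depth            = 50,
  min_depth        = 20,
  min_alt_frac     = 0.25
)
print(score)  # 0.39 * 2.5 * 0.96 = 0.936
```

The first factor rewards a large frequency gap, the second scales with depth relative to the minimum, and the third shrinks the score as the alternative allele fraction in the "absent" group approaches min_alt_frac.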

Note

  • This function operates on pre-aggregated data from aggregateByGroup() rather than raw SNP data

  • Non-transplant mode is automatically detected from the aggregated data parameters

  • Results are sorted by presence score, with highest-scoring SNPs listed first

See also

aggregateByGroup for preparing input data, findDESNPs for cell-level differential analysis, plotSNPs for visualizing the identified SNPs

Examples


if (FALSE) { # \dontrun{
# Aggregate SNP data by cell type
agg_data <- proj$aggregateByGroup(
  group_by = "cell_type",
  donor_type = "Donor",
  use_normalized = TRUE
)

# Find T cell-specific SNPs
tc_snps <- proj$findSNPsByGroup(
  ident.1 = "T_cells",
  ident.2 = "B_cells",
  aggregated_data = agg_data,
  min_depth = 20,
  min_alt_frac = 0.25,
  max_alt_frac_other = 0.05
)

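# Compare patient groups (patient_data is assumed to come from an
# earlier aggregateByGroup() call; it is not created in this example)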
patient_snps <- proj$findSNPsByGroup(
  ident.1 = "ACR",
  ident.2 = "No_ACR",
  aggregated_data = patient_data,
  min_alt_frac = 0.1,
  max_alt_frac_other = 0.02
)
} # }