anglemania computes critical angles between genes across all samples provided in an SingleCellExperiment. It calculates angles, transforms them to z-scores, computes statistical measures, and selects genes based on specified thresholds. These genes are biologically informative and invariant to batch effects.

This function adds a unique batch identifier to the metadata of a SingleCellExperiment object by combining specified dataset and batch keys. This is useful for distinguishing samples during integration or analysis.

anglemania(
  sce,
  batch_key,
  dataset_key = NA_character_,
  min_cells_per_gene = 1,
  min_samples_per_gene = 2,
  allow_missing_features = FALSE,
  zscore_mean_threshold = 2.5,
  zscore_sn_threshold = 2.5,
  max_n_genes = NULL,
  method = "cosine",
  permute_row_or_column = "column",
  permutation_function = "sample",
  prefilter_threshold = 0.5,
  normalization_method = "divide_by_total_counts",
  verbose = TRUE
)

check_params(
  sce,
  batch_key,
  dataset_key,
  zscore_mean_threshold,
  zscore_sn_threshold,
  max_n_genes,
  method,
  min_cells_per_gene,
  min_samples_per_gene,
  allow_missing_features,
  permute_row_or_column,
  permutation_function,
  prefilter_threshold,
  normalization_method,
  verbose
)

add_unique_batch_key(sce, dataset_key = NA_character_, batch_key)

get_intersect_genes(
  matrix_list,
  allow_missing_features = FALSE,
  min_samples_per_gene = 1,
  verbose = TRUE
)

Arguments

sce

A SingleCellExperiment object.

batch_key

A character string specifying the column name in the metadata that identifies the batch.

dataset_key

A character string specifying the column name in the metadata that identifies the dataset. If NA, only the batch_key is used.

min_cells_per_gene

Integer specifying the minimum number of cells per gene. Default is 1.

min_samples_per_gene

Integer indicating the minimum number of samples per gene.

allow_missing_features

Logical indicating whether to allow missing features.

zscore_mean_threshold

Numeric value specifying the threshold for the mean z-score. Default is 2.5.

zscore_sn_threshold

Numeric value specifying the threshold for the signal-to-noise z-score. Default is 2.5.

max_n_genes

Integer specifying the maximum number of genes to select.

method

Character string specifying the method to use for calculating the relationship between gene pairs. Default is "cosine". Other options include "spearman"

permute_row_or_column

Character "row" or "column", whether permutations should be executed row-wise or column wise. Default is "column"

permutation_function

Character "sample" or "permute_nonzero". If sample,then sample is used for constructing background distributions. If permute_nonzero, then only non-zero values are permuted. Default is "sample"

prefilter_threshold

Numeric value specifying the threshold prefiltering genes. Speeds up gene selection.

normalization_method

Character "divide_by_total_counts" or "scale_by_total_counts". Default is "divide_by_total_counts"

verbose

Logical indicating whether to print messages.

matrix_list

A list of bigstatsr::FBM objects.

Value

An updated SingleCellExperiment object with computed statistics and selected genes based on the specified thresholds. The results are stored in the metadata of the SingleCellExperiment object.

A list of validated parameters

A SingleCellExperiment object with an additional metadata column containing the unique batch key.

A character vector of intersected genes.

Details

This function performs the following steps:

  1. Computes angles between genes for each batch in the SingleCellExperiment using the specified method, via factorise.

  2. Transforms the angles to z-scores.

  3. Computes statistical measures (mean z-score, signal-to-noise ratio) across batches using get_list_stats.

  4. Selects genes based on specified z-score thresholds using select_genes.

The computed statistics and selected genes are added to the SingleCellExperiment object, which is returned.

Functions

  • check_params(): Check Parameters provided to the anglemania function

  • add_unique_batch_key(): Temporarily add a unique batch key to the dataset

  • get_intersect_genes(): Extract the intersected genes from a list of matrices (count matrices from different batches/datasets). It also allows for missing features in individual matrices, so that a feature does not have to be present in every single batch.

Examples

# Set seed (optional)
set.seed(1)
sce <- sce_example()
sce <- anglemania(
  sce,
  batch_key = "batch",
  method = "cosine",
  zscore_mean_threshold = 2,
  zscore_sn_threshold = 2
)
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#> 
  |                                                  | 0 % elapsed=00s   
#> Creating directory "/tmp/RtmpwM6D04/file1f8e1ad721b4" which didn't exist..
#> 
  |=========================                         | 50% elapsed=00s, remaining~00s
#> Creating directory "/tmp/RtmpwM6D04/file1f8e7e9b1836" which didn't exist..
#> 
  |==================================================| 100% elapsed=01s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> Selected 194 genes for integration.

# Access the selected genes
selected_genes <- get_anglemania_genes(sce)
selected_genes[1:10]
#>  [1] "gene43"  "gene207" "gene119" "gene130" "gene25"  "gene209" "gene82" 
#>  [8] "gene114" "gene53"  "gene113"
sce <- sce_example()
params <- check_params(
  sce,
  batch_key = "batch",
  dataset_key = "dataset",
  zscore_mean_threshold = 2.5,
  zscore_sn_threshold = 2.5,
  max_n_genes = 2000,
  method = "cosine",
  min_cells_per_gene = 1,
  min_samples_per_gene = 2,
  allow_missing_features = FALSE,
  permute_row_or_column = "column",
  permutation_function = "sample",
  prefilter_threshold = 0.5,
  normalization_method = "divide_by_total_counts",
  verbose = TRUE
)
#> Using dataset_key: dataset
sce <- sce_example()
head(SummarizedExperiment::colData(sce))
#> DataFrame with 6 rows and 2 columns
#>             batch     dataset
#>       <character> <character>
#> cell1      batch1    dataset1
#> cell2      batch1    dataset1
#> cell3      batch1    dataset1
#> cell4      batch1    dataset1
#> cell5      batch1    dataset1
#> cell6      batch1    dataset1
sce <- add_unique_batch_key(
    sce,
    batch_key = "batch",
    dataset_key = "dataset"
)
head(SummarizedExperiment::colData(sce))
#> DataFrame with 6 rows and 3 columns
#>             batch     dataset anglemania_batch
#>       <character> <character>      <character>
#> cell1      batch1    dataset1  dataset1:batch1
#> cell2      batch1    dataset1  dataset1:batch1
#> cell3      batch1    dataset1  dataset1:batch1
#> cell4      batch1    dataset1  dataset1:batch1
#> cell5      batch1    dataset1  dataset1:batch1
#> cell6      batch1    dataset1  dataset1:batch1
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: ‘generics’
#> The following objects are masked from ‘package:base’:
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#>     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#> 
#>     findMatches
#> The following objects are masked from ‘package:base’:
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#> 
#>     rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#> 
#>     anyMissing, rowMedians
sce <- sce_example()
barcodes_by_batch <- split(colnames(sce), colData(sce)$batch)
matrix_list <- lapply(barcodes_by_batch, function(barcodes) {
    SingleCellExperiment::counts(sce)[, barcodes]
})
intersect_genes <- get_intersect_genes(matrix_list)
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
head(intersect_genes)
#> [1] "gene1" "gene2" "gene3" "gene4" "gene5" "gene6"