anglemania

anglemania computes critical angles between genes across all samples provided in an SingleCellExperiment. It calculates angles, transforms them to z-scores, computes statistical measures, and selects genes based on specified thresholds. These genes are biologically informative and invariant to batch effects.

This function adds a unique batch identifier to the metadata of a SingleCellExperiment object by combining specified dataset and batch keys. This is useful for distinguishing samples during integration or analysis.

anglemania(
  sce,
  batch_key,
  dataset_key = NA_character_,
  min_cells_per_gene = 1,
  min_samples_per_gene = 2,
  allow_missing_features = FALSE,
  zscore_mean_threshold = 2.5,
  zscore_sn_threshold = 2.5,
  max_n_genes = NULL,
  method = "cosine",
  permute_row_or_column = "column",
  permutation_function = "sample",
  prefilter_threshold = 0.5,
  normalization_method = "divide_by_total_counts",
  verbose = TRUE
)

check_params(
  sce,
  batch_key,
  dataset_key,
  zscore_mean_threshold,
  zscore_sn_threshold,
  max_n_genes,
  method,
  min_cells_per_gene,
  min_samples_per_gene,
  allow_missing_features,
  permute_row_or_column,
  permutation_function,
  prefilter_threshold,
  normalization_method,
  verbose
)

add_unique_batch_key(sce, dataset_key = NA_character_, batch_key)

get_intersect_genes(
  matrix_list,
  allow_missing_features = FALSE,
  min_samples_per_gene = 1,
  verbose = TRUE
)

Arguments

sce: A SingleCellExperiment object.
batch_key: A character string specifying the column name in the metadata that identifies the batch.
dataset_key: A character string specifying the column name in the metadata that identifies the dataset. If NA, only the batch_key is used.
min_cells_per_gene: Integer specifying the minimum number of cells per gene. Default is 1.
min_samples_per_gene: Integer indicating the minimum number of samples per gene.
allow_missing_features: Logical indicating whether to allow missing features.
zscore_mean_threshold: Numeric value specifying the threshold for the mean z-score. Default is 2.5.
zscore_sn_threshold: Numeric value specifying the threshold for the signal-to-noise z-score. Default is 2.5.
max_n_genes: Integer specifying the maximum number of genes to select.
method: Character string specifying the method to use for calculating the relationship between gene pairs. Default is "cosine". Other options include "spearman"
permute_row_or_column: Character "row" or "column", whether permutations should be executed row-wise or column wise. Default is "column"
permutation_function: Character "sample" or "permute_nonzero". If sample,then sample is used for constructing background distributions. If permute_nonzero, then only non-zero values are permuted. Default is "sample"
prefilter_threshold: Numeric value specifying the threshold prefiltering genes. Speeds up gene selection.
normalization_method: Character "divide_by_total_counts" or "scale_by_total_counts". Default is "divide_by_total_counts"
verbose: Logical indicating whether to print messages.
matrix_list: A list of bigstatsr::FBM objects.

Value

An updated SingleCellExperiment object with computed statistics and selected genes based on the specified thresholds. The results are stored in the metadata of the SingleCellExperiment object.

A list of validated parameters

A SingleCellExperiment object with an additional metadata column containing the unique batch key.

A character vector of intersected genes.

Details

This function performs the following steps:

Computes angles between genes for each batch in the SingleCellExperiment using the specified method, via factorise.
Transforms the angles to z-scores.
Computes statistical measures (mean z-score, signal-to-noise ratio) across batches using get_list_stats.
Selects genes based on specified z-score thresholds using select_genes.

The computed statistics and selected genes are added to the SingleCellExperiment object, which is returned.

Functions

check_params(): Check Parameters provided to the anglemania function
add_unique_batch_key(): Temporarily add a unique batch key to the dataset
get_intersect_genes(): Extract the intersected genes from a list of matrices (count matrices from different batches/datasets). It also allows for missing features in individual matrices, so that a feature does not have to be present in every single batch.

Examples

# Set seed (optional)
set.seed(1)
sce <- sce_example()
sce <- anglemania(
  sce,
  batch_key = "batch",
  method = "cosine",
  zscore_mean_threshold = 2,
  zscore_sn_threshold = 2
)
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#> 
  |                                                  | 0 % elapsed=00s   
#> Creating directory "/tmp/RtmpwM6D04/file1f8e1ad721b4" which didn't exist..
#> 
  |=========================                         | 50% elapsed=00s, remaining~00s
#> Creating directory "/tmp/RtmpwM6D04/file1f8e7e9b1836" which didn't exist..
#> 
  |==================================================| 100% elapsed=01s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> Selected 194 genes for integration.

# Access the selected genes
selected_genes <- get_anglemania_genes(sce)
selected_genes[1:10]
#>  [1] "gene43"  "gene207" "gene119" "gene130" "gene25"  "gene209" "gene82" 
#>  [8] "gene114" "gene53"  "gene113"
sce <- sce_example()
params <- check_params(
  sce,
  batch_key = "batch",
  dataset_key = "dataset",
  zscore_mean_threshold = 2.5,
  zscore_sn_threshold = 2.5,
  max_n_genes = 2000,
  method = "cosine",
  min_cells_per_gene = 1,
  min_samples_per_gene = 2,
  allow_missing_features = FALSE,
  permute_row_or_column = "column",
  permutation_function = "sample",
  prefilter_threshold = 0.5,
  normalization_method = "divide_by_total_counts",
  verbose = TRUE
)
#> Using dataset_key: dataset
sce <- sce_example()
head(SummarizedExperiment::colData(sce))
#> DataFrame with 6 rows and 2 columns
#>             batch     dataset
#>       <character> <character>
#> cell1      batch1    dataset1
#> cell2      batch1    dataset1
#> cell3      batch1    dataset1
#> cell4      batch1    dataset1
#> cell5      batch1    dataset1
#> cell6      batch1    dataset1
sce <- add_unique_batch_key(
    sce,
    batch_key = "batch",
    dataset_key = "dataset"
)
head(SummarizedExperiment::colData(sce))
#> DataFrame with 6 rows and 3 columns
#>             batch     dataset anglemania_batch
#>       <character> <character>      <character>
#> cell1      batch1    dataset1  dataset1:batch1
#> cell2      batch1    dataset1  dataset1:batch1
#> cell3      batch1    dataset1  dataset1:batch1
#> cell4      batch1    dataset1  dataset1:batch1
#> cell5      batch1    dataset1  dataset1:batch1
#> cell6      batch1    dataset1  dataset1:batch1
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: ‘generics’
#> The following objects are masked from ‘package:base’:
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#>     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#> 
#>     findMatches
#> The following objects are masked from ‘package:base’:
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#> 
#>     rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#> 
#>     anyMissing, rowMedians
sce <- sce_example()
barcodes_by_batch <- split(colnames(sce), colData(sce)$batch)
matrix_list <- lapply(barcodes_by_batch, function(barcodes) {
    SingleCellExperiment::counts(sce)[, barcodes]
})
intersect_genes <- get_intersect_genes(matrix_list)
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
head(intersect_genes)
#> [1] "gene1" "gene2" "gene3" "gene4" "gene5" "gene6"

Arguments

Value

Details

Functions

See also

Examples