anglemania
computes critical angles between genes across all samples
provided in an SingleCellExperiment
. It calculates angles,
transforms them to z-scores, computes statistical measures, and selects
genes based on specified thresholds. These genes are biologically
informative and invariant to batch effects.
This function adds a unique batch identifier to the metadata of a
SingleCellExperiment
object by combining specified dataset and batch
keys. This is useful for distinguishing samples during integration or
analysis.
anglemania(
sce,
batch_key,
dataset_key = NA_character_,
min_cells_per_gene = 1,
min_samples_per_gene = 2,
allow_missing_features = FALSE,
zscore_mean_threshold = 2.5,
zscore_sn_threshold = 2.5,
max_n_genes = NULL,
method = "cosine",
permute_row_or_column = "column",
permutation_function = "sample",
prefilter_threshold = 0.5,
normalization_method = "divide_by_total_counts",
verbose = TRUE
)
check_params(
sce,
batch_key,
dataset_key,
zscore_mean_threshold,
zscore_sn_threshold,
max_n_genes,
method,
min_cells_per_gene,
min_samples_per_gene,
allow_missing_features,
permute_row_or_column,
permutation_function,
prefilter_threshold,
normalization_method,
verbose
)
add_unique_batch_key(sce, dataset_key = NA_character_, batch_key)
get_intersect_genes(
matrix_list,
allow_missing_features = FALSE,
min_samples_per_gene = 1,
verbose = TRUE
)
A SingleCellExperiment
object.
A character string specifying the column name in the metadata that identifies the batch.
A character string specifying the column name in the
metadata that identifies the dataset. If NA
, only the
batch_key
is used.
Integer specifying the minimum number of cells per
gene. Default is 1
.
Integer indicating the minimum number of samples per gene.
Logical indicating whether to allow missing features.
Numeric value specifying the threshold
for the mean z-score. Default is 2.5
.
Numeric value specifying the threshold for the
signal-to-noise z-score. Default is 2.5
.
Integer specifying the maximum number of genes to select.
Character string specifying the method to use for calculating
the relationship between gene pairs. Default is "cosine"
.
Other options include "spearman"
Character "row" or "column", whether
permutations
should be executed row-wise or column wise. Default is "column"
Character "sample" or "permute_nonzero". If
sample,then sample is used for constructing background distributions. If
permute_nonzero, then only non-zero values are permuted. Default is
"sample"
Numeric value specifying the threshold prefiltering genes. Speeds up gene selection.
Character "divide_by_total_counts" or
"scale_by_total_counts". Default is "divide_by_total_counts"
Logical indicating whether to print messages.
A list of bigstatsr::FBM
objects.
An updated SingleCellExperiment
object with computed
statistics and selected genes based on the specified thresholds.
The results are stored in the metadata of the SingleCellExperiment
object.
A list of validated parameters
A SingleCellExperiment
object with an additional metadata
column containing the unique batch key.
A character vector of intersected genes.
This function performs the following steps:
Computes angles between genes for each batch in the
SingleCellExperiment
using the specified method
, via
factorise
.
Transforms the angles to z-scores.
Computes statistical measures (mean z-score, signal-to-noise ratio)
across batches using get_list_stats
.
Selects genes based on specified z-score thresholds using
select_genes
.
The computed statistics and selected genes are added to the
SingleCellExperiment
object, which is returned.
check_params()
: Check Parameters provided to the anglemania function
add_unique_batch_key()
: Temporarily add a unique batch key to the dataset
get_intersect_genes()
: Extract the intersected genes from a
list of matrices (count matrices from different batches/datasets).
It also allows for missing features in individual matrices, so
that a feature does not have to be present in every single batch.
# Set seed (optional)
set.seed(1)
sce <- sce_example()
sce <- anglemania(
sce,
batch_key = "batch",
method = "cosine",
zscore_mean_threshold = 2,
zscore_sn_threshold = 2
)
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#>
| | 0 % elapsed=00s
#> Creating directory "/tmp/RtmpwM6D04/file1f8e1ad721b4" which didn't exist..
#>
|========================= | 50% elapsed=00s, remaining~00s
#> Creating directory "/tmp/RtmpwM6D04/file1f8e7e9b1836" which didn't exist..
#>
|==================================================| 100% elapsed=01s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> Selected 194 genes for integration.
# Access the selected genes
selected_genes <- get_anglemania_genes(sce)
selected_genes[1:10]
#> [1] "gene43" "gene207" "gene119" "gene130" "gene25" "gene209" "gene82"
#> [8] "gene114" "gene53" "gene113"
sce <- sce_example()
params <- check_params(
sce,
batch_key = "batch",
dataset_key = "dataset",
zscore_mean_threshold = 2.5,
zscore_sn_threshold = 2.5,
max_n_genes = 2000,
method = "cosine",
min_cells_per_gene = 1,
min_samples_per_gene = 2,
allow_missing_features = FALSE,
permute_row_or_column = "column",
permutation_function = "sample",
prefilter_threshold = 0.5,
normalization_method = "divide_by_total_counts",
verbose = TRUE
)
#> Using dataset_key: dataset
sce <- sce_example()
head(SummarizedExperiment::colData(sce))
#> DataFrame with 6 rows and 2 columns
#> batch dataset
#> <character> <character>
#> cell1 batch1 dataset1
#> cell2 batch1 dataset1
#> cell3 batch1 dataset1
#> cell4 batch1 dataset1
#> cell5 batch1 dataset1
#> cell6 batch1 dataset1
sce <- add_unique_batch_key(
sce,
batch_key = "batch",
dataset_key = "dataset"
)
head(SummarizedExperiment::colData(sce))
#> DataFrame with 6 rows and 3 columns
#> batch dataset anglemania_batch
#> <character> <character> <character>
#> cell1 batch1 dataset1 dataset1:batch1
#> cell2 batch1 dataset1 dataset1:batch1
#> cell3 batch1 dataset1 dataset1:batch1
#> cell4 batch1 dataset1 dataset1:batch1
#> cell5 batch1 dataset1 dataset1:batch1
#> cell6 batch1 dataset1 dataset1:batch1
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: ‘generics’
#> The following objects are masked from ‘package:base’:
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#> mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#>
#> findMatches
#> The following objects are masked from ‘package:base’:
#>
#> I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#>
#> rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#>
#> anyMissing, rowMedians
sce <- sce_example()
barcodes_by_batch <- split(colnames(sce), colData(sce)$batch)
matrix_list <- lapply(barcodes_by_batch, function(barcodes) {
SingleCellExperiment::counts(sce)[, barcodes]
})
intersect_genes <- get_intersect_genes(matrix_list)
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
head(intersect_genes)
#> [1] "gene1" "gene2" "gene3" "gene4" "gene5" "gene6"