anglemania
computes critical angles between genes across all samples
provided in an
SingleCellExperiment
object. It calculates angles, transforms them to z-scores, computes
statistical measures, and selects the top genes based on mean and standard
deviation of z-scores.
These genes are biologically informative and invariant to batch effects.
This function adds a unique batch identifier to the metadata of a
SingleCellExperiment
object by combining specified dataset and batch
keys. This is useful for distinguishing samples during integration or
analysis.
anglemania(
sce,
batch_key,
dataset_key = NA_character_,
max_n_genes = 2000,
min_cells_per_gene = 1,
min_samples_per_gene = 2,
allow_missing_features = FALSE,
method = "cosine",
permute_row_or_column = "column",
permutation_function = "sample",
prefilter_threshold = 0.5,
normalization_method = "divide_by_total_counts",
verbose = TRUE
)
check_params(
sce,
batch_key,
dataset_key,
max_n_genes,
method,
min_cells_per_gene,
min_samples_per_gene,
allow_missing_features,
permute_row_or_column,
permutation_function,
prefilter_threshold,
normalization_method,
verbose
)
add_unique_batch_key(sce, dataset_key = NA_character_, batch_key)
get_intersect_genes(
matrix_list,
allow_missing_features = FALSE,
min_samples_per_gene = 1,
verbose = TRUE
)
A SingleCellExperiment
object.
A character string specifying the column name in the metadata that identifies the batch.
A character string specifying the column name in the
metadata that identifies the dataset. If NA
, only the
batch_key
is used.
Integer specifying the maximum number of genes to select.
Integer specifying the minimum number of cells per
gene. Default is 1
.
Integer indicating the minimum number of samples per gene.
Logical indicating whether to allow missing features.
Character string specifying the method to use for calculating
the relationship between gene pairs. Default is "cosine"
.
Other options include "spearman"
Character "row" or "column", whether
permutations
should be executed row-wise or column wise. Default is "column"
Character "sample" or "permute_nonzero". If
sample,then sample is used for constructing background distributions. If
permute_nonzero, then only non-zero values are permuted. Default is
"sample"
Numeric value specifying the threshold prefiltering genes. Speeds up gene selection.
Character "divide_by_total_counts" or
"scale_by_total_counts". Default is "divide_by_total_counts"
Logical indicating whether to print messages.
A list of bigstatsr::FBM
objects.
An updated SingleCellExperiment
object with computed
statistics and selected genes.
The results are stored in the metadata of the SingleCellExperiment
object.
A list of validated parameters
A SingleCellExperiment
object with an additional metadata
column containing the unique batch key.
A character vector of intersected genes.
This function performs the following steps:
Computes angles between genes for each batch in the
SingleCellExperiment
using the specified method
, via
factorise
.
Transforms the angles to z-scores.
Computes statistical measures (mean z-score, signal-to-noise ratio)
across batches using get_list_stats
.
Selects the top n genes based on mean and standard deviation of
z-scores using select_genes
.
The computed statistics and selected genes are added to the
SingleCellExperiment
object, which is returned.
check_params()
: Check Parameters provided to the anglemania function
add_unique_batch_key()
: Temporarily add a unique batch key to the dataset
get_intersect_genes()
: Extract the intersected genes from a
list of matrices (count matrices from different batches/datasets).
It also allows for missing features in individual matrices, so
that a feature does not have to be present in every single batch.
# Set seed (optional)
set.seed(1)
sce <- sce_example()
sce <- anglemania(
sce,
batch_key = "batch",
method = "cosine"
)
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#>
| | 0 % elapsed=00s
#> Creating directory "/tmp/RtmpqekGwf/file2fd648dc5e3c" which didn't exist..
#>
|========================= | 50% elapsed=01s, remaining~01s
#> Creating directory "/tmp/RtmpqekGwf/file2fd6445076e1" which didn't exist..
#>
|==================================================| 100% elapsed=01s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> 2000 is larger than the number of intersected genes.Setting max_n_genes to 300
#> Selected 300 genes for integration.
# Access the selected genes
selected_genes <- get_anglemania_genes(sce)
selected_genes[1:10]
#> [1] "gene270" "gene294" "gene66" "gene122" "gene18" "gene161" "gene26"
#> [8] "gene276" "gene15" "gene111"
sce <- sce_example()
params <- check_params(
sce,
batch_key = "batch",
dataset_key = "dataset",
max_n_genes = 2000,
method = "cosine",
min_cells_per_gene = 1,
min_samples_per_gene = 2,
allow_missing_features = FALSE,
permute_row_or_column = "column",
permutation_function = "sample",
prefilter_threshold = 0.5,
normalization_method = "divide_by_total_counts",
verbose = TRUE
)
#> Using dataset_key: dataset
sce <- sce_example()
head(SummarizedExperiment::colData(sce))
#> DataFrame with 6 rows and 2 columns
#> batch dataset
#> <character> <character>
#> cell1 batch1 dataset1
#> cell2 batch1 dataset1
#> cell3 batch1 dataset1
#> cell4 batch1 dataset1
#> cell5 batch1 dataset1
#> cell6 batch1 dataset1
sce <- add_unique_batch_key(
sce,
batch_key = "batch",
dataset_key = "dataset"
)
head(SummarizedExperiment::colData(sce))
#> DataFrame with 6 rows and 3 columns
#> batch dataset anglemania_batch
#> <character> <character> <character>
#> cell1 batch1 dataset1 dataset1:batch1
#> cell2 batch1 dataset1 dataset1:batch1
#> cell3 batch1 dataset1 dataset1:batch1
#> cell4 batch1 dataset1 dataset1:batch1
#> cell5 batch1 dataset1 dataset1:batch1
#> cell6 batch1 dataset1 dataset1:batch1
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: ‘generics’
#> The following objects are masked from ‘package:base’:
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#> mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#>
#> findMatches
#> The following objects are masked from ‘package:base’:
#>
#> I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#>
#> rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#>
#> anyMissing, rowMedians
sce <- sce_example()
barcodes_by_batch <- split(colnames(sce), colData(sce)$batch)
matrix_list <- lapply(barcodes_by_batch, function(barcodes) {
SingleCellExperiment::counts(sce)[, barcodes]
})
intersect_genes <- get_intersect_genes(matrix_list)
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
head(intersect_genes)
#> [1] "gene1" "gene2" "gene3" "gene4" "gene5" "gene6"