Introduction

In this vignette we briefly show the other functions available in anglemania besides the two main ones, create_anglemania_object and anglemania.

library(anglemania)
library(SingleCellExperiment)

Normal anglemania workflow

sce <- sce_example()
batch_key <- "batch"
angl <- create_anglemania_object(sce, batch_key = batch_key)
## No dataset_key specified.
## Assuming that all samples belong to the same dataset and are separated by batch_key: batch
## Extracting count matrices...
## Filtering each batch to at least 1 cells per gene...
## Using the intersection of filtered genes from all batches...
## Number of genes in intersected set: 300
angl <- anglemania(angl)
## Computing angles and transforming to z-scores...
## Computing statistics...
## Weighting matrix_list...
## Calculating mean...
## Calculating sds...
## Filtering features...
## [1] "Selected 36 genes for integration."
angl
## anglemania_object
## --------------
## Dataset key: NA 
## Batch key: batch 
## Number of datasets: 1 
## Total number of batches: 2 
## Batches (showing first 5):
## batch1, batch2 
## Number of intersected genes: 300 
## Intersected genes (showing first 10):
## gene1, gene2, gene3, gene4, gene5, gene6, gene7, gene8, gene9, gene10 , ...
## Min cells per gene: 1

Under the hood of the anglemania function

  • anglemania is run on the anglemania object and calls three functions:
    • factorise:
      • permutes the input matrix to create a null distribution
      • computes the cosine similarity (or Spearman coefficient, or diem) between pairs of gene expression vectors, for both the original and the permuted matrix
      • computes the z-score of each gene pair's relationship using the mean and standard deviation of the null distribution
      • does this for every batch in the dataset
    • get_list_stats:
      • computes the mean and standard deviation of the z-scores across the matrices from the different batches
    • select_genes:
      • filters the gene pairs by the mean z-score and the signal-to-noise ratio (SN, i.e. the mean divided by the standard deviation)

factorise

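# Split the cell barcodes by batch
barcodes_by_batch <- split(rownames(colData(sce)), colData(sce)[[batch_key]])
# Convert each batch's sparse count matrix to a file-backed matrix (FBM)
counts_by_batch <- lapply(barcodes_by_batch, function(x) {
    sparse_to_fbm(counts(sce[, x]))
})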
counts_by_batch[[1]][1:10, 1:6]
##       [,1] [,2] [,3] [,4] [,5] [,6]
##  [1,]    8    5    5    2    4    2
##  [2,]    9    5    4    1    4    9
##  [3,]    4    2    7    4    6    1
##  [4,]    7    4    4    1    3    8
##  [5,]    6    8    5    5    7    4
##  [6,]    5    8    5    9    8    6
##  [7,]    6    4    7    4    7    5
##  [8,]    3    3    3    3    5    4
##  [9,]    6    4    5    5    3    1
## [10,]    6    4    5    2    6    4
class(counts_by_batch[[1]])
## [1] "FBM"
## attr(,"package")
## [1] "bigstatsr"
factorised <- lapply(counts_by_batch, factorise)
factorised[[1]][1:10, 1:6]
##             [,1]        [,2]       [,3]        [,4]       [,5]       [,6]
##  [1,]         NA -1.77559431 -0.8253279  1.14791468  0.8304337  0.4723581
##  [2,] -1.7755943          NA  0.5726423  2.49881313  0.5105572  1.4157996
##  [3,] -0.8253279  0.57264234         NA  0.32316488 -1.5416407 -0.0291761
##  [4,]  1.1479147  2.49881313  0.3231649          NA -0.3972312 -0.6038679
##  [5,]  0.8304337  0.51055725 -1.5416407 -0.39723123         NA -0.2773380
##  [6,]  0.4723581  1.41579962 -0.0291761 -0.60386791 -0.2773380         NA
##  [7,] -0.3793546 -0.08986951 -1.0834393  0.01453768  0.3333673  1.3412298
##  [8,]  0.0815372 -1.42754363 -1.3398126 -0.27740608 -0.8317583 -1.6147436
##  [9,]  0.5330219  0.05725224 -0.3868227 -0.30696466 -0.9537188 -0.1300575
## [10,]  0.1071082 -0.33100426  1.7263481  0.43416665  0.5866818  1.1300376
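
The idea behind factorise can be sketched in base R on a small dense matrix. This is an illustration under assumptions, not the package's implementation: the helper names cosine_sim and zscore_vs_null are hypothetical, the null is built by shuffling each gene's counts across cells independently, and the null mean and standard deviation are pooled over all gene pairs.

```r
set.seed(1)

# Toy count matrix: 20 genes (rows) x 50 cells (columns).
counts <- matrix(rpois(20 * 50, lambda = 5), nrow = 20)

# Hypothetical helper: cosine similarity between all pairs of rows
# (gene expression vectors).
cosine_sim <- function(m) {
  norms <- sqrt(rowSums(m^2))
  sim <- tcrossprod(m) / outer(norms, norms)
  diag(sim) <- NA  # self-similarity is not informative
  sim
}

# Hypothetical helper: z-score of the observed similarities against a
# permutation null (assumption: one pooled null mean and sd for all pairs).
zscore_vs_null <- function(m) {
  # Shuffle each gene's values across cells to break gene-gene structure.
  permuted <- t(apply(m, 1, sample))
  observed <- cosine_sim(m)
  null <- cosine_sim(permuted)
  (observed - mean(null, na.rm = TRUE)) / sd(null, na.rm = TRUE)
}

z <- zscore_vs_null(counts)
dim(z)
```

As in the factorise output above, the result is a symmetric gene-by-gene matrix with NA on the diagonal.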

get_list_stats

  • get_list_stats is called on the anglemania object, so we create and inspect an anglemania object
sce <- sce_example()
batch_key <- "batch"
angl <- create_anglemania_object(sce, batch_key = batch_key)
## No dataset_key specified.
## Assuming that all samples belong to the same dataset and are separated by batch_key: batch
## Extracting count matrices...
## Filtering each batch to at least 1 cells per gene...
## Using the intersection of filtered genes from all batches...
## Number of genes in intersected set: 300
list_stats(angl) # this slot is empty
## list()
angl <- anglemania(angl) 
## Computing angles and transforming to z-scores...
## Computing statistics...
## Weighting matrix_list...
## Calculating mean...
## Calculating sds...
## Filtering features...
## [1] "Selected 36 genes for integration."
names(list_stats(angl))
## [1] "mean_zscore" "sds_zscore"  "sn_zscore"
class(list_stats(angl))
## [1] "list"
list_stats(angl)$mean_zscore[1:10, 1:6]
##             [,1]        [,2]       [,3]        [,4]        [,5]        [,6]
##  [1,]         NA -1.47361566 -0.6737199  0.79226342  0.20563482  0.29347030
##  [2,] -1.4736157          NA  0.7911388  1.78368051  0.85782278  0.05812042
##  [3,] -0.6737199  0.79113876         NA -0.81497507 -0.16581743  0.40981298
##  [4,]  0.7922634  1.78368051 -0.8149751          NA -0.32367957 -0.32209876
##  [5,]  0.2056348  0.85782278 -0.1658174 -0.32367957          NA -0.82414755
##  [6,]  0.2934703  0.05812042  0.4098130 -0.32209876 -0.82414755          NA
##  [7,]  0.5353808  0.39957517 -1.3597171 -0.05621663  0.31106411  0.48942810
##  [8,] -0.0478384 -1.11509607 -0.7896121 -0.63001026 -1.43434185 -0.84468556
##  [9,]  1.2866480  0.34648068 -0.5253048 -0.78855628 -0.69594257 -0.50281463
## [10,]  0.5280248  1.21974236  0.8667575 -0.27678173  0.04317401  0.87613079
list_stats(angl)$sn_zscore[1:10, 1:6]
##            [,1]       [,2]      [,3]      [,4]        [,5]       [,6]
##  [1,]        NA 4.87986702 4.4438254 2.2276412  0.32912160 1.64052702
##  [2,] 4.8798670         NA 3.6208316 2.4941954  2.47022152 0.04280865
##  [3,] 4.4438254 3.62083164        NA 0.7160588  0.12052233 0.93353798
##  [4,] 2.2276412 2.49419544 0.7160588        NA  4.40071059 1.14312995
##  [5,] 0.3291216 2.47022152 0.1205223 4.4007106          NA 1.50719307
##  [6,] 1.6405270 0.04280865 0.9335380 1.1431300  1.50719307         NA
##  [7,] 0.5852849 0.81638473 4.9215576 0.7945330 13.94709308 0.57457988
##  [8,] 0.3697637 3.56890627 1.4351352 1.7867351  2.38032046 1.09691151
##  [9,] 1.7072764 1.19794817 3.7933035 1.6373962  2.69979335 1.34890662
## [10,] 1.2544641 0.78655168 1.0083377 0.3893134  0.07943585 3.45060039
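
What these three matrices summarise can be sketched in base R with two toy z-score matrices standing in for two batches. The sketch makes several assumptions: make_sym is a hypothetical helper, the batches are unweighted (the package reports a weighting step that is not reproduced here), and SN is taken as the absolute mean divided by the standard deviation. The last point is inferred from the printed sn_zscore values, which are non-negative even where mean_zscore is negative.

```r
set.seed(1)

# Hypothetical helper: random symmetric matrix with NA on the diagonal.
make_sym <- function(n) {
  m <- matrix(rnorm(n * n), nrow = n)
  m <- (m + t(m)) / 2
  diag(m) <- NA
  m
}

# Two toy z-score matrices, one per batch (5 genes each).
z_list <- list(batch1 = make_sym(5), batch2 = make_sym(5))

# Stack the matrices into a 5 x 5 x 2 array and summarise across batches.
z_arr <- simplify2array(z_list)
mean_z <- apply(z_arr, c(1, 2), mean)
sd_z <- apply(z_arr, c(1, 2), sd)
sn_z <- abs(mean_z) / sd_z  # assumption: absolute mean over sd

stats <- list(mean_zscore = mean_z, sds_zscore = sd_z, sn_zscore = sn_z)
names(stats)
```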

select_genes

  • Under the hood, anglemania calls select_genes on the anglemania object with the default thresholds zscore_mean_threshold = 2.5 and zscore_sn_threshold = 2.5.
  • We can call select_genes directly to change the thresholds without running anglemania again.
previous_genes <- get_anglemania_genes(angl)
angl <- select_genes(angl,
                     zscore_mean_threshold = 2,
                     zscore_sn_threshold = 2)
## [1] "Selected 214 genes for integration."
# Inspect the anglemania genes
new_genes <- get_anglemania_genes(angl)

length(previous_genes)
## [1] 36
length(new_genes)
## [1] 214
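
Why lower thresholds select more genes can be illustrated with a toy example in base R. This is a sketch under assumptions, not the select_genes implementation: pick_genes is a hypothetical helper, the statistics are made up, and a gene is kept if it takes part in at least one pair whose absolute mean z-score and SN both reach the thresholds.

```r
# Toy summary statistics for 4 genes; the values are made up for illustration.
genes <- paste0("gene", 1:4)
mean_z <- matrix(c(  NA, 3.0, 0.5, -2.2,
                    3.0,  NA, 1.0,  0.2,
                    0.5, 1.0,  NA,  0.1,
                   -2.2, 0.2, 0.1,   NA),
                 nrow = 4, dimnames = list(genes, genes))
sn_z <- matrix(c( NA, 4.0, 0.3, 2.6,
                 4.0,  NA, 3.0, 0.1,
                 0.3, 3.0,  NA, 0.2,
                 2.6, 0.1, 0.2,  NA),
               nrow = 4, dimnames = list(genes, genes))

# Hypothetical helper: keep genes involved in at least one pair that
# passes both thresholds (assumed rule, see the text above).
pick_genes <- function(mean_z, sn_z, mean_thr, sn_thr) {
  keep <- !is.na(mean_z) & abs(mean_z) >= mean_thr & sn_z >= sn_thr
  rownames(mean_z)[rowSums(keep) > 0]
}

strict <- pick_genes(mean_z, sn_z, mean_thr = 2.5, sn_thr = 2.5)
loose <- pick_genes(mean_z, sn_z, mean_thr = 2, sn_thr = 2)
strict  # "gene1" "gene2"
loose   # "gene1" "gene2" "gene4"
```

With the stricter thresholds only the gene1/gene2 pair passes; lowering both thresholds to 2 also admits the gene1/gene4 pair, so gene4 joins the selection.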