Introduction

This vignette briefly covers the additional functions available in anglemania besides the two main ones, create_anglemania_object and anglemania.

library(anglemania)
library(SingleCellExperiment)

Normal anglemania workflow

sce <- sce_example()
batch_key <- "batch"
angl <- create_anglemania_object(sce, batch_key = batch_key)
## No dataset_key specified.
## Assuming that all samples belong to the same dataset and are separated by batch_key: batch
## Using the intersection of filtered genes from all batches...
## Number of genes in intersected set: 300
## Extracting count matrices...
## Filtering each batch to at least 1 cells per gene...
angl <- anglemania(angl)
## Computing angles and transforming to z-scores...
## Creating directory "/tmp/RtmpQiTNPd/file1fd44b3d5cc" which didn't exist..
## Creating directory "/tmp/RtmpQiTNPd/file1fd411c6b88f" which didn't exist..
## Computing statistics...
## Weighting matrix_list...
## Calculating mean...
## Calculating sds...
## Filtering features...
## Selecting features...
## [1] "Selected 25 genes for integration."
angl
## anglemania_object
## --------------
## Dataset key: NA 
## Batch key: batch 
## Number of datasets: 1 
## Total number of batches: 2 
## Batches (showing first 5):
## batch1, batch2 
## Number of intersected genes: 300 
## Intersected genes (showing first 10):
## gene1, gene2, gene3, gene4, gene5, gene6, gene7, gene8, gene9, gene10 , ...
## Min cells per gene: 1

Under the hood of the anglemania function

  • anglemania is run on the anglemania object and calls three functions:
    • factorise:
      • permutes the input matrix to create a null distribution
      • computes the cosine similarity (or the Spearman coefficient, or diem) between pairs of gene expression vectors, for both the original and the permuted matrix
      • computes the z-score of each gene pair's relationship, using the mean and standard deviation of the null distribution
      • does this for every batch in the dataset
    • get_list_stats:
      • computes the mean and standard deviation of the z-scores across the matrices from the different batches
    • select_genes:
      • filters the gene pairs by the mean z-score and the signal-to-noise ratio (SN, i.e. the mean divided by the standard deviation)
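
The factorise steps listed above can be illustrated with a minimal base R sketch on toy data. This is not anglemania's implementation: the helper name cosine_sim, the toy counts matrix, the permutation scheme (shuffling each gene's values across cells) and the pooling of the null mean and standard deviation over all gene pairs are assumptions made for illustration only.

```r
# Toy sketch of the "permute -> similarity -> z-score" idea (base R only).
# NOT the anglemania implementation; all names here are hypothetical.
set.seed(1)
counts <- matrix(rpois(20 * 50, lambda = 5), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), NULL))  # genes x cells

# Cosine similarity between all pairs of rows (gene expression vectors)
cosine_sim <- function(m) {
  norms <- sqrt(rowSums(m^2))
  tcrossprod(m) / outer(norms, norms)
}

# Assumed null model: shuffle each gene's values across cells independently
permuted <- t(apply(counts, 1, sample))

sim_obs  <- cosine_sim(counts)
sim_null <- cosine_sim(permuted)

# Mean and sd of the null similarities, taken over unique gene pairs
off_diag  <- upper.tri(sim_null)
null_mean <- mean(sim_null[off_diag])
null_sd   <- sd(sim_null[off_diag])

# z-score of every observed gene pair relative to the null
z <- (sim_obs - null_mean) / null_sd
diag(z) <- 0  # self-pairs carry no information

dim(z)
```

In the package this is done once per batch, which yields one gene-by-gene z-score matrix per batch.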

factorise

barcodes_by_batch <- split(rownames(colData(sce)), colData(sce)[[batch_key]])
counts_by_batch <- lapply(barcodes_by_batch, function(x) {
    # nested call avoids relying on the %>% pipe being attached
    sparse_to_fbm(counts(sce[, x]))
})
counts_by_batch[[1]][1:10, 1:6]
##       [,1] [,2] [,3] [,4] [,5] [,6]
##  [1,]    8    5    5    2    4    2
##  [2,]    9    5    4    1    4    9
##  [3,]    4    2    7    4    6    1
##  [4,]    7    4    4    1    3    8
##  [5,]    6    8    5    5    7    4
##  [6,]    5    8    5    9    8    6
##  [7,]    6    4    7    4    7    5
##  [8,]    3    3    3    3    5    4
##  [9,]    6    4    5    5    3    1
## [10,]    6    4    5    2    6    4
class(counts_by_batch[[1]])
## [1] "FBM"
## attr(,"package")
## [1] "bigstatsr"
factorised <- lapply(counts_by_batch, factorise)
## Creating directory "/tmp/RtmpQiTNPd/file1fd4350ffa3" which didn't exist..
## Creating directory "/tmp/RtmpQiTNPd/file1fd4409b4687" which didn't exist..
factorised[[1]][1:10, 1:6]
##              [,1]        [,2]        [,3]       [,4]       [,5]        [,6]
##  [1,]  0.00000000 -1.74668209 -0.79613304  1.1776966  0.8601212  0.50193905
##  [2,] -1.73785984  0.00000000  0.59661885  2.5115045  0.5348975  1.43483620
##  [3,] -0.85963025  0.53988067  0.00000000  0.2901283 -1.5767324 -0.06260101
##  [4,]  1.13658651  2.45514150  0.33158301  0.0000000 -0.3715653 -0.57325460
##  [5,]  0.92122274  0.57796947 -1.62420449 -0.3961608  0.0000000 -0.26750572
##  [6,]  0.52925575  1.57028828 -0.02415793 -0.6582967 -0.2979901  0.00000000
##  [7,] -0.41325531 -0.10548957 -1.16180200  0.0055108  0.3444741  1.41598180
##  [8,]  0.10001406 -1.38192310 -1.29577006 -0.2524730 -0.7968541 -1.56575595
##  [9,]  0.51996703  0.05963883 -0.37002326 -0.2927571 -0.9185203 -0.12159157
## [10,]  0.06970488 -0.33637803  1.57056534  0.3728527  0.5142178  1.01784989

get_list_stats

  • get_list_stats is called on the anglemania object, so we first create an anglemania object and inspect it
sce <- sce_example()
batch_key <- "batch"
angl <- create_anglemania_object(sce, batch_key = batch_key)
## No dataset_key specified.
## Assuming that all samples belong to the same dataset and are separated by batch_key: batch
## Using the intersection of filtered genes from all batches...
## Number of genes in intersected set: 300
## Extracting count matrices...
## Filtering each batch to at least 1 cells per gene...
list_stats(angl) # this slot is empty until anglemania has been run
## list()
angl <- anglemania(angl) 
## Computing angles and transforming to z-scores...
## Creating directory "/tmp/RtmpQiTNPd/file1fd43cc53ad7" which didn't exist..
## Creating directory "/tmp/RtmpQiTNPd/file1fd4a490ee9" which didn't exist..
## Computing statistics...
## Weighting matrix_list...
## Calculating mean...
## Calculating sds...
## Filtering features...
## Selecting features...
## [1] "Selected 25 genes for integration."
names(list_stats(angl))
## [1] "mean_zscore" "sds_zscore"  "sn_zscore"   "prefiltered"
class(list_stats(angl))
## [1] "list"
list_stats(angl)$mean_zscore[1:10, 1:6]
##              [,1]       [,2]       [,3]        [,4]        [,5]       [,6]
##  [1,]  0.00000000 -1.4773877 -0.6690115  0.80957395  0.21191114  0.3065459
##  [2,] -1.43297508  0.0000000  0.8094372  1.79591873  0.87491750  0.0903189
##  [3,] -0.70975605  0.8090174  0.0000000 -0.90002758 -0.14215190  0.4217798
##  [4,]  0.81025807  1.7916458 -0.8103351  0.00000000 -0.29386285 -0.2877692
##  [5,]  0.26228504  0.8796571 -0.2190412 -0.31429554  0.00000000 -0.7944232
##  [6,]  0.34210706  0.1727196  0.4235970 -0.32724467 -0.79625202  0.0000000
##  [7,]  0.60813294  0.4576137 -1.4403958 -0.03807431  0.35694728  0.5394522
##  [8,] -0.03868058 -1.0958225 -0.7680151 -0.62207485 -1.42724690 -0.8197054
##  [9,]  1.26715940  0.3486664 -0.5030029 -0.76152650 -0.66668260 -0.4825756
## [10,]  0.49115807  1.1467226  0.7976727 -0.27013900  0.03028304  0.8112288
list_stats(angl)$sn_zscore[1:10, 1:6]
##            [,1]       [,2]      [,3]      [,4]        [,5]      [,6]
##  [1,]        NA 3.87928890 3.7213414 1.5550667  0.23116551 1.1093569
##  [2,] 3.3234406         NA 2.6894229 1.7746389  1.81948152 0.0475004
##  [3,] 3.3486305 2.12554305        NA 0.5347330  0.07006687 0.6157208
##  [4,] 1.7557127 1.90940919 0.5017815        NA  2.67420836 0.7127632
##  [5,] 0.2814584 2.06177351 0.1102260 2.7147101          NA 1.0660911
##  [6,] 1.2925884 0.08738832 0.6689559 0.6989744  1.12999842        NA
##  [7,] 0.4210103 0.57464016 3.6559091 0.6177018 20.23533063 0.4351825
##  [8,] 0.1972052 2.70836022 1.0290168 1.1901274  1.60093202 0.7769169
##  [9,] 1.1991785 0.85301337 2.6746701 1.1487110  1.87190277 0.9452842
## [10,] 0.8240564 0.54672981 0.7297777 0.2970755  0.04424840 2.7762187
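
The statistics stored in list_stats can be pictured with a small base R sketch that combines one z-score matrix per batch. This is only an illustration under stated assumptions, not the package code: it uses an unweighted mean (the log above mentions a weighting step that is skipped here), the helper make_z and the toy matrices are hypothetical, and the absolute value in the signal-to-noise ratio is inferred from the non-negative sn_zscore values printed above.

```r
# Toy sketch of per-gene-pair statistics across batches (base R only).
# NOT the anglemania implementation; unweighted, with hypothetical names.
set.seed(1)
n_genes <- 6

# Hypothetical stand-in for one batch's symmetric z-score matrix
make_z <- function() {
  z <- matrix(rnorm(n_genes^2), n_genes)
  z <- (z + t(z)) / 2
  diag(z) <- 0
  z
}
z_list <- list(batch1 = make_z(), batch2 = make_z(), batch3 = make_z())

# Stack the matrices into a genes x genes x batches array
z_array <- array(unlist(z_list), dim = c(n_genes, n_genes, length(z_list)))

mean_z <- apply(z_array, c(1, 2), mean)  # mean z-score per gene pair
sd_z   <- apply(z_array, c(1, 2), sd)    # sd of the z-score per gene pair
sn_z   <- abs(mean_z) / sd_z             # signal-to-noise (abs() is an assumption)
diag(sn_z) <- NA                         # 0 / 0 on the diagonal is undefined

round(sn_z[1:3, 1:3], 2)
```

A gene pair with a large mean but an equally large standard deviation gets a low signal-to-noise value, which is why both statistics are kept.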

select_genes

  • under the hood, anglemania calls select_genes on the anglemania object with the default thresholds zscore_mean_threshold = 2.5 and zscore_sn_threshold = 2.5
  • we can call select_genes directly to change the thresholds without running anglemania again
previous_genes <- get_anglemania_genes(angl)
angl <- select_genes(angl,
                     zscore_mean_threshold = 2,
                     zscore_sn_threshold = 2)
## [1] "Selected 194 genes for integration."
# Inspect the anglemania genes
new_genes <- get_anglemania_genes(angl)

length(previous_genes)
## [1] 25
length(new_genes)
## [1] 194
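
Why lowering the thresholds selects more genes can be sketched with toy matrices in base R. The function pick_genes, the helper set_pair and the toy values below are hypothetical, and the rule used (keep a gene if it belongs to any pair whose absolute mean z-score and SN both reach the thresholds) is an assumption for illustration; the source only states that gene pairs are filtered by these two statistics.

```r
# Toy sketch of threshold-based selection (base R only).
# NOT the anglemania implementation; pick_genes and set_pair are hypothetical.
genes  <- paste0("gene", 1:5)
mean_z <- matrix(0, 5, 5, dimnames = list(genes, genes))
sn_z   <- matrix(0, 5, 5, dimnames = list(genes, genes))

# Fill a symmetric entry for one gene pair
set_pair <- function(m, i, j, v) { m[i, j] <- v; m[j, i] <- v; m }
mean_z <- set_pair(mean_z, 1, 2,  3.0); sn_z <- set_pair(sn_z, 1, 2, 4.0)
mean_z <- set_pair(mean_z, 3, 4, -2.2); sn_z <- set_pair(sn_z, 3, 4, 2.1)
mean_z <- set_pair(mean_z, 1, 5,  2.8); sn_z <- set_pair(sn_z, 1, 5, 1.0)

pick_genes <- function(mean_z, sn_z, mean_thr, sn_thr) {
  keep <- abs(mean_z) >= mean_thr & sn_z >= sn_thr
  keep[is.na(keep)] <- FALSE
  # look at each unordered pair once, then collect the genes involved
  pairs <- which(keep & upper.tri(keep), arr.ind = TRUE)
  unique(rownames(mean_z)[as.vector(pairs)])
}

strict <- pick_genes(mean_z, sn_z, mean_thr = 2.5, sn_thr = 2.5)
loose  <- pick_genes(mean_z, sn_z, mean_thr = 2,   sn_thr = 2)

length(strict)            # 2
length(loose)             # 4
all(strict %in% loose)    # TRUE
```

In this toy example, every gene that passes the stricter thresholds also passes the looser ones. The jump from 25 to 194 genes above is consistent with that, and with the real objects it can be checked using all(previous_genes %in% new_genes).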