factorise computes the angle matrix of the input gene expression matrix using the specified method, permutes the input to build a null distribution, and transforms the observed angles into z-scores relative to that null distribution. It relies on the bigstatsr package to handle large datasets efficiently.

Usage

factorise(x_mat, method = "pearson", seed = 1)

Arguments

x_mat

An FBM object representing the normalized and scaled gene expression matrix.

method

A character string specifying the method for calculating the relationship between gene pairs. Default is "pearson". Other options include "spearman" and "diem" (see https://bytez.com/docs/arxiv/2407.08623/paper).

seed

An integer value for setting the seed for reproducibility during permutation. Default is 1.
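
To make the difference between the "pearson" and "spearman" options of method concrete, here is a minimal base R sketch. It does not call factorise, and the two vectors are made-up values used only for illustration. It relies on the standard fact that Spearman correlation is Pearson correlation computed on ranks.

```r
# Made-up expression values for two genes across six cells (illustration only)
gene_a <- c(5, 3, 0, 0, 2, 1)
gene_b <- c(25, 9, 1, 0, 4, 2)

# Pearson measures linear association on the raw values
pearson <- cor(gene_a, gene_b, method = "pearson")

# Spearman measures monotone association; it equals Pearson on the ranks
spearman <- cor(gene_a, gene_b, method = "spearman")
spearman_by_ranks <- cor(rank(gene_a), rank(gene_b), method = "pearson")

isTRUE(all.equal(spearman, spearman_by_ranks))
```

Spearman is less sensitive to outlying expression values because only the ordering of the values matters.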

Value

An FBM object containing the z-score-transformed angle matrix.

Details

The function performs the following steps:

  1. Permutation: The input matrix is permuted column-wise to disrupt existing angles, creating a null distribution.

  2. Angle Computation: Computes the angle matrix for both the original and permuted matrices using extract_angles.

  3. Method-Specific Processing:

    • If method = "diem", computes Euclidean distances and scales the angles accordingly, based on the methodology from the DIEM algorithm (https://bytez.com/docs/arxiv/2407.08623/paper).

    • For other methods ("pearson", "spearman"), statistical measures are computed from the permuted data.

  4. Statistical Measures: Calculates mean, variance, and standard deviation using get_dstat.

  5. Z-Score Transformation: Transforms the original angle matrix into z-scores.

This process allows for the identification of invariant gene-gene relationships by comparing them to a null distribution derived from the permuted data.
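
The steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: it uses stats::cor in place of extract_angles, takes the mean and standard deviation of the permuted off-diagonal values in place of get_dstat, treats rows as genes and columns as cells, and omits the "diem" branch. The helper name permutation_zscores is hypothetical, and its output is not expected to match factorise.

```r
# Hypothetical helper for illustration only; not part of the package.
# Assumes x is an ordinary numeric matrix with genes in rows, cells in columns.
permutation_zscores <- function(x, method = "pearson", seed = 1) {
  set.seed(seed)

  # Permutation: shuffle each gene's values across cells independently,
  # which breaks gene-gene structure while keeping each gene's distribution
  x_perm <- t(apply(x, 1, sample))

  # cor() correlates columns, so transpose to obtain gene-by-gene matrices
  observed <- cor(t(x), method = method)
  null <- cor(t(x_perm), method = method)

  # Summary statistics of the null distribution (each gene pair counted once)
  null_vals <- null[upper.tri(null)]

  # Z-score transformation of the observed values against the null
  z <- (observed - mean(null_vals)) / sd(null_vals)
  diag(z) <- NA  # self-pairs carry no information
  z
}

toy <- matrix(
  c(
    5, 3, 0, 0,
    0, 0, 0, 3,
    2, 1, 3, 4,
    0, 0, 1, 0,
    1, 2, 1, 2,
    3, 4, 3, 4
  ),
  nrow = 6, # 6 genes
  ncol = 4, # 4 cells
  byrow = TRUE
)

z <- permutation_zscores(toy, method = "pearson", seed = 1)
dim(z)
```

Fixing the seed makes the permutation, and therefore the z-scores, reproducible across runs.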

Examples

mat <- matrix(
  c(
    5, 3, 0, 0,
    0, 0, 0, 3,
    2, 1, 3, 4,
    0, 0, 1, 0,
    1, 2, 1, 2,
    3, 4, 3, 4
  ),
  nrow = 6, # 6 genes
  ncol = 4, # 4 cells
  byrow = TRUE
)

mat <- bigstatsr::FBM(nrow = nrow(mat), ncol = ncol(mat), init = mat)

# Run factorise with method "pearson" and a fixed seed
result_fbm <- factorise(mat, method = "pearson", seed = 1)
result_fbm[]
#>             [,1]       [,2]        [,3]        [,4]        [,5]        [,6]
#> [1,]          NA -0.7562162 -1.32900007 -0.75621619  0.16072196  0.08613056
#> [2,] -0.75621619         NA  1.07340109 -0.30397076  0.75222541 -0.27967877
#> [3,] -1.32900007  1.0734011          NA  1.48691183 -0.40738914 -0.09308989
#> [4,] -0.75621619 -0.3039708  1.48691183          NA -0.01880374  1.10550559
#> [5,]  0.16072196  0.7522254 -0.40738914 -0.01880374          NA  1.69339816
#> [6,]  0.08613056 -0.2796788 -0.09308989  1.10550559  1.69339816          NA