factorise
Computes the angle matrix of the input gene expression
matrix using the specified method, permutes the matrix to create a null
distribution, and transforms the resulting angles into z-scores. The function
is optimized for large datasets using the bigstatsr package.
factorise(
x_mat,
method = "cosine",
seed = 1,
permute_row_or_column = "column",
permutation_function = "sample",
normalization_method = "divide_by_total_counts"
)
x_mat: An FBM object representing the normalized and scaled gene expression matrix.
method: A character string specifying the method for calculating the relationship between gene pairs. Default is "cosine". Other options include "spearman".
seed: An integer value for setting the seed for reproducibility during permutation. Default is 1.
permute_row_or_column: Character "row" or "column", whether permutations should be executed row-wise or column-wise. Default is "column".
permutation_function: Character "sample" or "permute_nonzero". With "sample", sample is used to construct the background distributions. With "permute_nonzero", only non-zero values are permuted. Default is "sample".
normalization_method: Character "divide_by_total_counts" or "scale_by_total_counts". Default is "divide_by_total_counts".
An FBM object containing the z-score-transformed angle matrix.
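The difference between the two permutation_function options can be illustrated with a small base R sketch. The helpers shuffle_all and shuffle_nonzero are hypothetical names made up for this illustration; they are not the package's internal functions and only show the idea described for the argument, assuming that "permute_nonzero" leaves zeros in place.

```r
# Hypothetical helpers (not part of the package) illustrating the two
# permutation strategies on a single vector of counts.

shuffle_all <- function(v) {
  # Shuffle every value, zeros included
  v[sample.int(length(v))]
}

shuffle_nonzero <- function(v) {
  # Shuffle only the non-zero entries; zeros keep their positions
  nz <- which(v != 0)
  v[nz] <- v[nz][sample.int(length(nz))]
  v
}

set.seed(1)
v <- c(5, 3, 0, 0, 2, 0)
shuffle_all(v)      # zeros may move to any position
shuffle_nonzero(v)  # zeros stay at positions 3, 4 and 6
```

Keeping zeros fixed preserves the sparsity pattern of the data, so the two options can yield different background distributions for sparse matrices.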
The function performs the following steps:
Permutation: The input matrix is permuted (column-wise by default, see permute_row_or_column) to disrupt existing angles, creating a null distribution.
Angle Computation: Computes the angle matrix for both the original and permuted matrices using extract_angles.
Method-Specific Processing: For the "cosine" and "spearman" methods, statistical measures are computed from the permuted data.
Statistical Measures: Calculates the mean, variance, and standard deviation using get_dstat.
Z-Score Transformation: Transforms the original angle matrix into z-scores.
This process allows for the identification of invariant gene-gene relationships by comparing them to a null distribution derived from the permuted data.
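The steps above can be sketched in base R on a small dense matrix. This is a minimal illustration under several assumptions, not the package's implementation: it uses cosine similarity between gene rows as the "angle", a single permutation that shuffles values within each column, and one pooled mean and standard deviation taken from the off-diagonal entries of the permuted matrix. The helper cosine_sim is a hypothetical name; extract_angles and get_dstat are not called because their signatures are not shown here.

```r
# Hypothetical helper: cosine similarity between the rows of a matrix.
cosine_sim <- function(m) {
  norms <- sqrt(rowSums(m^2))
  norms[norms == 0] <- 1  # assumption: all-zero rows get similarity 0
  tcrossprod(m) / outer(norms, norms)
}

mat <- matrix(
  c(5, 3, 0, 0,
    0, 0, 0, 3,
    2, 1, 3, 4,
    0, 0, 1, 0,
    1, 2, 1, 2,
    3, 4, 3, 4),
  nrow = 6, byrow = TRUE  # 6 genes x 4 cells
)

# Observed gene-gene similarities
observed <- cosine_sim(mat)

# Null: shuffle the values within each column, then recompute similarities
set.seed(1)
permuted <- apply(mat, 2, sample)
null_sim <- cosine_sim(permuted)

# Pooled null statistics from the off-diagonal entries (assumption)
off_diag  <- null_sim[row(null_sim) != col(null_sim)]
null_mean <- mean(off_diag)
null_sd   <- sd(off_diag)

# Z-score transformation of the observed similarities
z <- (observed - null_mean) / null_sd
diag(z) <- 0  # self-similarity is not informative
z
```

Because this sketch pools the null statistics into a single mean and standard deviation, its z-score matrix is symmetric; the values returned by factorise need not match it.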
mat <- matrix(
c(
5, 3, 0, 0,
0, 0, 0, 3,
2, 1, 3, 4,
0, 0, 1, 0,
1, 2, 1, 2,
3, 4, 3, 4
),
nrow = 6, # 6 genes
ncol = 4, # 4 cells
byrow = TRUE
)
mat <- bigstatsr::FBM(nrow = nrow(mat), ncol = ncol(mat), init = mat)
# Run factorise with method "cosine" and a fixed seed
result_fbm <- factorise(mat, method = "cosine", seed = 1)
#> Creating directory "/tmp/Rtmpt97azZ/file1d43113e744f" which didn't exist..
result_fbm[]
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0.0000000 -0.511657999 -1.026193139 -0.5116580 0.3120330 0.2450271
#> [2,] -0.6935391 0.000000000 0.954089829 -0.2862776 0.6648609 -0.2644019
#> [3,] -1.5135484 0.732358147 0.000000000 1.1189324 -0.6519721 -0.3581465
#> [4,] -0.4346556 0.001120625 1.726785664 0.0000000 0.2759029 1.3592689
#> [5,] -0.1043352 0.386696746 -0.575948146 -0.2533670 0.0000000 1.1680039
#> [6,] 0.2679001 -0.262671929 0.007957691 1.7464077 2.5990906 0.0000000