Select genes from a SingleCellExperiment object based on mean z-score and the signal-to-noise ratio of angles between gene pairs across batches.
prefilter_angl(
snr_zscore_matrix,
mean_zscore_matrix,
zscore_mean_threshold = 1,
zscore_sn_threshold = 1,
verbose = TRUE
)
select_genes(
sce,
zscore_mean_threshold = 2,
zscore_sn_threshold = 2,
max_n_genes = NULL,
direction = "both",
adjust_thresholds = TRUE,
verbose = TRUE
)
extract_rows_for_unique_genes(dt, max_n_genes)
A bigstatsr::FBM
object containing the SNR
z-scores.
A bigstatsr::FBM
object containing the mean
z-scores.
Numeric value specifying the threshold for the absolute mean z-score. Default is 2.
Numeric value specifying the threshold for the SNR z-score. Default is 2.
A SingleCellExperiment
object.
An integer specifying the maximum number of unique genes to return.
whether to select genes with positive, negative or both mean z-scores. Default is "both"
whether to automatically adjust threholds if the selected genes do not meet the thresholds. Default is TRUE
A data frame containing gene pairs, with columns geneA
and geneB
.
A data frame containing the prefiltered gene pairs.
The input anglemaniaObject
with the
integration_genes
slot updated to include the selected genes and
their statistical information.
A vector of unique gene identifiers.
The function performs the following steps:
Identifies gene pairs where both the mean z-score and SNR z-score exceed the specified thresholds.
The function performs the following steps:
If max_n_genes
is not specified, it uses all genes that pass
the thresholds.
Identifies gene pairs where both the mean z-score and SNR z-score exceed the specified thresholds.
If no gene pairs meet the criteria, it adjusts the thresholds to the 99th percentile values of the corresponding statistics and re-selects.
Extracts unique genes from the selected gene pairs using
extract_rows_for_unique_genes
.
The function combines the geneA
and geneB
columns, extracts
unique gene names, and returns the first max_n_genes
genes. If
max_n_genes
exceeds the number of unique genes available, all unique
genes are returned.
prefilter_angl()
: Prefilter gene pairs from the mean and SNR z-scores
based on thresholds, to simplify downstream filtering.
select_genes()
: Select genes from the mean and SNR z-score matrices
stored in the SCE based on thresholds for mean and SNR.
extract_rows_for_unique_genes()
: Extract unique gene identifiers
from gene pairs, returning up to a specified maximum number.
extract_rows_for_unique_genes
,
get_intersect_genes
, get_list_stats
select_genes
library(SingleCellExperiment)
sce <- sce_example()
sce <- anglemania(sce, batch_key = "batch")
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#>
| | 0 % elapsed=00s
#> Creating directory "/tmp/RtmpwM6D04/file1f8e3121aa25" which didn't exist..
#>
|========================= | 50% elapsed=00s, remaining~00s
#> Creating directory "/tmp/RtmpwM6D04/file1f8e16704157" which didn't exist..
#>
|==================================================| 100% elapsed=00s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> Selected 25 genes for integration.
snr_zscore_matrix <- metadata(sce)$anglemania$list_stats$sn_zscore
mean_zscore_matrix <- metadata(sce)$anglemania$list_stats$mean_zscore
prefiltered_df <- prefilter_angl(
snr_zscore_matrix,
mean_zscore_matrix,
zscore_mean_threshold = 1,
zscore_sn_threshold = 1
)
head(prefiltered_df)
#> geneA geneB sn_zscore mean_zscore
#> 1 1 2 3.879289 -1.477388
#> 2 1 9 1.229311 1.324456
#> 3 1 11 17.398188 -1.505467
#> 4 1 19 10.337061 -1.585920
#> 5 1 26 2.618062 1.326246
#> 6 1 33 8.079134 -1.560315
sce <- sce_example()
sce <- anglemania(
sce,
batch_key = "batch",
zscore_mean_threshold = 2.5,
zscore_sn_threshold = 2.5
)
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#>
| | 0 % elapsed=00s
#> Creating directory "/tmp/RtmpwM6D04/file1f8e51b95ac4" which didn't exist..
#>
|========================= | 50% elapsed=00s, remaining~00s
#> Creating directory "/tmp/RtmpwM6D04/file1f8e64844fb7" which didn't exist..
#>
|==================================================| 100% elapsed=00s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> Selected 25 genes for integration.
anglemania_genes <- get_anglemania_genes(sce)
# View the selected genes and use for integration
head(anglemania_genes)
#> [1] "gene43" "gene207" "gene119" "gene130" "gene25" "gene209"
length(anglemania_genes)
#> [1] 25
# Adjust thresholds to 2 and select genes
sce <- select_genes(
sce,
zscore_mean_threshold = 2,
zscore_sn_threshold = 2
)
#> Selected 194 genes for integration.
anglemania_genes <- get_anglemania_genes(sce)
head(anglemania_genes)
#> [1] "gene43" "gene207" "gene119" "gene130" "gene25" "gene209"
length(anglemania_genes)
#> [1] 194
gene_pairs <- data.frame(
geneA = c("Gene1", "Gene2", "Gene3", "Gene4"),
geneB = c("Gene3", "Gene4", "Gene5", "Gene6")
)
unique_genes <- extract_rows_for_unique_genes(
gene_pairs,
max_n_genes = 3
)
print(unique_genes)
#> [1] "Gene1" "Gene3" "Gene2"