Select genes from a SingleCellExperiment object based on mean z-score and the signal-to-noise ratio of angles between gene pairs across batches.
prefilter_angl(
sce,
zscore_mean_threshold = 1,
zscore_sn_threshold = 1,
verbose = TRUE
)
select_genes(
sce,
max_n_genes = 2000,
score_weights = c(0.4, 0.6),
verbose = TRUE
)
extract_rows_for_unique_genes(dt, max_n_genes)
A SingleCellExperiment
object.
Numeric value specifying the threshold for the absolute mean z-score. Default is 1.
Numeric value specifying the threshold for the SNR z-score. Default is 1.
Logical value indicating whether to print progress messages.
Default is TRUE
.
An integer specifying the maximum number of unique genes to return.
A vector of two numeric values specifying the weights
for the mean z-score and standard deviation of z-score, respectively.
Default is c(0.4, 0.6)
for a greater emphasis on the standard
deviation of z-score.
A data frame containing gene pairs, with columns geneA
and geneB
.
A data frame containing the prefiltered gene pairs.
The input SingleCellExperiment
object with the
anglemania_genes
slot updated to include the selected genes and
their statistical information.
A vector of unique gene identifiers.
The function performs the following steps:
Identifies gene pairs where both the mean z-score and SNR z-score exceed the specified thresholds.
Selects the top n genes based on the weighted sum of the ranked mean and standard deviation of the z-score of the correlations between gene pairs.
The function combines the geneA
and geneB
columns, extracts
unique gene names, and returns the first max_n_genes
genes. If
max_n_genes
exceeds the number of unique genes available, all unique
genes are returned.
prefilter_angl()
: Prefilter gene pairs from the mean and SNR z-scores
based on thresholds, to simplify downstream filtering.
select_genes()
: Select the top n genes on the weighted sum
of the ranks of the mean z-score and SNR z-score of the gene pairs.
extract_rows_for_unique_genes()
: Extract unique gene identifiers
from gene pairs, returning up to a specified maximum number.
extract_rows_for_unique_genes
,
get_intersect_genes
, get_list_stats
select_genes
library(SingleCellExperiment)
sce <- sce_example()
sce <- anglemania(sce, batch_key = "batch")
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#>
| | 0 % elapsed=00s
#> Creating directory "/tmp/RtmpqekGwf/file2fd66666895e" which didn't exist..
#>
|========================= | 50% elapsed=00s, remaining~00s
#> Creating directory "/tmp/RtmpqekGwf/file2fd626050bc8" which didn't exist..
#>
|==================================================| 100% elapsed=00s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> 2000 is larger than the number of intersected genes.Setting max_n_genes to 300
#> Selected 300 genes for integration.
prefiltered_df <- prefilter_angl(
sce,
zscore_mean_threshold = 1,
zscore_sn_threshold = 1
)
head(prefiltered_df)
#> class: SingleCellExperiment
#> dim: 6 600
#> metadata(1): anglemania
#> assays(1): counts
#> rownames(6): gene1 gene2 ... gene5 gene6
#> rowData names(1): anglemania_genes
#> colnames(600): cell1 cell2 ... cell599 cell600
#> colData names(3): batch dataset anglemania_batch
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
sce <- sce_example()
sce <- anglemania(
sce,
batch_key = "batch",
max_n_genes = 20
)
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#>
| | 0 % elapsed=00s
|==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#>
| | 0 % elapsed=00s
#> Creating directory "/tmp/RtmpqekGwf/file2fd67348c93a" which didn't exist..
#>
|========================= | 50% elapsed=00s, remaining~00s
#> Creating directory "/tmp/RtmpqekGwf/file2fd652bbe2ff" which didn't exist..
#>
|==================================================| 100% elapsed=00s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> Selected 20 genes for integration.
anglemania_genes <- get_anglemania_genes(sce)
# View the selected genes and use for integration
head(anglemania_genes)
#> [1] "gene270" "gene294" "gene66" "gene122" "gene18" "gene161"
length(anglemania_genes)
#> [1] 20
sce <- select_genes(
sce,
max_n_genes = 10
)
#> Selected 10 genes for integration.
anglemania_genes <- get_anglemania_genes(sce)
head(anglemania_genes)
#> [1] "gene270" "gene294" "gene66" "gene122" "gene18" "gene161"
length(anglemania_genes)
#> [1] 10
gene_pairs <- data.frame(
geneA = c("Gene1", "Gene2", "Gene3", "Gene4"),
geneB = c("Gene3", "Gene4", "Gene5", "Gene6")
)
unique_genes <- extract_rows_for_unique_genes(
gene_pairs,
max_n_genes = 3
)
print(unique_genes)
#> [1] "Gene1" "Gene3" "Gene2"