Select genes from a SingleCellExperiment object based on mean z-score and the signal-to-noise ratio of angles between gene pairs across batches.

prefilter_angl(
  sce,
  zscore_mean_threshold = 1,
  zscore_sn_threshold = 1,
  verbose = TRUE
)

select_genes(
  sce,
  max_n_genes = 2000,
  score_weights = c(0.4, 0.6),
  verbose = TRUE
)

extract_rows_for_unique_genes(dt, max_n_genes)

Arguments

sce

A SingleCellExperiment object.

zscore_mean_threshold

Numeric value specifying the threshold for the absolute mean z-score. Default is 1.

zscore_sn_threshold

Numeric value specifying the threshold for the SNR z-score. Default is 1.

verbose

Logical value indicating whether to print progress messages. Default is TRUE.

max_n_genes

An integer specifying the maximum number of unique genes to return.

score_weights

A vector of two numeric values specifying the weights for the mean z-score and standard deviation of z-score, respectively. Default is c(0.4, 0.6) for a greater emphasis on the standard deviation of z-score.

dt

A data frame containing gene pairs, with columns geneA and geneB.

Value

A data frame containing the prefiltered gene pairs.

The input SingleCellExperiment object with the anglemania_genes slot updated to include the selected genes and their statistical information.

A vector of unique gene identifiers.

Details

The function performs the following steps:

  1. Identifies gene pairs where both the mean z-score and SNR z-score exceed the specified thresholds.

Selects the top n genes based on the weighted sum of the ranked mean and standard deviation of the z-score of the correlations between gene pairs.

The function combines the geneA and geneB columns, extracts unique gene names, and returns the first max_n_genes genes. If max_n_genes exceeds the number of unique genes available, all unique genes are returned.

Functions

  • prefilter_angl(): Prefilter gene pairs from the mean and SNR z-scores based on thresholds, to simplify downstream filtering.

  • select_genes(): Select the top n genes on the weighted sum of the ranks of the mean z-score and SNR z-score of the gene pairs.

  • extract_rows_for_unique_genes(): Extract unique gene identifiers from gene pairs, returning up to a specified maximum number.

See also

extract_rows_for_unique_genes, get_intersect_genes, get_list_stats

select_genes

Examples

library(SingleCellExperiment)
sce <- sce_example()
sce <- anglemania(sce, batch_key = "batch")
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#> 
  |                                                  | 0 % elapsed=00s   
#> Creating directory "/tmp/RtmpqekGwf/file2fd66666895e" which didn't exist..
#> 
  |=========================                         | 50% elapsed=00s, remaining~00s
#> Creating directory "/tmp/RtmpqekGwf/file2fd626050bc8" which didn't exist..
#> 
  |==================================================| 100% elapsed=00s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> 2000 is larger than the number of intersected genes.Setting max_n_genes to 300
#> Selected 300 genes for integration.
prefiltered_df <- prefilter_angl(
  sce,
  zscore_mean_threshold = 1,
  zscore_sn_threshold = 1
)
head(prefiltered_df)
#> class: SingleCellExperiment 
#> dim: 6 600 
#> metadata(1): anglemania
#> assays(1): counts
#> rownames(6): gene1 gene2 ... gene5 gene6
#> rowData names(1): anglemania_genes
#> colnames(600): cell1 cell2 ... cell599 cell600
#> colData names(3): batch dataset anglemania_batch
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
sce <- sce_example()
sce <- anglemania(
    sce,
    batch_key = "batch",
    max_n_genes = 20
)
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#> 
  |                                                  | 0 % elapsed=00s   
#> Creating directory "/tmp/RtmpqekGwf/file2fd67348c93a" which didn't exist..
#> 
  |=========================                         | 50% elapsed=00s, remaining~00s
#> Creating directory "/tmp/RtmpqekGwf/file2fd652bbe2ff" which didn't exist..
#> 
  |==================================================| 100% elapsed=00s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> Selected 20 genes for integration.
anglemania_genes <- get_anglemania_genes(sce)
# View the selected genes and use for integration
head(anglemania_genes)
#> [1] "gene270" "gene294" "gene66"  "gene122" "gene18"  "gene161"
length(anglemania_genes)
#> [1] 20
sce <- select_genes(
    sce,
    max_n_genes = 10
)
#> Selected 10 genes for integration.
anglemania_genes <- get_anglemania_genes(sce)
head(anglemania_genes)
#> [1] "gene270" "gene294" "gene66"  "gene122" "gene18"  "gene161"
length(anglemania_genes)
#> [1] 10
gene_pairs <- data.frame(
  geneA = c("Gene1", "Gene2", "Gene3", "Gene4"),
  geneB = c("Gene3", "Gene4", "Gene5", "Gene6")
)
unique_genes <- extract_rows_for_unique_genes(
  gene_pairs,
  max_n_genes = 3
)
print(unique_genes)
#> [1] "Gene1" "Gene3" "Gene2"