Select genes from a SingleCellExperiment object based on mean z-score and the signal-to-noise ratio of angles between gene pairs across batches.

prefilter_angl(
  snr_zscore_matrix,
  mean_zscore_matrix,
  zscore_mean_threshold = 1,
  zscore_sn_threshold = 1,
  verbose = TRUE
)

select_genes(
  sce,
  zscore_mean_threshold = 2,
  zscore_sn_threshold = 2,
  max_n_genes = NULL,
  direction = "both",
  adjust_thresholds = TRUE,
  verbose = TRUE
)

extract_rows_for_unique_genes(dt, max_n_genes)

Arguments

snr_zscore_matrix

A bigstatsr::FBM object containing the SNR z-scores.

mean_zscore_matrix

A bigstatsr::FBM object containing the mean z-scores.

zscore_mean_threshold

Numeric value specifying the threshold for the absolute mean z-score. Default is 2.

zscore_sn_threshold

Numeric value specifying the threshold for the SNR z-score. Default is 2.

sce

A SingleCellExperiment object.

max_n_genes

An integer specifying the maximum number of unique genes to return.

direction

whether to select genes with positive, negative or both mean z-scores. Default is "both"

adjust_thresholds

whether to automatically adjust threholds if the selected genes do not meet the thresholds. Default is TRUE

dt

A data frame containing gene pairs, with columns geneA and geneB.

Value

A data frame containing the prefiltered gene pairs.

The input anglemaniaObject with the integration_genes slot updated to include the selected genes and their statistical information.

A vector of unique gene identifiers.

Details

The function performs the following steps:

  1. Identifies gene pairs where both the mean z-score and SNR z-score exceed the specified thresholds.

The function performs the following steps:

  1. If max_n_genes is not specified, it uses all genes that pass the thresholds.

  2. Identifies gene pairs where both the mean z-score and SNR z-score exceed the specified thresholds.

  3. If no gene pairs meet the criteria, it adjusts the thresholds to the 99th percentile values of the corresponding statistics and re-selects.

  4. Extracts unique genes from the selected gene pairs using extract_rows_for_unique_genes.

The function combines the geneA and geneB columns, extracts unique gene names, and returns the first max_n_genes genes. If max_n_genes exceeds the number of unique genes available, all unique genes are returned.

Functions

  • prefilter_angl(): Prefilter gene pairs from the mean and SNR z-scores based on thresholds, to simplify downstream filtering.

  • select_genes(): Select genes from the mean and SNR z-score matrices stored in the SCE based on thresholds for mean and SNR.

  • extract_rows_for_unique_genes(): Extract unique gene identifiers from gene pairs, returning up to a specified maximum number.

See also

extract_rows_for_unique_genes, get_intersect_genes, get_list_stats

select_genes

Examples

library(SingleCellExperiment)
sce <- sce_example()
sce <- anglemania(sce, batch_key = "batch")
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#> 
  |                                                  | 0 % elapsed=00s   
#> Creating directory "/tmp/RtmpwM6D04/file1f8e3121aa25" which didn't exist..
#> 
  |=========================                         | 50% elapsed=00s, remaining~00s
#> Creating directory "/tmp/RtmpwM6D04/file1f8e16704157" which didn't exist..
#> 
  |==================================================| 100% elapsed=00s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> Selected 25 genes for integration.
snr_zscore_matrix <- metadata(sce)$anglemania$list_stats$sn_zscore
mean_zscore_matrix <- metadata(sce)$anglemania$list_stats$mean_zscore
prefiltered_df <- prefilter_angl(
  snr_zscore_matrix,
  mean_zscore_matrix,
  zscore_mean_threshold = 1,
  zscore_sn_threshold = 1
)
head(prefiltered_df)
#>   geneA geneB sn_zscore mean_zscore
#> 1     1     2  3.879289   -1.477388
#> 2     1     9  1.229311    1.324456
#> 3     1    11 17.398188   -1.505467
#> 4     1    19 10.337061   -1.585920
#> 5     1    26  2.618062    1.326246
#> 6     1    33  8.079134   -1.560315
sce <- sce_example()
sce <- anglemania(
  sce,
  batch_key = "batch",
  zscore_mean_threshold = 2.5,
  zscore_sn_threshold = 2.5
)
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: batch
#> Preparing input...
#> Filtering each batch to at least 1 cells per gene...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 300
#> Extracting count matrices...
#> 
  |                                                  | 0 % elapsed=00s   
  |==================================================| 100% elapsed=00s, remaining~00s
#> Computing angles and transforming to z-scores...
#> 
  |                                                  | 0 % elapsed=00s   
#> Creating directory "/tmp/RtmpwM6D04/file1f8e51b95ac4" which didn't exist..
#> 
  |=========================                         | 50% elapsed=00s, remaining~00s
#> Creating directory "/tmp/RtmpwM6D04/file1f8e64844fb7" which didn't exist..
#> 
  |==================================================| 100% elapsed=00s, remaining~00s
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Pre-filtering features...
#> Extracting filtered features...
#> Selected 25 genes for integration.
anglemania_genes <- get_anglemania_genes(sce)
# View the selected genes and use for integration
head(anglemania_genes)
#> [1] "gene43"  "gene207" "gene119" "gene130" "gene25"  "gene209"
length(anglemania_genes)
#> [1] 25
# Adjust thresholds to 2 and select genes
sce <- select_genes(
  sce,
  zscore_mean_threshold = 2,
  zscore_sn_threshold = 2
)
#> Selected 194 genes for integration.
anglemania_genes <- get_anglemania_genes(sce)
head(anglemania_genes)
#> [1] "gene43"  "gene207" "gene119" "gene130" "gene25"  "gene209"
length(anglemania_genes)
#> [1] 194
gene_pairs <- data.frame(
  geneA = c("Gene1", "Gene2", "Gene3", "Gene4"),
  geneB = c("Gene3", "Gene4", "Gene5", "Gene6")
)
unique_genes <- extract_rows_for_unique_genes(
  gene_pairs,
  max_n_genes = 3
)
print(unique_genes)
#> [1] "Gene1" "Gene3" "Gene2"