integrate_by_features integrates samples or batches within a Seurat object using canonical correlation analysis (CCA) based on a set of selected features (genes). The function utilizes an anglemania_object to extract anglemania genes and handles the integration process, including optional downstream processing steps such as scaling, PCA, and UMAP visualization.

Usage

integrate_by_features(
  seurat_object,
  angl,
  int_order = NULL,
  process = TRUE,
  verbose = FALSE
)

Arguments

seurat_object

A Seurat object containing all samples or batches to be integrated.

angl

An anglemania_object previously created with create_anglemania_object and processed with anglemania. The dataset_key and batch_key must be set correctly in this object.

int_order

An optional data frame specifying the integration order of samples within the Seurat list. See the sample.tree argument in IntegrateData for more details. If not provided, Seurat will construct the integration order using hierarchical clustering. Default is NULL.
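
As a rough illustration of an integration order, Seurat's IntegrateData documents sample.tree in the format of the merge matrix returned by hclust: each row is one pairwise integration step, negative numbers refer to original datasets, and positive numbers refer to the result of an earlier row. The sketch below builds such an order for three batches using base R only. It assumes that int_order is passed through to sample.tree unchanged; whether integrate_by_features needs the data-frame form (as.data.frame(int_order)) is not confirmed here.

```r
# Integration order for three batches, in hclust merge-matrix format.
# Row 1: integrate dataset 1 with dataset 2.
# Row 2: integrate the result of row 1 with dataset 3.
int_order <- matrix(
  c(-1, -2,
     1, -3),
  ncol = 2,
  byrow = TRUE
)

# Assumption: the data-frame form of the same order, should one be required.
int_order_df <- as.data.frame(int_order)
```

The resulting object would then be supplied as integrate_by_features(seurat_object, angl, int_order = int_order).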

process

Logical value indicating whether to further process the data after integration (i.e., scale it, run PCA, and compute UMAP embeddings). Default is TRUE.

verbose

Logical value indicating whether to display progress messages during integration. Default is FALSE.

Value

A Seurat object containing the integrated data. The default assay is set to "integrated".
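
When process = FALSE, the downstream steps can be run by hand on the returned object. The following is a minimal sketch, not the package's internal code: it assumes Seurat and anglemania are installed, and the cap on the number of principal components is an assumption made here to keep the call valid on small data such as pbmc_small.

```r
library(Seurat)
library(anglemania)

se <- SeuratObject::pbmc_small
angl <- create_anglemania_object(se, batch_key = "groups")
angl <- anglemania(angl)

# Skip the built-in post-processing and run it manually instead.
integrated <- integrate_by_features(se, angl, process = FALSE)
DefaultAssay(integrated)  # documented above to be "integrated"

# Assumed cap: keep the number of PCs below both the cell and feature counts.
n_pcs <- min(30, ncol(integrated) - 1, nrow(integrated) - 1)

integrated <- ScaleData(integrated, verbose = FALSE)
integrated <- RunPCA(
  integrated,
  features = rownames(integrated),
  npcs = n_pcs,
  verbose = FALSE
)
integrated <- RunUMAP(
  integrated,
  dims = seq_len(n_pcs),
  n.neighbors = 10,
  verbose = FALSE
)
```

Running the steps manually is mainly useful when different settings are wanted, for example a different number of PCs or neighbors than the defaults used with process = TRUE.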

Details

The function performs the following steps:

  1. Batch Key Addition: Adds a unique batch key to the Seurat object's metadata to distinguish different batches or samples. The batch key is taken from the anglemania_object's batch_key.

  2. Splitting: Splits the Seurat object into a list of Seurat objects based on the batch key.

  3. Integration: Calls integrate_seurat_list to integrate the list of Seurat objects using the features extracted from the anglemania_object.

The integration is performed using Seurat's CCA-based methods, and parameters are adjusted based on the smallest dataset to ensure compatibility with small sample sizes (e.g., metacells or SEACells). If process = TRUE, the function will also scale the data, run PCA, and compute UMAP embeddings.
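
The steps above can be sketched with plain Seurat calls. This is an illustrative approximation under stated assumptions, not the implementation of integrate_by_features or integrate_seurat_list: sketch_integration is a hypothetical helper, features stands in for the genes extracted from the anglemania_object, and the specific caps derived from the smallest batch are guesses at what "adjusted based on the smallest dataset" could look like.

```r
library(Seurat)

# Hypothetical helper; NOT the package's internal implementation.
sketch_integration <- function(seurat_object, batch_key, features) {
  # Split into one Seurat object per batch and log-normalize each.
  seurat_list <- SplitObject(seurat_object, split.by = batch_key)
  seurat_list <- lapply(seurat_list, NormalizeData, verbose = FALSE)

  # Assumed rule: cap dimensions and neighborhood sizes by the smallest batch
  # so that the calls remain valid for very small inputs (e.g. metacells).
  smallest <- min(vapply(seurat_list, ncol, numeric(1)))
  n_dims <- min(30, smallest - 1, length(features) - 1)
  k_filter <- min(200, smallest)
  k_score <- min(30, smallest - 1)
  k_weight <- min(100, smallest - 1)

  # CCA-based anchor finding restricted to the selected features.
  anchors <- FindIntegrationAnchors(
    object.list = seurat_list,
    anchor.features = features,
    reduction = "cca",
    dims = seq_len(n_dims),
    k.filter = k_filter,
    k.score = k_score,
    verbose = FALSE
  )

  # Integrate; the result carries an "integrated" assay.
  IntegrateData(
    anchorset = anchors,
    dims = seq_len(n_dims),
    k.weight = k_weight,
    verbose = FALSE
  )
}
```

Capping every neighborhood parameter by the smallest batch is the conservative choice here, since Seurat's defaults (for example k.weight = 100) exceed the cell count of a batch with only a few dozen cells.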

Examples

# Integrate samples using an anglemania_object.
# The function reads the batch key from the anglemania_object,
# splits the Seurat object into batches, and integrates them
# using CCA integration and the anglemania genes previously extracted
# with anglemania() or select_genes().
se <- SeuratObject::pbmc_small
angl <- create_anglemania_object(se, batch_key = "groups")
#> No dataset_key specified.
#> Assuming that all samples belong to the same dataset and are separated by batch_key: groups
#> Extracting count matrices...
#> Filtering each batch to at least 1 cells per gene...
#> Using the intersection of filtered genes from all batches...
#> Number of genes in intersected set: 228
angl <- anglemania(angl)
#> Computing angles and transforming to z-scores...
#> Computing statistics...
#> Weighting matrix_list...
#> Calculating mean...
#> Calculating sds...
#> Filtering features...
options(future.globals.maxSize = 4000 * 1024^2)
integrated_object <- integrate_by_features(se, angl)
#> Log normalizing data...
#> Finding integration anchors...
#> Integrating samples...
#> Warning: Layer counts isn't present in the assay object; returning NULL
#> Running PCA with 30 PCs
#> Running UMAP with 30 PCs and 10 neighbors
#> Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
#> To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
#> This message will be shown once per session