
PCA and Batch Effect Correction Specifications

Process Details

  • Name: PCA and Batch Effect Correction
  • Process UUID: T9tSHa58dJ4m2wJqIr9p3ymPwfhk6p
  • Process Group: SingleCell

Overview

This process performs Principal Component Analysis (PCA) and optional batch effect correction on single-cell RNA-seq data using the Seurat package. It handles both single- and multi-sample datasets: it detects the data structure automatically and applies the appropriate preprocessing steps, namely variable feature selection, data scaling, and dimensionality reduction. When batch effect correction is enabled, the process uses the Harmony algorithm to integrate samples and remove technical variation while preserving biological signal.

This process is implemented in Bash, which invokes an R script that performs the single-cell analysis with the Seurat and Harmony packages.
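The control flow described above can be sketched as follows. This is a minimal Python illustration, not the process's actual Bash/R implementation. The function name `plan_steps`, the step labels, the "merge samples" step, the position of the multi-modal step, and the rule that Harmony is skipped for a single sample are all assumptions made for the example.

```python
def plan_steps(seurat_input, correct_batch_effect=True, wnn_assay=""):
    """Return the ordered step labels a run might go through (hypothetical)."""
    # A list with more than one element stands in for a multi-sample input.
    multi_sample = isinstance(seurat_input, (list, tuple)) and len(seurat_input) > 1

    steps = []
    if multi_sample:
        # Assumption: multiple samples are combined before preprocessing.
        steps.append("merge samples")
    steps += [
        "select variable features",
        "scale data",
        "run PCA",
    ]
    # Assumption: batch correction is only applied when there are several samples.
    if correct_batch_effect and multi_sample:
        steps.append("run Harmony")
    # Assumption: the multi-modal step runs last, and only when an assay is given.
    if wnn_assay:
        steps.append(f"multi-modal analysis with assay {wnn_assay}")
    return steps


if __name__ == "__main__":
    print(plan_steps(["sample_a", "sample_b"]))
    print(plan_steps("single_sample"))
```

Returning a list of labels keeps the sketch independent of any real analysis library; it only shows how the input structure and the two optional settings could change which steps run.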

Key Functionality

  • Variable Feature Selection: Identifies highly variable genes using configurable selection methods and feature counts
  • Data Scaling and PCA: Performs data scaling with optional mitochondrial gene regression and computes 100 principal components
  • Batch Effect Correction: Applies the Harmony algorithm to integrate multiple samples and correct for batch effects
  • Multi-modal Support: Handles weighted nearest neighbor (WNN) analysis for multi-omics datasets when an assay is specified
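To make the first two items concrete, here is a toy sketch of variable feature selection, scaling, and PCA in plain Python. It is an illustration under simplifying assumptions, not what the R script does: plain variance stands in for Seurat's selection methods, mitochondrial regression is omitted, only the first principal component is computed (the process computes 100), and all function and gene names are hypothetical.

```python
import math
from statistics import mean, pvariance, stdev


def top_variable_features(expr, n_features):
    """Keep the n_features genes with the largest variance across cells.

    expr maps gene name -> list of per-cell values. Plain variance is a
    stand-in here; Seurat's "vst" method ranks genes differently.
    """
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return ranked[:n_features]


def scale_gene(values):
    """Center a gene to mean 0 and scale it to unit sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def first_pc(scaled_rows, iterations=100):
    """Approximate the first principal component by power iteration.

    scaled_rows is a list of centered gene rows (genes x cells). Returns the
    gene loadings and the per-cell scores along that component.
    """
    n_cells = len(scaled_rows[0])
    # Gene-by-gene covariance matrix of the centered data.
    cov = [[sum(a * b for a, b in zip(gi, gj)) / (n_cells - 1)
            for gj in scaled_rows] for gi in scaled_rows]
    # Uneven starting vector, so it is unlikely to be orthogonal to the answer.
    v = [1.0 / (i + 1) for i in range(len(scaled_rows))]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every cell onto the component to get its score.
    scores = [sum(v[g] * scaled_rows[g][c] for g in range(len(v)))
              for c in range(n_cells)]
    return v, scores


# Toy data: three genes measured in four cells.
expr = {
    "GENE_A": [1, 1, 1, 1],      # constant, carries no information
    "GENE_B": [0, 10, 0, 10],
    "GENE_C": [5, 6, 5, 6],
}
selected = top_variable_features(expr, n_features=2)
scaled = [scale_gene(expr[g]) for g in selected]
loadings, scores = first_pc(scaled)
print(selected)
print([round(s, 3) for s in scores])
```

In the toy data the constant gene is dropped by the selection step, and the two remaining genes rise and fall together, so the first component separates cells 1 and 3 from cells 2 and 4.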

Input/Output Specification

Inputs

Required Inputs

  • Seurat Object
    • Description: Single-cell RNA-seq data in Seurat format, either as a single sample or as a list of multiple samples
    • Format: RDS

Outputs

  • Processed Seurat Object
    • Description: Seurat object with PCA reduction and optional batch effect correction applied
    • Format: RDS

Parameters & Settings

These parameters can be adjusted in the Foundry UI when running this process.

  • # of Variable Features

    • Description: Number of top-ranked features, ordered by residual variance, to use as variable features
    • Default value: 3000
  • Selection Method

    • Description: Method to choose top variable features
    • Available options: vst (default), mean.var.plot, dispersion
  • Correct Batch Effect

    • Description: Whether to apply batch effect correction
    • Available options: TRUE (default), FALSE
  • Weighted Nearest Network assay

    • Description: For multi-modal (multi-omics) data, specifies an assay so that more than one assay can be used in the downstream analysis
    • Default value: (empty)
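The parameters and defaults listed above can be captured in a small validated container. The sketch below is only an illustration: the class name `PcaBatchParams` and its field names are hypothetical and do not come from the Foundry UI or the R script; the defaults and allowed selection methods are taken from the list above.

```python
from dataclasses import dataclass

# Allowed values for the Selection Method parameter, as listed above.
SELECTION_METHODS = ("vst", "mean.var.plot", "dispersion")


@dataclass(frozen=True)
class PcaBatchParams:
    """Hypothetical container mirroring the UI parameters and their defaults."""
    n_variable_features: int = 3000
    selection_method: str = "vst"
    correct_batch_effect: bool = True
    wnn_assay: str = ""  # empty by default, matching the documented default

    def __post_init__(self):
        # Reject values that the documented parameter list does not allow.
        if self.n_variable_features <= 0:
            raise ValueError("n_variable_features must be a positive integer")
        if self.selection_method not in SELECTION_METHODS:
            raise ValueError(f"selection_method must be one of {SELECTION_METHODS}")


defaults = PcaBatchParams()
custom = PcaBatchParams(n_variable_features=2000, selection_method="dispersion")
print(defaults)
print(custom)
```

Validating at construction time means a mistyped selection method fails immediately rather than partway through an analysis run.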

References & Resources

  • Tool Documentation: Contact the team for details on the R script implementation
  • Related Papers: Stuart, T., et al. (2019). Comprehensive Integration of Single-Cell Data. Cell, 177(7), 1888-1902. DOI: 10.1016/j.cell.2019.05.031