scRNA-Analysis-Module Pipeline Specification
Pipeline Details
- Name: scRNA-Analysis-Module
- Pipeline UUID: bdebirwq12tkuv9e2hvy022qtw0la4
- Version: 2.0.2
- View Pipeline:
Overview
The scRNA-Analysis-Module pipeline analyzes single-cell RNA-seq data. It takes one or more h5 files and performs doublet removal, filtering, normalization, batch effect correction, dimensionality reduction, and clustering, with the aim of producing reliable and reproducible results.
Key Use Cases:
- Single-cell RNA-seq Quality Control: Comprehensive filtering and quality assessment of single-cell data including mitochondrial and ribosomal gene content analysis.
- Batch Effect Correction: Integration of multiple samples with Harmony-based batch effect correction for multi-sample experiments.
- Cell Clustering and Marker Discovery: Automated clustering with resolution optimization and identification of cluster-specific gene markers.
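The quality-control use case above amounts to a per-cell pass/fail check on a handful of metrics. The Python sketch below is illustrative only and is not the pipeline's implementation: the names `CellQC`, `summarize_cell`, and `passes_qc`, the default thresholds, and the human gene-symbol prefixes (`MT-` for mitochondrial genes, `RPS`/`RPL` for ribosomal genes) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CellQC:
    barcode: str
    n_umi: int        # total UMI counts in the cell
    n_genes: int      # number of genes with at least one count
    pct_mito: float   # percent of counts from mitochondrial genes
    pct_ribo: float   # percent of counts from ribosomal genes


def percent_matching(counts: Dict[str, int], prefixes: Tuple[str, ...]) -> float:
    """Percent of a cell's total counts from genes whose symbol starts with a prefix."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(c for gene, c in counts.items() if gene.upper().startswith(prefixes))
    return 100.0 * hits / total


def summarize_cell(barcode: str, counts: Dict[str, int]) -> CellQC:
    """Compute QC metrics for one cell from a gene -> count mapping."""
    return CellQC(
        barcode=barcode,
        n_umi=sum(counts.values()),
        n_genes=sum(1 for c in counts.values() if c > 0),
        pct_mito=percent_matching(counts, ("MT-",)),         # assumed human prefix
        pct_ribo=percent_matching(counts, ("RPS", "RPL")),   # assumed human prefixes
    )


def passes_qc(
    cell: CellQC,
    min_umi: int = 500,            # hypothetical default
    min_genes: int = 200,          # hypothetical default
    max_pct_mito: float = 20.0,    # hypothetical default
    max_pct_ribo: float = 50.0,    # hypothetical default
) -> bool:
    """Return True when the cell satisfies every threshold."""
    return (
        cell.n_umi >= min_umi
        and cell.n_genes >= min_genes
        and cell.pct_mito <= max_pct_mito
        and cell.pct_ribo <= max_pct_ribo
    )
```

In practice the thresholds would come from the pipeline's own customizable settings rather than from the placeholder defaults shown here.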
Features
- Multiple Data Format Support: Accepts h5 files (HDF5 Feature-Barcode Matrix Format) and generates outputs in multiple formats including RDS, h5ad, and loom files.
- Comprehensive Quality Control: Implements filtering based on UMI counts, gene counts, mitochondrial content, and ribosomal content with customizable thresholds.
- Doublet Detection and Removal: Integrated DoubletFinder algorithm for identifying and removing potential doublets from single-cell data.
- Flexible Normalization Options: Supports multiple normalization methods including LogNormalize, CLR, RC, and SCT (sctransform).
- Advanced Dimensionality Reduction: Performs PCA with up to 100 components, followed by tSNE and UMAP visualization.
- Batch Effect Correction: Utilizes Harmony algorithm for correcting batch effects in multi-sample datasets.
- Interactive Visualization: Generates iSEE browser sessions for interactive exploration of single-cell data.
- Automated Clustering: Resolution optimization with clustree visualization and automated marker gene identification.
- Multi-format Output: Exports results to Seurat RDS, SingleCellExperiment, h5ad (for Python/scanpy), and loom formats.
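Two of the normalization options listed above are simple enough to show by hand. The sketch below follows the commonly documented Seurat definitions: RC scales each cell's counts to a fixed total, and LogNormalize applies `log1p` to the result. It is a plain-Python illustration for a single cell, not the pipeline's code; the function names are hypothetical, the scale factor of 10,000 is Seurat's usual default and is assumed here, and CLR and SCT are omitted.

```python
import math
from typing import List, Sequence


def relative_counts(counts: Sequence[float], scale_factor: float = 10_000.0) -> List[float]:
    """RC: divide each gene's count by the cell total, then multiply by scale_factor."""
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to scale; return zeros instead of dividing by zero.
        return [0.0 for _ in counts]
    return [c / total * scale_factor for c in counts]


def log_normalize(counts: Sequence[float], scale_factor: float = 10_000.0) -> List[float]:
    """LogNormalize: natural log of (1 + relative count) for each gene."""
    return [math.log1p(v) for v in relative_counts(counts, scale_factor)]
```

For example, `log_normalize([1, 3, 0, 6])` leaves the zero count at zero, while the nonzero counts are compressed onto a log scale that is comparable across cells of different sequencing depth.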
Input/Output Specification
Inputs
Required
h5 file
- Description: HDF5 Feature-Barcode Matrix Format file containing single-cell RNA-seq count data (typically from 10X Genomics Cell Ranger output).
- Format: .h5
- Example File Path: /path/to/input/filtered_feature_bc_matrix.h5
Optional Inputs
Metadata TSV
- Description: A structured metadata file containing sample information for multi-sample analysis.
- Format: Tab-separated values (.tsv)
- Required Columns: Sample information including batch identifiers for correction
- Example File Path: /path/to/metadata/sample_metadata.tsv
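Because the exact column names are not listed here, it can help to validate a metadata file before submitting it. The sketch below is one way to do that with Python's standard `csv` module. The column names `sample` and `batch` are hypothetical placeholders, not names confirmed by this specification; substitute whatever the pipeline actually expects.

```python
import csv
import io
from typing import Dict, TextIO

# Hypothetical column names; the specification does not state the real ones.
REQUIRED_COLUMNS = ("sample", "batch")


def read_sample_metadata(handle: TextIO) -> Dict[str, str]:
    """Parse a tab-separated metadata stream into a sample -> batch mapping."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"metadata is missing column(s): {', '.join(missing)}")

    batch_by_sample: Dict[str, str] = {}
    for row in reader:
        sample = row["sample"].strip()
        if sample in batch_by_sample:
            raise ValueError(f"duplicate sample identifier: {sample}")
        batch_by_sample[sample] = row["batch"].strip()
    return batch_by_sample


if __name__ == "__main__":
    example = io.StringIO("sample\tbatch\nS1\tB1\nS2\tB2\n")
    print(read_sample_metadata(example))
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to `read_sample_metadata`.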
Outputs
Reported Outputs
Clustering Report HTML
- Description: Comprehensive HTML report containing clustering analysis, sample statistics, PCA/tSNE/UMAP visualizations, and cluster marker identification
- Format: .html
- Example File Path: /output/clustering_report.html
- Visualization App: Web browser
- Location: Report folder
Quality Control Report HTML
- Description: Quality control report showing filtering statistics, cell and gene metrics, and mitochondrial/ribosomal content analysis
- Format: .html
- Example File Path: /output/qc_report.html
- Visualization App: Web browser
- Location: Report folder
Supporting Outputs
Final Seurat Object
- Description: Processed Seurat object containing normalized data, dimensionality reduction results, and clustering information
- Format: .rds
- Example File Path: /output/final_seurat_object.rds
Cluster Markers TSV
- Description: Tab-separated file containing identified marker genes for each cluster with statistical significance metrics
- Format: .tsv
- Example File Path: /output/cluster_markers.tsv
h5ad File
- Description: AnnData format file compatible with Python scanpy for further analysis
- Format: .h5ad
- Example File Path: /output/processed_data.h5ad
SingleCellExperiment Object
- Description: Converted SingleCellExperiment object for Bioconductor-based analysis
- Format: .rds
- Example File Path: /output/sce_object.rds
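A common follow-up step with the Cluster Markers TSV is to pull out the strongest markers per cluster. The sketch below shows one way to do this with the Python standard library. It assumes the file carries columns named `cluster`, `gene`, `avg_log2FC`, and `p_val_adj`, as in typical Seurat marker tables. This specification does not confirm those names, so check the header of the actual file first. The function name and the 0.05 cutoff are likewise illustrative.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def top_markers(handle: TextIO, n: int = 5, max_p_adj: float = 0.05) -> Dict[str, List[str]]:
    """Return up to n genes per cluster, ranked by descending log2 fold change.

    Rows whose adjusted p-value exceeds max_p_adj are ignored.
    Column names are assumed, not taken from the pipeline specification.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    by_cluster: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for row in reader:
        if float(row["p_val_adj"]) <= max_p_adj:
            by_cluster[row["cluster"]].append((float(row["avg_log2FC"]), row["gene"]))

    return {
        cluster: [gene for _, gene in sorted(rows, reverse=True)[:n]]
        for cluster, rows in by_cluster.items()
    }


if __name__ == "__main__":
    example = io.StringIO(
        "cluster\tgene\tavg_log2FC\tp_val_adj\n"
        "0\tCD3E\t2.5\t0.001\n"
        "0\tCD3D\t1.8\t0.002\n"
        "1\tMS4A1\t4.0\t0.0001\n"
    )
    print(top_markers(example, n=1))
```

Cluster identifiers are kept as strings exactly as they appear in the file, so no assumption is made about whether clusters are numbered or named.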
Associated Processes
- Clustering and Find Markers
- Create h5ad
- launch isee copy
- Load Data and QC h5
- Merge Seurat Objects
- PCA and Batch Effect Correction
- SCEtoLOOM
- seurat to sce copy
References & Additional Documentation
- Related Papers/links:
- Seurat: https://satijalab.org/seurat/index.html
- DoubletFinder: https://github.com/chris-mcginnis-ucsf/DoubletFinder
- Workflow Diagram: Available in pipeline description pages