scRNA-Analysis-Module Pipeline Specification

Pipeline Details

  • Name: scRNA-Analysis-Module
  • Pipeline UUID: bdebirwq12tkuv9e2hvy022qtw0la4
  • Version: 2.0.2
  • View Pipeline:

Overview

The scRNA-Analysis-Module pipeline analyzes single-cell RNA-seq data. It takes one or more h5 files as input and performs doublet removal, filtering, normalization, batch effect correction, dimensionality reduction, and clustering, with the aim of producing reliable and reproducible results.

Key Use Cases:

  • Single-cell RNA-seq Quality Control: Comprehensive filtering and quality assessment of single-cell data including mitochondrial and ribosomal gene content analysis.
  • Batch Effect Correction: Integration of multiple samples with Harmony-based batch effect correction for multi-sample experiments.
  • Cell Clustering and Marker Discovery: Automated clustering with resolution optimization and identification of cluster-specific gene markers.
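
The quality control use case above can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: it assumes human gene symbols (mitochondrial genes prefixed "MT-", ribosomal protein genes prefixed "RPS" or "RPL"), and the names cell_qc and passes_qc and all threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CellQC:
    """Per-cell quality metrics derived from raw counts."""
    n_umi: int        # total UMI counts in the cell
    n_genes: int      # number of genes with at least one count
    pct_mito: float   # percent of UMIs from mitochondrial genes
    pct_ribo: float   # percent of UMIs from ribosomal protein genes


def cell_qc(counts: Dict[str, int]) -> CellQC:
    """Compute QC metrics for one cell from a gene -> count mapping.

    Assumption: human gene symbols, so "MT-" marks mitochondrial genes
    and "RPS"/"RPL" mark ribosomal protein genes.
    """
    total = sum(counts.values())
    mito = sum(c for g, c in counts.items() if g.startswith("MT-"))
    ribo = sum(c for g, c in counts.items() if g.startswith(("RPS", "RPL")))
    n_genes = sum(1 for c in counts.values() if c > 0)
    if total == 0:
        return CellQC(0, 0, 0.0, 0.0)
    return CellQC(total, n_genes, 100.0 * mito / total, 100.0 * ribo / total)


def passes_qc(qc: CellQC,
              min_umi: int = 500,
              min_genes: int = 200,
              max_pct_mito: float = 20.0,
              max_pct_ribo: float = 50.0) -> bool:
    """Apply threshold filters; the default values are hypothetical."""
    return (qc.n_umi >= min_umi
            and qc.n_genes >= min_genes
            and qc.pct_mito <= max_pct_mito
            and qc.pct_ribo <= max_pct_ribo)


# Toy cell: 100 UMIs, 10 mitochondrial and 20 ribosomal
toy_cell = {"MT-CO1": 10, "RPS3": 20, "ACTB": 70, "GAPDH": 0}
toy_qc = cell_qc(toy_cell)
```

With the hypothetical defaults, the toy cell fails the filter because it has fewer than 500 UMIs; lowering min_umi and min_genes lets it pass.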

Features

  • Multiple Data Format Support: Accepts h5 files (HDF5 Feature-Barcode Matrix Format) and generates outputs in multiple formats including RDS, h5ad, and loom files.
  • Comprehensive Quality Control: Implements filtering based on UMI counts, gene counts, mitochondrial content, and ribosomal content with customizable thresholds.
  • Doublet Detection and Removal: Integrates the DoubletFinder algorithm to identify and remove potential doublets from single-cell data.
  • Flexible Normalization Options: Supports multiple normalization methods including LogNormalize, CLR, RC, and SCT (sctransform).
  • Advanced Dimensionality Reduction: Performs PCA with up to 100 components, followed by tSNE and UMAP visualization.
  • Batch Effect Correction: Uses the Harmony algorithm to correct batch effects in multi-sample datasets.
  • Interactive Visualization: Generates iSEE browser sessions for interactive exploration of single-cell data.
  • Automated Clustering: Resolution optimization with clustree visualization and automated marker gene identification.
  • Multi-format Output: Exports results to Seurat RDS, SingleCellExperiment, h5ad (for Python/scanpy), and loom formats.
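
To make the LogNormalize and RC options listed above concrete, the sketch below follows the definitions given in Seurat's NormalizeData documentation: RC divides each count by the cell's total and multiplies by a scale factor, and LogNormalize additionally applies log1p. The function names are illustrative, and the scale factor of 10,000 is Seurat's default, assumed here because the pipeline's own value is not stated. CLR and SCT are not shown.

```python
import math
from typing import Dict


def relative_counts(counts: Dict[str, int],
                    scale_factor: float = 10_000.0) -> Dict[str, float]:
    """RC: count / total counts in the cell * scale_factor."""
    total = sum(counts.values())
    if total == 0:
        # An empty cell has no meaningful proportions; return zeros.
        return {gene: 0.0 for gene in counts}
    return {gene: c / total * scale_factor for gene, c in counts.items()}


def log_normalize(counts: Dict[str, int],
                  scale_factor: float = 10_000.0) -> Dict[str, float]:
    """LogNormalize: natural log of (1 + relative count)."""
    rc = relative_counts(counts, scale_factor)
    return {gene: math.log1p(value) for gene, value in rc.items()}


# Toy cell with 4 UMIs split 1:3 between two genes
toy_counts = {"GENE_A": 1, "GENE_B": 3}
toy_rc = relative_counts(toy_counts)
toy_log = log_normalize(toy_counts)
```

Working per cell keeps the example independent of any matrix library; a real implementation would operate on the sparse count matrix directly.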

Input/Output Specification

Inputs

Required Inputs

h5 file

  • Description: HDF5 Feature-Barcode Matrix Format file containing single-cell RNA-seq count data (typically from 10x Genomics Cell Ranger output).
  • Format: .h5
  • Example File Path: /path/to/input/filtered_feature_bc_matrix.h5
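
For orientation, the sketch below shows how the sparse matrix in such a file maps to per-cell counts. It assumes the Cell Ranger v3+ layout, in which a "matrix" group stores a compressed sparse column (CSC) matrix as "data", "indices", and "indptr" arrays, with features as rows and barcodes as columns. Reading those arrays from disk needs an HDF5 library and is not shown; the function name csc_to_cells and the toy values are hypothetical.

```python
from typing import Dict, Sequence


def csc_to_cells(data: Sequence[int],
                 indices: Sequence[int],
                 indptr: Sequence[int],
                 features: Sequence[str],
                 barcodes: Sequence[str]) -> Dict[str, Dict[str, int]]:
    """Expand CSC arrays into a barcode -> {feature: count} mapping.

    Column j (one barcode) owns the entries data[indptr[j]:indptr[j + 1]];
    indices[k] is the row (feature) index of data[k].
    """
    if len(indptr) != len(barcodes) + 1:
        raise ValueError("indptr must have one more entry than barcodes")
    cells: Dict[str, Dict[str, int]] = {}
    for col, barcode in enumerate(barcodes):
        start, end = indptr[col], indptr[col + 1]
        cells[barcode] = {features[indices[k]]: data[k]
                          for k in range(start, end)}
    return cells


# Toy 3-feature x 2-barcode matrix:
#        AAAC-1  AAAG-1
#   G1     5       0
#   G2     0       2
#   G3     1       0
toy_cells = csc_to_cells(
    data=[5, 1, 2],
    indices=[0, 2, 1],
    indptr=[0, 2, 3],
    features=["G1", "G2", "G3"],
    barcodes=["AAAC-1", "AAAG-1"],
)
```

Only non-zero entries are stored, so genes with zero counts in a cell are simply absent from that cell's mapping.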

Optional Inputs

Metadata TSV

  • Description: A structured metadata file containing sample information for multi-sample analysis.
  • Format: Tab-separated values (.tsv)
  • Required Columns: Sample information, including the batch identifiers used for batch effect correction.
  • Example File Path: /path/to/metadata/sample_metadata.tsv
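
As a rough illustration of how such a metadata file could be checked before a run, the sketch below reads a tab-separated file into a sample-to-batch mapping. The column names "sample" and "batch" are hypothetical placeholders, since the exact required columns are not listed here, and load_batches is not part of the pipeline.

```python
import csv
import io
from typing import Dict, TextIO


def load_batches(handle: TextIO,
                 sample_col: str = "sample",
                 batch_col: str = "batch") -> Dict[str, str]:
    """Return a sample -> batch mapping from a tab-separated metadata file.

    The default column names are hypothetical placeholders.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {sample_col, batch_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"metadata is missing columns: {sorted(missing)}")
    batches: Dict[str, str] = {}
    for row in reader:
        sample = row[sample_col]
        if sample in batches:
            raise ValueError(f"duplicate sample in metadata: {sample}")
        batches[sample] = row[batch_col]
    return batches


# Toy metadata standing in for sample_metadata.tsv
toy_metadata = io.StringIO("sample\tbatch\nS1\tB1\nS2\tB1\nS3\tB2\n")
toy_batches = load_batches(toy_metadata)
```

Failing early on missing columns or duplicate samples avoids discovering a metadata problem only at the batch correction step.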

Outputs

Reported Outputs

Clustering Report HTML

  • Description: Comprehensive HTML report containing clustering analysis, sample statistics, PCA/tSNE/UMAP visualizations, and cluster marker identification.
  • Format: .html
  • Example File Path: /output/clustering_report.html
  • Visualization App: Web browser
  • Location: Report folder

Quality Control Report HTML

  • Description: Quality control report showing filtering statistics, cell and gene metrics, and mitochondrial/ribosomal content analysis.
  • Format: .html
  • Example File Path: /output/qc_report.html
  • Visualization App: Web browser
  • Location: Report folder

Supporting Outputs

Final Seurat Object

  • Description: Processed Seurat object containing normalized data, dimensionality reduction results, and clustering information.
  • Format: .rds
  • Example File Path: /output/final_seurat_object.rds

Cluster Markers TSV

  • Description: Tab-separated file containing the marker genes identified for each cluster, with statistical significance metrics.
  • Format: .tsv
  • Example File Path: /output/cluster_markers.tsv
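
A downstream script might summarize the cluster markers file along these lines. This is a sketch under assumptions: the column names "cluster", "gene", "avg_log2FC", and "p_val_adj" mirror what Seurat's FindAllMarkers typically reports, but the pipeline's actual column names are not documented here, and top_markers and its cutoffs are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def top_markers(handle: TextIO,
                n: int = 2,
                max_padj: float = 0.05) -> Dict[str, List[str]]:
    """Return the n significant genes with the highest log2 fold change
    for each cluster. Column names are assumed, not confirmed."""
    by_cluster: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Keep only markers passing the adjusted p-value cutoff.
        if float(row["p_val_adj"]) <= max_padj:
            by_cluster[row["cluster"]].append(
                (float(row["avg_log2FC"]), row["gene"]))
    return {
        cluster: [gene for _, gene in
                  sorted(hits, key=lambda h: h[0], reverse=True)[:n]]
        for cluster, hits in by_cluster.items()
    }


# Toy table standing in for cluster_markers.tsv
toy_tsv = (
    "cluster\tgene\tavg_log2FC\tp_val_adj\n"
    "0\tCD3E\t2.5\t0.001\n"
    "0\tCD3D\t3.0\t0.0001\n"
    "0\tIL7R\t1.0\t0.2\n"
    "0\tCCR7\t1.5\t0.01\n"
    "1\tMS4A1\t4.0\t1e-10\n"
)
toy_top = top_markers(io.StringIO(toy_tsv), n=2)
```

In the toy table, IL7R is dropped by the adjusted p-value cutoff and CCR7 is outranked on fold change, so cluster 0 keeps CD3D and CD3E.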

h5ad File

  • Description: AnnData format file compatible with Python scanpy for further analysis.
  • Format: .h5ad
  • Example File Path: /output/processed_data.h5ad

SingleCellExperiment Object

  • Description: Converted SingleCellExperiment object for Bioconductor-based analysis.
  • Format: .rds
  • Example File Path: /output/sce_object.rds

Associated Processes

References & Additional Documentation

Related Papers/Links

  • Seurat: https://satijalab.org/seurat/index.html
  • DoubletFinder: https://github.com/chris-mcginnis-ucsf/DoubletFinder

Additional Documentation

  • Workflow Diagram: Available in the pipeline description pages