Cell Ranger Count Pipeline Specification
Pipeline Details
- Name: Cell Ranger Count Pipeline
- Pipeline UUID: c9205e92847b4658898d8585e4733953
- Version: 3.6.0
Overview
The Cell Ranger Count Pipeline analyzes Chromium single-cell RNA-seq data from 10x Genomics experiments. It automates the workflow from BCL file conversion through downstream single-cell analysis, covering read alignment, barcode identification, UMI counting, clustering, and visualization, so that results are reproducible across runs.
Key Use Cases:
- Single-cell Gene Expression Analysis: Process 3' Gene Expression libraries for comprehensive transcriptomic profiling at single-cell resolution.
- Feature Barcoding Integration: Analyze combined Gene Expression with Antibody Capture or CRISPR Guide Capture data for multi-modal single-cell experiments.
- BCL to FASTQ Conversion: Convert raw Illumina sequencing data (BCL files) to FASTQ format with proper sample demultiplexing.
Features
- Cell Ranger v9.0.0 Integration: Runs the Cell Ranger v9.0.0 suite for single-cell data processing.
- Flexible Input Support: Accepts both FASTQ files and BCL directories with automatic sample sheet processing for streamlined workflow initiation.
- Custom Reference Genome Support: Enables addition of custom sequences to reference genomes with automated GTF generation and indexing.
- Advanced Quality Control: Implements QC steps including ambient RNA removal using DecontX, mitochondrial gene filtering, and multi-metric cell filtering.
- Multi-modal Analysis: Supports integration of Gene Expression with Feature Barcoding data (Antibody Capture, CRISPR screens).
- Batch Effect Correction: Incorporates Harmony algorithm for batch effect correction across multiple samples or experimental conditions.
- Comprehensive Clustering: Automated clustering with resolution optimization, marker gene identification, and visualization via UMAP/tSNE.
- Multiple Output Formats: Generates outputs in various formats including H5, H5AD, Loom, and RDS for compatibility with different analysis platforms.
- Gene Regulatory Network Analysis: Optional pySCENIC integration for transcription factor regulon analysis and gene regulatory network inference.
- Scalable Processing: Supports parallel processing of multiple samples with configurable computational resources.
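The mitochondrial and multi-metric cell filtering mentioned under Advanced Quality Control can be illustrated with a minimal sketch. This is not the pipeline's own implementation: the `CellQC` class, the function names, the `MT-` gene prefix (a human gene naming convention), and all thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_umis: int       # total UMI counts in the cell
    n_genes: int      # number of genes with at least one UMI
    pct_mito: float   # percentage of UMIs from mitochondrial genes


def summarize_cell(barcode, counts):
    """Compute per-cell QC metrics from a {gene_name: umi_count} mapping."""
    total = sum(counts.values())
    # Assumption: mitochondrial genes carry the human "MT-" prefix.
    mito = sum(c for gene, c in counts.items() if gene.upper().startswith("MT-"))
    n_genes = sum(1 for c in counts.values() if c > 0)
    pct_mito = 100.0 * mito / total if total else 0.0
    return CellQC(barcode, total, n_genes, pct_mito)


def passes_qc(cell, min_genes=200, min_umis=500, max_pct_mito=20.0):
    """Apply all filters at once; the default thresholds are hypothetical."""
    return (
        cell.n_genes >= min_genes
        and cell.n_umis >= min_umis
        and cell.pct_mito <= max_pct_mito
    )
```

A cell is kept only when every metric is within bounds, which is what "multi-metric" filtering means here. Suitable cutoffs depend on tissue and chemistry, so they should be chosen per dataset.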
Input/Output Specification
Inputs
Required
reads
- Description: FASTQ files containing raw single-cell RNA sequencing reads from the 10x Genomics Chromium platform
- Format: .fastq.gz
- Example File Path: /path/to/sample_S1_L001_R1_001.fastq.gz, /path/to/sample_S1_L001_R2_001.fastq.gz
mate
- Description: Sequencing configuration specifying single-end or paired-end reads
- Format: String value ("single" or "pair")
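The example paths above follow the Illumina naming pattern `<sample>_S<n>_L<lane>_<read>_001.fastq.gz`. As a minimal sketch, the files can be grouped by sample and lane and checked against the `mate` setting before submission. The function names are hypothetical, and the sketch assumes that `"pair"` requires both R1 and R2 while `"single"` requires only R1.

```python
import re
from collections import defaultdict

# Illumina-style FASTQ name: <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def group_fastqs(paths):
    """Group FASTQ paths by (sample, lane), mapping read type to path."""
    groups = defaultdict(dict)
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"Unexpected FASTQ file name: {name}")
        groups[(match["sample"], match["lane"])][match["read"]] = path
    return dict(groups)


def check_mate(groups, mate):
    """Return the (sample, lane) keys that lack a read required by `mate`."""
    # Assumption: "pair" needs R1 and R2; "single" needs only R1.
    required = {"R1", "R2"} if mate == "pair" else {"R1"}
    return [key for key, reads in groups.items() if not required <= set(reads)]
```

An empty list from `check_mate` means every sample and lane has the reads the chosen `mate` value needs.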
Optional Inputs
BCL Directory
- Description: Illumina BCL directory containing raw sequencing data for conversion to FASTQ format
- Format: Directory structure with BCL files and RunInfo.xml
- Example File Path: /path/to/bcl_directory/
Sample Sheet CSV
- Description: Sample sheet file for BCL to FASTQ conversion specifying sample information and barcodes
- Required Columns: Sample_ID, Sample_Name, index, index2
- Format: Comma-separated values (.csv)
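A quick check of the required columns before starting a run can catch malformed sample sheets early. The sketch below uses only the standard library. It assumes the sample rows either start at the top of the file or follow a `[Data]` section line, as in classic Illumina sample sheets; other layouts would need a different section marker. `read_sample_rows` is a hypothetical helper, not part of the pipeline.

```python
import csv
import io

REQUIRED_COLUMNS = ("Sample_ID", "Sample_Name", "index", "index2")


def read_sample_rows(text):
    """Return sample rows as dicts, skipping anything before a [Data] line."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # Section lines are often padded with trailing commas, e.g. "[Data],,,".
        if line.strip().rstrip(",").strip() == "[Data]":
            lines = lines[i + 1:]
            break
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    fieldnames = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in fieldnames]
    if missing:
        raise ValueError(f"Sample sheet is missing columns: {', '.join(missing)}")
    return list(reader)
```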
Custom Reference Sequences
- Description: Additional FASTA sequences to be added to the reference genome (e.g., transgenes, viral sequences)
- Format: .fasta or .fa
- Example File Path: /path/to/custom_sequences.fasta
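To show what adding a custom sequence to a reference involves, the sketch below parses FASTA text and writes one GTF record per sequence. It follows a common convention in which each added sequence is annotated as a single exon spanning its full length on the plus strand. Whether the pipeline's "Add custom seq to genome gtf" process emits exactly these fields is an assumption, and the function names, the `custom` source label, and the `protein_coding` biotype are illustrative.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without a sequence name")
            # Use the first whitespace-delimited token as the sequence name.
            name, chunks = header[0], []
        else:
            if name is None:
                raise ValueError("Sequence data found before any FASTA header")
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def gtf_exon_line(name, length, source="custom"):
    """Build one tab-separated GTF exon record covering the whole sequence."""
    attributes = (
        f'gene_id "{name}"; transcript_id "{name}"; '
        f'gene_name "{name}"; gene_biotype "protein_coding";'
    )
    fields = [name, source, "exon", "1", str(length), ".", "+", ".", attributes]
    return "\t".join(fields)
```

The FASTA records would be appended to the genome FASTA and the generated lines to the genome GTF before indexing the combined reference.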
Outputs
Reported Outputs
- Web Summary HTML:
  - Description: Comprehensive quality control report with alignment metrics, cell detection statistics, and gene expression summary
  - Format: .html
  - Example File Path: /output/sample_web_summary.html
  - Visualization App: Web browser
  - Location: Cell Ranger Count output folder
- Feature-Barcode Matrix (H5):
  - Description: Filtered gene expression count matrix in HDF5 format containing cell barcodes and gene counts
  - Format: .h5
  - Example File Path: /output/sample_filtered_feature_bc_matrix.h5
  - Visualization App: Seurat, Scanpy, Cell Ranger Loupe Browser
  - Location: Cell Ranger Count output folder
- Clustering Analysis Report:
  - Description: HTML report containing PCA results, UMAP/tSNE visualizations, clustering analysis, and marker gene identification
  - Format: .html
  - Example File Path: /output/clustering_analysis_report.html
  - Visualization App: Web browser
  - Location: Clustering output folder
- Seurat Object:
  - Description: Processed single-cell data object containing normalized expression data, dimensionality reduction, and clustering results
  - Format: .rds
  - Example File Path: /output/processed_seurat_object.rds
  - Visualization App: R/Seurat
  - Location: Analysis output folder
Supporting Outputs
- BAM Files:
  - Description: Aligned reads in BAM format with cellular barcode and UMI information
  - Format: .bam
  - Example File Path: /output/sample_possorted_genome_bam.bam
- Raw Feature-Barcode Matrix:
  - Description: Unfiltered gene expression count matrix including all detected barcodes
  - Format: Directory with matrix.mtx, barcodes.tsv, features.tsv
  - Example File Path: /output/sample_raw_feature_bc_matrix/
- H5AD File:
  - Description: AnnData format file compatible with Scanpy and Python-based single-cell analysis tools
  - Format: .h5ad
  - Example File Path: /output/sample_data.h5ad
- Loom File:
  - Description: Loom format file for visualization and analysis in tools like SCope and pySCENIC
  - Format: .loom
  - Example File Path: /output/sample_data.loom
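The raw feature-barcode matrix directory stores counts in Matrix Market coordinate format, with features as rows and barcodes as columns. As a minimal sketch, the text of `matrix.mtx` can be parsed with the standard library to total UMIs per barcode. The function names are hypothetical, the input is assumed to be already decompressed lines of text, and in practice a library such as Scanpy or Seurat would normally read these files.

```python
def parse_matrix_market(lines):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, entries).

    `entries` maps zero-based (row, col) pairs to integer counts.
    """
    # Skip blank lines and comment/header lines, which start with "%".
    body = (ln for ln in lines if ln.strip() and not ln.startswith("%"))
    n_rows, n_cols, n_entries = (int(x) for x in next(body).split())
    entries = {}
    for ln in body:
        row, col, value = ln.split()
        # The file uses one-based indices; convert to zero-based.
        entries[(int(row) - 1, int(col) - 1)] = int(value)
    if len(entries) != n_entries:
        raise ValueError("Entry count does not match the size line")
    return n_rows, n_cols, entries


def umis_per_barcode(lines, barcodes):
    """Sum counts down each column (one column per barcode)."""
    _, n_cols, entries = parse_matrix_market(lines)
    if n_cols != len(barcodes):
        raise ValueError("Barcode list length does not match matrix columns")
    totals = dict.fromkeys(barcodes, 0)
    for (_, col), value in entries.items():
        totals[barcodes[col]] += value
    return totals
```

Because the raw matrix includes every detected barcode, most columns hold very few UMIs. Per-barcode totals like these are the kind of input that cell-calling and ambient RNA steps work from.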
Associated Processes
- Add custom seq to genome gtf
- Ambient RNA QC
- bclConvert
- Cell Ranger Count
- Cell Ranger mkref
- cellranger ref checker
- Check BED12
- check files
- Check Genome GTF
- Check chrom sizes and index
- Clustering and Find Markers
- convert gtf attributes
- Create h5ad
- demultiplexer prep
- filter summary
- Load Data and QC h5
- Merge Seurat Objects
- PCA and Batch Effect Correction
- prepare input velocyto
- process anndata
- pySCENIC GRN
- SCEtoLOOM
- velocyto
References & Additional Documentation
- 10x Genomics Cell Ranger Documentation: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/count
- Example Datasets: https://www.viafoundry.com/test_data/fastq_10x_pbmc_1k_v3/
- Supported Reference Genomes: human_hg38_gencode_v32_cellranger_v6 and other Cell Ranger-compatible references