RNA-seq Pipeline Specification
Pipeline Details
- Name: RNA-seq Pipeline
- Pipeline UUID: 5ef44138e2c2418ebabbc8e2789671a2
- Version: 2.8.2
- View Pipeline:
Overview
RNA-seq Pipeline performs end-to-end RNA-sequencing data analysis: quality control, rRNA filtering, genome alignment with HISAT2 and STAR, and estimation of gene and isoform expression levels with RSEM, featureCounts, and Salmon. Alternatively, Kallisto or Salmon can quantify transcript abundances from pseudoalignments, with no genome alignment step.
Key Use Cases:
- Differential Gene Expression Analysis: Comprehensive RNA-seq data processing with DESeq2 and Limma Voom for identifying differentially expressed genes between conditions.
- Transcript Quantification: Accurate estimation of gene and isoform expression levels using multiple quantification methods including RSEM, Salmon, and Kallisto.
- Quality Control and Preprocessing: Automated quality assessment, read trimming, adapter removal, and rRNA filtering for reliable downstream analysis.
Features
- Multiple Alignment Options: Supports STAR and HISAT2 aligners for genome alignment, plus RSEM for transcriptome alignment.
- Flexible Quantification Methods: Includes RSEM, featureCounts, Salmon, and Kallisto for expression quantification with both alignment-based and pseudoalignment approaches.
- Comprehensive Quality Control: Implements FastQC, Picard, and RSeQC for thorough quality assessment and genome-wide BAM analysis.
- Differential Expression Analysis: Built-in DE module supporting both DESeq2 and Limma Voom with customizable statistical parameters and batch correction.
- rRNA and Contaminant Filtering: Uses Bowtie2/Bowtie/STAR to filter out common RNAs (rRNA, miRNA, tRNA, piRNA).
- Visualization Support: Generates IGV and Genome Browser files (TDF and BigWig) for interactive data exploration.
- UMI Support: Includes UMI extraction capabilities for single-cell and other UMI-based protocols.
- GSEA Integration: Performs Gene Set Enrichment Analysis on differential expression results.
- Scalable Processing: Can handle thousands of samples in parallel with containerized processes.
Input/Output Specification
Inputs
Required
Raw Sequencing Reads
- Description: FASTQ files containing raw RNA-seq reads from sequencing platforms
- Format: .fastq.gz (compressed FASTQ)
- Example File Path: /path/to/input/sample_R1.fastq.gz, /path/to/input/sample_R2.fastq.gz
Mate Information
- Description: Specifies whether reads are single-end or paired-end
- Format: String parameter ("single" or "pair")
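The relationship between the read files and the mate setting can be sketched as follows. This is a minimal illustration, not the pipeline's own logic: the `_R1`/`_R2` suffix convention is taken from the example paths above, and the function name `group_reads` and the pattern `FASTQ_PATTERN` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming convention (from the example paths): <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq\.gz$")


def group_reads(paths, mate):
    """Group FASTQ paths by sample and check them against the mate mode."""
    if mate not in ("single", "pair"):
        raise ValueError(f"mate must be 'single' or 'pair', got {mate!r}")

    samples = defaultdict(dict)
    for path in paths:
        filename = path.rsplit("/", 1)[-1]
        match = FASTQ_PATTERN.match(filename)
        if match is None:
            raise ValueError(f"unrecognized FASTQ name: {filename}")
        samples[match["sample"]]["R" + match["mate"]] = path

    # Paired-end samples need both mates; single-end samples only R1.
    expected = {"R1", "R2"} if mate == "pair" else {"R1"}
    for sample, mates in samples.items():
        if set(mates) != expected:
            raise ValueError(
                f"{sample}: expected {sorted(expected)}, found {sorted(mates)}"
            )
    return dict(samples)


reads = group_reads(
    ["/path/to/input/sample_R1.fastq.gz", "/path/to/input/sample_R2.fastq.gz"],
    mate="pair",
)
```

Checking the mate mode before submission catches the common case where a paired-end sample is missing one of its two files.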
Reference Genome Index
- Description: Pre-built genome indices for selected aligners (STAR, HISAT2)
- Format: Directory containing index files
Optional Inputs
Metadata File (Groups File)
- Description: Tab-separated file containing sample information for differential expression analysis
- Required Columns: sample_name, group
- Format: Tab-separated values (.tsv)
- Example:
sample_name	group	batch
control_1	ctrl	Day1
control_2	ctrl	Day1
treat_1	treat	Day1
treat_2	treat	Day1
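A groups file of this shape can be checked before a run with a few lines of standard-library Python. This is a sketch under the assumption that the file has a header row and is strictly tab-separated; the function name `read_groups` is hypothetical and not part of the pipeline.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_name", "group")


def read_groups(handle):
    """Return {sample_name: group} after checking the required columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"groups file is missing columns: {missing}")

    groups = {}
    for row in reader:
        name = row["sample_name"]
        if name in groups:
            raise ValueError(f"duplicate sample_name: {name}")
        groups[name] = row["group"]
    return groups


# The example table from this section, written out with explicit tabs.
example = io.StringIO(
    "sample_name\tgroup\tbatch\n"
    "control_1\tctrl\tDay1\n"
    "control_2\tctrl\tDay1\n"
    "treat_1\ttreat\tDay1\n"
    "treat_2\ttreat\tDay1\n"
)
sample_to_group = read_groups(example)
```

Extra columns such as batch are passed over here; only the two required columns are validated.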
Comparison File
- Description: Specifies which groups to compare in differential expression analysis
- Required Columns: controls, treats, names
- Format: Tab-separated values (.tsv)
- Example:
controls	treats	names
ctrl	treat	treat_v_ctrl
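Each row of the comparison file refers to group labels defined in the groups file, so a quick cross-check can catch typos early. The sketch below assumes a header row and tab separation; `read_comparisons` is a hypothetical helper, and the set of known groups is supplied by hand here rather than read from a real groups file.

```python
import csv
import io

REQUIRED_COLUMNS = ("controls", "treats", "names")


def read_comparisons(handle, known_groups):
    """Return (control, treatment, name) tuples, rejecting unknown group labels."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"comparison file is missing columns: {missing}")

    comparisons = []
    for row in reader:
        for column in ("controls", "treats"):
            if row[column] not in known_groups:
                raise ValueError(f"unknown group in {column}: {row[column]}")
        comparisons.append((row["controls"], row["treats"], row["names"]))
    return comparisons


comparisons = read_comparisons(
    io.StringIO("controls\ttreats\tnames\nctrl\ttreat\ttreat_v_ctrl\n"),
    known_groups={"ctrl", "treat"},
)
```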
Custom Reference Sequences
- Description: Additional FASTA sequences to add to reference genome
- Format: .fasta
- Example File Path: /path/to/custom_sequences.fasta
Outputs
Reported Outputs
- Gene Expression Matrix:
- Description: Normalized gene expression counts suitable for downstream analysis
- Format: .tsv
- Example File Path: /output/gene_featureCounts.tsv
- Visualization App: DE Browser, R/Bioconductor
- Location: Results Folder
- Differential Expression Results:
- Description: Statistical results from DESeq2 or Limma Voom analysis with fold changes and p-values
- Format: .tsv
- Example File Path: /output/DE_reports/treat_v_ctrl_DESeq2.tsv
- Visualization App: DE Browser, IGV
- Location: DE_reports Folder
- Quality Control Reports:
- Description: Comprehensive QC metrics including FastQC, Picard, and MultiQC reports
- Format: .html, .pdf
- Example File Path: /output/multiqc_report.html
- Visualization App: Web browser
- Location: QC Folder
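As an illustration of downstream use, a differential expression table can be filtered for significant genes with the standard library alone. The column names below (`gene`, `log2FoldChange`, `padj`) are assumptions: the last two follow DESeq2's usual result columns, but the actual header written by this pipeline is not stated here and should be checked. The thresholds and the function name `significant_genes` are likewise illustrative.

```python
import csv
import io
import math


def significant_genes(handle, padj_cutoff=0.05, min_abs_log2fc=1.0):
    """Return gene IDs passing both the adjusted p-value and fold-change cutoffs."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            padj = float(row["padj"])
            log2fc = float(row["log2FoldChange"])
        except ValueError:
            continue  # non-numeric entries such as "NA" are skipped
        if math.isnan(padj) or math.isnan(log2fc):
            continue
        if padj < padj_cutoff and abs(log2fc) >= min_abs_log2fc:
            hits.append(row["gene"])
    return hits


# Made-up rows for illustration only; not real pipeline output.
table = io.StringIO(
    "gene\tlog2FoldChange\tpadj\n"
    "GeneA\t2.5\t0.001\n"
    "GeneB\t0.3\t0.2\n"
    "GeneC\t-1.8\t0.01\n"
    "GeneD\t3.0\tNA\n"
)
hits = significant_genes(table)
```

Genes without a usable adjusted p-value are left out rather than treated as significant.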
Supporting Outputs
- Aligned BAM Files:
- Description: Genome-aligned reads in BAM format for visualization and further analysis
- Format: .bam
- Example File Path: /intermediate/sample_aligned.bam
- Visualization App: IGV, UCSC Genome Browser
- Transcript Abundance Files:
- Description: Transcript-level expression estimates from Salmon/Kallisto
- Format: Tab-separated text (.tsv; Salmon writes quant.sf)
- Example File Path: /intermediate/salmon_quant/quant.sf
- GSEA Results:
- Description: Gene Set Enrichment Analysis results and visualizations
- Format: .tsv, .html
- Example File Path: /output/GSEA_reports/
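For the transcript abundance files, a Salmon quant.sf table (columns Name, Length, EffectiveLength, TPM, NumReads) can be summed to gene level as sketched below. This is not the pipeline's own "Salmon transcript to gene count" process; the transcript-to-gene mapping, the transcript and gene IDs, and the function name `gene_level_counts` are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def gene_level_counts(quant_handle, tx2gene):
    """Sum Salmon NumReads per gene using a transcript-to-gene mapping."""
    totals = defaultdict(float)
    for row in csv.DictReader(quant_handle, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcripts without a gene assignment are skipped
        totals[gene] += float(row["NumReads"])
    return dict(totals)


# Made-up quant.sf content and mapping, for illustration only.
quant = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t850.0\t12.0\t10.0\n"
    "tx2\t1200\t1050.0\t6.0\t5.5\n"
    "tx3\t900\t750.0\t4.0\t3.0\n"
)
gene_counts = gene_level_counts(
    quant, {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
)
```

Summing NumReads is the simplest aggregation; length-aware approaches such as tximport handle differences in transcript length between samples more carefully.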
Associated Processes
- Add custom seq to genome gtf
- Adapter Removal
- Adapter Removal Summary
- bam sort index
- bamtofastq samtools fastq
- Check BED12
- Check Build Hisat2 Index
- Check Build Kallisto Index
- Check Build Rsem Index
- Check Build Salmon Index
- Check Build STAR Index
- check files
- Check Genome GTF
- Check chrom sizes and index
- check Hisat2 files
- check kallisto files
- check RSEM files
- check Salmon files
- Check Sequential Mapping Indexes
- check STAR files
- convert gtf attributes
- DE collector
- Deduplication Summary
- Download build sequential mapping indexes
- FastQC
- FastQC after Adapter Removal
- featureCounts
- featureCounts Prep
- featureCounts summary
- Genome Browser
- HISAT2 Summary
- IGV BAM2TDF converter
- Kallisto Alignment Summary
- Kallisto Summary
- Kallisto transcript to gene count
- kallisto quant
- Map HISAT2
- Map STAR
- Merge Bam
- Merge Bam and sort index
- Merge TSV Files
- MultiQC
- Overall Summary
- Picard
- Picard Summary
- Prepare DESeq2
- Prepare GSEA
- Prepare LimmaVoom
- Quality Filtering
- Quality Filtering Summary
- RSEM
- RSEM Alignment Summary
- RSEM Count
- RSeQC
- salmon bam quant
- Salmon Alignment Summary
- salmon quant
- Salmon Summary
- Salmon transcript to gene count
- Sequential Mapping
- Sequential Mapping Bam count
- Sequential Mapping Summary
- STAR Summary
- Trimmer
- Trimmer Summary
- UCSC BAM2BigWig converter
- UMIextract
- Umitools Summary
References & Additional Documentation
- Related Papers:
- DOI:10.1101/689539 - RNA-seq Pipeline Publication
- DESeq2 Paper
- Limma Voom Paper
- Software Documentation:
- DESeq2 Documentation
- Limma Documentation
- Example Datasets:
- Small mouse dataset: https://www.viafoundry.com/test_data/fastq_mouse/