RiboSeq Pipeline Specification
Pipeline Details
Overview
The RiboSeq pipeline processes ribosome profiling (Ribo-seq) data. Ribo-seq sequences the mRNA fragments protected by translating ribosomes, giving a snapshot of active translation in the cell. Because each fragment marks a ribosome position on a transcript, the data reveal translation dynamics, ribosome occupancy, and coding potential. The pipeline automates preprocessing, quality control, alignment, quantification, and ORF prediction so that results are reliable and reproducible.
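To make "mapping ribosome positions" concrete: a footprint's 5' end is usually shifted by a length-specific offset to the ribosomal P-site, and in good Ribo-seq data most P-sites fall in frame 0 of the CDS (3-nt periodicity). The sketch below counts P-sites per frame. It is an illustration only: the function name and the offset table are hypothetical, and real offsets are calibrated per library rather than fixed.

```python
def psite_frame_counts(reads, cds_start, cds_end, offsets):
    """Count P-sites falling in each reading frame (0, 1, 2) of a CDS.

    reads:     iterable of (five_prime_pos, read_length), 0-based transcript coords
    cds_start: 0-based coordinate of the first base of the start codon
    cds_end:   0-based coordinate one past the last CDS base
    offsets:   HYPOTHETICAL read_length -> P-site offset (nt from the 5' end)
    """
    frames = [0, 0, 0]
    for five_prime, length in reads:
        offset = offsets.get(length)
        if offset is None:
            continue  # no calibrated offset for this footprint length: skip it
        psite = five_prime + offset
        if not cds_start <= psite < cds_end:
            continue  # P-site lies in a UTR, outside the CDS frame
        frames[(psite - cds_start) % 3] += 1
    return frames


# Illustrative offsets only; not values used by the pipeline.
offsets = {28: 12, 29: 12}
reads = [(88, 28), (91, 28), (94, 29), (89, 28), (50, 30), (80, 28)]
print(psite_frame_counts(reads, cds_start=100, cds_end=200, offsets=offsets))  # [3, 1, 0]
```

A strong excess in frame 0 over frames 1 and 2 is the periodicity signal that ORF detection builds on.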
Key Use Cases:
- Translation Dynamics Analysis: Mapping ribosome positions on transcripts to study active translation.
- ORF Discovery: Identifying and predicting open reading frames and their translation potential.
- Ribosome Occupancy Profiling: Quantifying ribosome density across different genomic features.
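Ribosome density is typically compared across features after normalizing for feature length and sequencing depth. One common choice is RPKM (reads per kilobase per million mapped reads), sketched below. This is an assumption for illustration; the specification does not state which normalization the pipeline applies.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of feature per million mapped reads.

    counts:  feature_id -> number of footprints assigned to the feature
    lengths: feature_id -> feature length in bp
    """
    total = sum(counts.values())
    if total == 0:
        return {feature: 0.0 for feature in counts}
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped reads)
    return {f: counts[f] * 1e9 / (lengths[f] * total) for f in counts}


density = rpkm({"geneA": 500, "geneB": 500}, {"geneA": 1000, "geneB": 2000})
print(density)  # {'geneA': 500000.0, 'geneB': 250000.0}
```

Equal raw counts on a feature twice as long give half the density, which is why length normalization matters when comparing occupancy between features.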
Features
- Comprehensive Quality Control: Runs FastQC on raw reads and again after adapter removal, with optional adapter removal, trimming, and quality filtering.
- Multiple Quantification Methods: Supports featureCounts, Salmon, and Kallisto for gene- and transcript-level quantification.
- UMI Support: Extracts UMIs from reads and deduplicates alignments so that PCR duplicates are not counted as independent footprints.
- ORF Prediction: Detects actively translated ORFs with ribotricer (see Related Papers).
- Flexible Read Processing: Handles both single-end and paired-end sequencing data with customizable trimming and filtering parameters.
- Modular Design: Optional preprocessing steps can be enabled or skipped, and reads can be processed with different alignment strategies (HISAT2 genome alignment or Kallisto pseudoalignment).
- Comprehensive Reporting: Generates detailed summary reports and visualizations for each processing step.
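The UMI step can be pictured as two parts: moving the UMI bases out of the read sequence into the read name, then collapsing alignments that share both position and UMI. The sketch below is a simplified illustration. It assumes the UMI sits at the start of the read and is appended to the name as `name_UMI` (similar to what `umi_tools extract` does), and it deduplicates by exact UMI match only, whereas `umi_tools dedup` also groups UMIs that differ by sequencing errors. All names here are hypothetical.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def extract_umi(rec, umi_len):
    """Move the first umi_len bases (and qualities) into the read name."""
    umi = rec.seq[:umi_len]
    return FastqRecord(f"{rec.name}_{umi}", rec.seq[umi_len:], rec.qual[umi_len:])


def dedup(alignments):
    """Keep the first read per (chrom, pos, strand, UMI); exact UMI matching only.

    alignments: iterable of (read_name, chrom, pos, strand), names ending in _UMI
    """
    seen = set()
    kept = []
    for name, chrom, pos, strand in alignments:
        umi = name.rsplit("_", 1)[1]
        key = (chrom, pos, strand, umi)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept


rec = extract_umi(FastqRecord("read1", "ACGTACGGATTACA", "IIIIIIHHHHHHHH"), 6)
print(rec)  # FastqRecord(name='read1_ACGTAC', seq='GGATTACA', qual='HHHHHHHH')
```

Keeping the UMI in the read name means it survives alignment, so deduplication can run on the BAM records afterwards.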
Input/Output Specification
Note: This pipeline uses dynamic input/output configuration. Specific inputs and outputs are defined during pipeline execution based on the selected processing modules.
Inputs
Required
The pipeline accepts standard sequencing inputs: FASTQ files (single-end or paired-end) and reference annotations such as a genome GTF. The exact requirements depend on the selected processing modules and analysis parameters.
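Input checks such as the "check files" process presumably verify that FASTQ files are well formed before any processing starts. The sketch below shows one minimal structural check on the four-line FASTQ record layout. It is a hypothetical illustration, not the pipeline's actual validation logic.

```python
def validate_fastq(lines):
    """Return the number of records; raise ValueError on the first malformed one."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(lines), 4):
        header, seq, sep, qual = lines[i:i + 4]
        rec = i // 4 + 1
        if not header.startswith("@"):
            raise ValueError(f"record {rec}: header must start with '@'")
        if not sep.startswith("+"):
            raise ValueError(f"record {rec}: separator line must start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record {rec}: sequence and quality lengths differ")
    return len(lines) // 4


print(validate_fastq(["@r1\n", "ACGT\n", "+\n", "IIII\n"]))  # 1
```

For paired-end input, the same function can be run on both mates and the two record counts compared.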
Outputs
Reported Outputs
- Gene Expression Matrix: Quantified gene expression levels from ribosome profiling data
- Transcript Expression Matrix: Transcript-level quantification results
- ORF Predictions: Identified open reading frames with translation potential scores
- Quality Control Reports: Comprehensive QC summaries from FastQC and processing steps
- Alignment Statistics: Detailed mapping and alignment summary statistics
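The processes "Salmon transcript to gene count" and "Kallisto transcript to gene count" suggest that the gene-level matrix is derived by summing transcript-level estimates per gene. The sketch below assumes that approach, with a transcript-to-gene mapping taken from the GTF and per-sample counts such as the `NumReads` column of Salmon's `quant.sf` or `est_counts` of Kallisto's `abundance.tsv`. Function and variable names are hypothetical.

```python
from collections import defaultdict


def gene_matrix(tx_counts_by_sample, tx2gene):
    """Sum transcript counts per gene for every sample.

    tx_counts_by_sample: sample -> {transcript_id: estimated count}
    tx2gene:             transcript_id -> gene_id
    Returns gene_id -> {sample: summed count}, with 0.0 where a gene has no reads.
    """
    samples = list(tx_counts_by_sample)
    matrix = defaultdict(lambda: {s: 0.0 for s in samples})
    for sample, tx_counts in tx_counts_by_sample.items():
        for tx, count in tx_counts.items():
            gene = tx2gene.get(tx)
            if gene is None:
                continue  # transcripts absent from the annotation are dropped
            matrix[gene][sample] += count
    return dict(matrix)


counts = {
    "S1": {"T1": 10.0, "T2": 5.0, "T3": 2.0},
    "S2": {"T1": 4.0, "T3": 1.0, "T9": 7.0},
}
print(gene_matrix(counts, {"T1": "G1", "T2": "G1", "T3": "G2"}))
# {'G1': {'S1': 15.0, 'S2': 4.0}, 'G2': {'S1': 2.0, 'S2': 1.0}}
```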
Supporting Outputs
- Processed BAM Files: Aligned and processed sequencing reads
- Summary Tables: Detailed statistics from each processing step
- Log Files: Processing logs for debugging and quality assessment
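Summary tables are easiest to use when reduced to a single rate per sample. The sketch below parses a featureCounts `.summary` file (a tab-separated `Status` column followed by per-file count columns) and reports the fraction of reads with status `Assigned`. It assumes a summary containing a single sample column; the function name is hypothetical.

```python
def assignment_rate(summary_text):
    """Fraction of reads marked 'Assigned' in a single-sample featureCounts summary."""
    rows = [line.split("\t") for line in summary_text.strip().splitlines()]
    counts = {status: int(value) for status, value in rows[1:]}  # rows[0] is the header
    total = sum(counts.values())
    return counts.get("Assigned", 0) / total if total else 0.0


summary = (
    "Status\tsample.bam\n"
    "Assigned\t750\n"
    "Unassigned_Unmapped\t100\n"
    "Unassigned_NoFeatures\t150\n"
)
print(assignment_rate(summary))  # 0.75
```

A low assignment rate in Ribo-seq data often points to heavy rRNA contamination or a mismatch between the annotation and the reference used for alignment.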
Associated Processes
- Add custom seq to genome gtf
- Adapter Removal
- Adapter Removal Summary
- Check BED12
- Check Build Hisat2 Index
- Check Build Kallisto Index
- check files
- Check Genome GTF
- Check chrom sizes and index
- check Hisat2 files
- check kallisto files
- convert gtf attributes
- FastQC
- FastQC after Adapter Removal
- featureCounts
- featureCounts Prep
- featureCounts summary
- HISAT2 Summary
- Kallisto Alignment Summary
- Kallisto Summary
- Kallisto transcript to gene count
- kallisto quant
- Map HISAT2
- Merge Bam
- Merge TSV Files
- Quality Filtering
- Quality Filtering Summary
- ribotricer detect orfs
- ribotricer index prepare
- RiboSeq Profiling QC
- salmon bam quant
- Salmon Summary
- Salmon transcript to gene count
- Trimmer
- Trimmer Summary
- UMIextract
- Umitools Summary
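The Adapter Removal and Quality Filtering steps matter more for Ribo-seq than for RNA-seq, because footprints (commonly around 26 to 34 nt) are shorter than the reads, so nearly every read runs into the 3' adapter. The sketch below trims a full or partial 3' adapter by exact matching and then keeps reads in an assumed footprint length window. The adapter sequence, the length bounds, and the function names are illustrative assumptions; dedicated trimmers also tolerate mismatches, which this sketch does not.

```python
# Example 3' adapter only; substitute the adapter used in your library prep.
ADAPTER = "CTGTAGGCACCATCAAT"


def trim_adapter(seq, adapter=ADAPTER, min_overlap=3):
    """Clip seq at the first full adapter match, or at a partial adapter prefix
    of at least min_overlap bases at the 3' end. Exact matching only."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    min_overlap = max(min_overlap, 1)  # an empty overlap would clip the whole read
    # Try the longest possible adapter prefix first.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


def keep_footprint(seq, min_len=26, max_len=34):
    """Assumed length window for ribosome-protected fragments."""
    return min_len <= len(seq) <= max_len


read = "ACGT" * 7 + "CTGTAGGCACC"  # 28-nt insert followed by a partial adapter
insert = trim_adapter(read)
print(len(insert), keep_footprint(insert))  # 28 True
```

The `min_overlap` threshold trades sensitivity against false clipping: a genuine footprint that happens to end in the first few adapter bases would otherwise be shortened.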
References & Additional Documentation
- Related Papers:
  - "Accurate detection of short and long active ORFs using Ribo-seq data" (ribotricer) - https://pubmed.ncbi.nlm.nih.gov/31750902/
  - "Scikit-ribo Enables Accurate Estimation and Robust Modeling of Translation Dynamics at Codon Resolution"
- Workflow Diagram: Available on the pipeline description pages