smallRNA-Seq Pipeline Specification
Pipeline Details
- Name: smallRNA-Seq
- Pipeline UUID: 3e068adf3394450ab6993cd3251459be
- Version: 1.0.0
Overview
The smallRNA-Seq pipeline analyzes small RNA sequencing data. It automates quality control, rRNA filtering, and sequential mapping to common small RNA classes, including miRNAs, tRNAs, snRNAs, and piRNAs, so that results are reliable and reproducible.
Key Use Cases
- Small RNA Discovery and Quantification: Identification and quantification of miRNAs, tRNAs, snRNAs, and piRNAs from sequencing data.
- Sequential Mapping Analysis: Systematic mapping to different small RNA classes to determine the composition of small RNA populations.
- Quality Control and Preprocessing: Comprehensive QC workflow including adapter removal, trimming, and quality filtering optimized for small RNA libraries.
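To illustrate how per-class read counts become a composition profile, the following Python sketch converts counts into percentages of the total input reads. Normalizing to total input reads is one possible choice (normalizing to mapped reads is another), and the class names and counts below are hypothetical, not pipeline output.

```python
def composition(counts: dict[str, int], total_reads: int) -> dict[str, float]:
    """Return the percentage of total input reads assigned to each small RNA class."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: 100.0 * n / total_reads for name, n in counts.items()}


# Hypothetical counts for one sample (illustrative only).
counts = {"rRNA": 200, "miRNA": 500, "tRNA": 150, "snRNA": 50, "piRNA": 100}
for name, pct in composition(counts, total_reads=1000).items():
    print(f"{name}\t{pct:.1f}%")
```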
Features
- Support for Multiple Aligner Options: Includes STAR, Bowtie, and Bowtie2 aligners for flexible mapping strategies.
- Sequential Mapping Workflow: Systematic mapping to common small RNA databases including miRNAs, tRNAs, snRNAs, and piRNAs.
- Comprehensive Quality Control: Runs FastQC, adapter removal (Trimmomatic or fastx_clipper), quality filtering, and trimming.
- UMI Support: Optional UMI extraction and deduplication to collapse PCR duplicates and improve quantification accuracy.
- Modular Design: Supports customization with configurable parameters for different small RNA analysis workflows.
- Automated Reporting: Generates detailed summary reports and visualizations for each processing step.
- rRNA Filtering: Specialized filtering to remove ribosomal RNA contamination common in small RNA libraries.
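Sequential mapping can be thought of as a cascade: reads are aligned against each database in a fixed order, and only reads that fail to align are passed on to the next step. The Python sketch below models that idea with exact substring matching standing in for a real aligner (STAR, Bowtie, or Bowtie2). The function name, database order, and toy sequences are assumptions for illustration, not the pipeline's defaults.

```python
# Hypothetical sketch: exact substring matching stands in for a real aligner.
def sequential_map(reads, databases):
    """Assign each read to the first database (in order) that contains it.

    databases: list of (class_name, [reference sequences]) tuples.
    Returns per-class counts and the reads left unassigned after all steps.
    """
    counts = {name: 0 for name, _ in databases}
    remaining = list(reads)
    for name, sequences in databases:
        unmapped = []
        for read in remaining:
            if any(read in ref for ref in sequences):
                counts[name] += 1
            else:
                unmapped.append(read)
        remaining = unmapped  # only unassigned reads reach the next database
    return counts, remaining


# Toy references; real runs would use indexed rRNA/miRNA/tRNA/... databases.
dbs = [
    ("rRNA", ["AAAACCCCGGGG"]),
    ("miRNA", ["TTTTGGGGAAAA"]),
    ("tRNA", ["CCCCAAAATTTT"]),
]
counts, unmapped = sequential_map(["AAAACCCC", "GGGGAAAA", "CAAAAT", "ACGTACGT"], dbs)
print(counts)    # {'rRNA': 1, 'miRNA': 1, 'tRNA': 1}
print(unmapped)  # ['ACGTACGT']
```

Because each read is counted only at the first database it matches, the database order determines how reads that are shared between classes get attributed.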
Input/Output Specification
Inputs
Required
Sequencing Reads
- Description: FASTQ files containing small RNA sequencing reads (single-end or paired-end)
- Format: .fastq.gz
- Example File Path: /path/to/input/sample.fastq.gz
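A quick way to inspect an input file (for example, to look at the read-length distribution, which for adapter-trimmed miRNA reads typically peaks around 22 nt) is to stream records straight from the gzip file. This is a minimal standard-library sketch assuming standard four-line FASTQ records without line wrapping; the record and function names are hypothetical.

```python
import gzip
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq_gz(path: str) -> Iterator[FastqRecord]:
    """Yield records from a gzip-compressed FASTQ file with 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\n")
            plus = handle.readline().rstrip("\n")
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            yield FastqRecord(header[1:], seq, qual)
```

For example, `collections.Counter(len(r.seq) for r in read_fastq_gz("/path/to/input/sample.fastq.gz"))` tallies read lengths.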
Reference Genome
- Description: Reference genome FASTA file for alignment
- Format: .fasta/.fa
- Example File Path: /path/to/reference/genome.fa
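Several processes (for example, "Check chrom sizes and index") work with per-sequence lengths of the reference. The pipeline's own implementation is not described here; as one illustration, a chrom.sizes-style table (name, tab, length) can be derived from an uncompressed FASTA with plain Python. The function names are hypothetical.

```python
def chrom_sizes(fasta_path):
    """Return {sequence name: length} for an uncompressed, possibly multi-line FASTA."""
    sizes = {}
    name = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Sequence name is the header text up to the first whitespace.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                sizes[name] = 0
            elif name is not None:
                sizes[name] += len(line)
    return sizes


def write_chrom_sizes(sizes, out_path):
    """Write a two-column, tab-separated name/length table."""
    with open(out_path, "w") as out:
        for name, length in sizes.items():
            out.write(f"{name}\t{length}\n")
```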
GTF Annotation File
- Description: Gene annotation file in GTF format
- Format: .gtf
- Example File Path: /path/to/annotation/genes.gtf
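GTF files have nine tab-separated columns, the last of which holds `key "value";` attribute pairs such as `gene_id` and `transcript_id`. Processes such as "Check Genome GTF" and "convert gtf attributes" operate on this file, but their exact behavior is not documented here. The sketch below only illustrates the file structure; it assumes quoted attribute values and skips unquoted ones.

```python
import re
from typing import Optional

# Matches key "value" pairs; unquoted values (used by some GTF producers) are skipped.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_line(line: str) -> Optional[dict]:
    """Parse one GTF line into a dict; returns None for comments and blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": dict(ATTR_RE.findall(attrs)),
    }
```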
Optional Inputs
Sequential Mapping Database
- Description: Pre-built database containing small RNA sequences for sequential mapping
- Format: Directory containing indexed sequences
- Example File Path: /path/to/commondb/
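The internal layout of the sequential mapping database directory is not specified in this document. Assuming each small RNA class has a Bowtie2 index under a known prefix, a pre-flight check could confirm that all six index files exist before mapping starts. The per-class prefix layout shown in the usage note is hypothetical; STAR and Bowtie indexes use different file sets and are not covered here.

```python
from pathlib import Path

# Standard Bowtie2 small-index suffixes; large indexes use the same names ending in "l".
BT2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def missing_bowtie2_files(prefix: str) -> list[str]:
    """Return the Bowtie2 index files missing for a prefix (empty list if complete).

    A prefix counts as complete if either the .bt2 or the .bt2l file set is present.
    """
    small = [prefix + s for s in BT2_SUFFIXES]
    large = [p + "l" for p in small]
    for candidate in (small, large):
        if all(Path(p).is_file() for p in candidate):
            return []
    return [p for p in small if not Path(p).is_file()]
```

For example, `missing_bowtie2_files("/path/to/commondb/miRNA/miRNA")` would list any absent files under that hypothetical prefix.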
UMI Pattern
- Description: Pattern for UMI extraction if UMI-based deduplication is required
- Format: Text pattern (e.g., "NNNNNNNN" for 8bp UMI)
- Example: NNNNNNNN
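To show what an all-N pattern such as NNNNNNNN means in practice, the sketch below moves the first eight bases of each read into the read name and then collapses duplicates. It assumes the UMI sits at the 5' end of the read. Deduplication here is simplified to identical (UMI, sequence) pairs; UMI-tools, which the "Umitools Summary" process suggests is used, deduplicates on alignment position plus UMI instead. Function names and the `_` separator are illustrative assumptions.

```python
def extract_umi(name: str, seq: str, qual: str, pattern: str = "NNNNNNNN"):
    """Move a 5' UMI described by an all-N pattern into the read name.

    Assumes the UMI occupies the first len(pattern) bases of the read.
    """
    if set(pattern) != {"N"}:
        raise ValueError("this sketch only supports all-N patterns")
    k = len(pattern)
    if len(seq) < k:
        raise ValueError("read is shorter than the UMI pattern")
    umi = seq[:k]
    return f"{name}_{umi}", seq[k:], qual[k:]


def dedup_by_umi(reads):
    """Keep the first read for each (UMI, insert sequence) pair."""
    seen = set()
    kept = []
    for name, seq, qual in reads:
        umi = name.rsplit("_", 1)[1]  # UMI appended by extract_umi
        key = (umi, seq)
        if key not in seen:
            seen.add(key)
            kept.append((name, seq, qual))
    return kept
```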
Outputs
Reported Outputs
- Sequential Mapping Summary:
  - Description: Mapping statistics showing how reads are distributed across the small RNA classes
  - Format: .tsv
  - Example File Path: /output/sequential_mapping_summary.tsv
  - Visualization App: Excel, R, or custom plotting tools
  - Location: Report Folder
- Quality Control Reports:
  - Description: FastQC reports generated before and after preprocessing
  - Format: .html, .zip
  - Example File Path: /output/fastqc_reports/
  - Visualization App: Web browser
  - Location: Report Folder
- Small RNA Count Matrix:
  - Description: Read counts for each small RNA feature across samples
  - Format: .tsv
  - Example File Path: /output/smallRNA_counts.tsv
  - Visualization App: DEBrowser, R, Python
  - Location: Report Folder
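The exact column layout of the count matrix is not described here. Assuming each sample produces a two-column (feature, count) TSV without a header, combining them into a features-by-samples matrix, in the spirit of the "Merge TSV Files" process, could look like the sketch below. Function names, the header row, and the zero-filling of absent features are assumptions.

```python
import csv


def read_counts(path: str) -> dict[str, int]:
    """Read a two-column (feature, count) TSV with no header (assumed format)."""
    with open(path, newline="") as handle:
        return {row[0]: int(row[1]) for row in csv.reader(handle, delimiter="\t") if row}


def write_matrix(per_sample: dict[str, dict[str, int]], out_path: str) -> None:
    """Write a features x samples matrix; features absent from a sample get 0."""
    samples = sorted(per_sample)
    features = sorted({f for counts in per_sample.values() for f in counts})
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["feature", *samples])
        for feature in features:
            writer.writerow([feature, *(per_sample[s].get(feature, 0) for s in samples)])
```

Sorting samples and features keeps the output deterministic, which makes matrices from separate runs easy to compare.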
Supporting Outputs
- Aligned BAM Files:
  - Description: Sorted and indexed BAM files from the sequential mapping steps
  - Format: .bam, .bai
  - Example File Path: /intermediate/aligned_reads_sorted.bam
- Processing Log Files:
  - Description: Detailed logs from the adapter removal, trimming, and quality filtering steps
  - Format: .log
  - Example File Path: /intermediate/processing_logs/
Associated Processes
- Add custom seq to genome gtf
- Adapter Removal
- Adapter Removal Summary
- Check BED12
- check files
- Check Genome GTF
- Check chrom sizes and index
- Check Sequential Mapping Indexes
- convert gtf attributes
- Deduplication Summary
- Download build sequential mapping indexes
- FastQC
- FastQC after Adapter Removal
- Merge TSV Files
- Quality Filtering
- Quality Filtering Summary
- Sequential Mapping
- Sequential Mapping Bam count
- Sequential Mapping Summary
- Trimmer
- Trimmer Summary
- UMIextract
- Umitools Summary
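The Adapter Removal, Trimmer, and Quality Filtering processes each act on individual reads. The sketch below applies them in that order to a single read. It is a deliberately simplified, exact-match stand-in for what tools such as Trimmomatic or fastx_clipper do, not their algorithms. The thresholds (min_overlap=8, min_len=15, min_mean_q=20), the Phred+33 encoding, and the step order are assumptions rather than the pipeline's settings. The adapter in the test is the 3' adapter commonly listed for Illumina TruSeq small RNA kits; check it against your library kit.

```python
def clip_adapter(seq, qual, adapter, min_overlap=8):
    """Cut the read at the first exact match of the adapter's first min_overlap bases.

    Simplification: mismatches and partial adapters at the very 3' end are ignored.
    """
    pos = seq.find(adapter[:min_overlap])
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def trim(seq, qual, left=0, right=0):
    """Remove a fixed number of bases from the 5' (left) and 3' (right) ends."""
    end = max(len(seq) - right, 0)
    return seq[left:end], qual[left:end]


def mean_phred(qual, offset=33):
    """Mean Phred score, assuming Phred+33 quality encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def preprocess(seq, qual, adapter, trim_left=0, trim_right=0, min_len=15, min_mean_q=20.0):
    """Adapter removal, then trimming, then a quality filter; returns None if dropped."""
    seq, qual = clip_adapter(seq, qual, adapter)
    seq, qual = trim(seq, qual, trim_left, trim_right)
    if len(seq) < min_len or mean_phred(qual) < min_mean_q:
        return None
    return seq, qual
```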
References & Additional Documentation
- Pipeline Repository: ViaFoundry smallRNA-Seq Pipeline
- Related Documentation: Small RNA Analysis Best Practices
- Container Images: Available on Quay.io registry (quay.io/viascientific/)