smallRNA-Seq Pipeline Specification

Pipeline Details

  • Name: smallRNA-Seq
  • Pipeline UUID: 3e068adf3394450ab6993cd3251459be
  • Version: 1.0.0

Overview

The smallRNA-Seq pipeline analyzes small RNA sequencing data. It automates quality control, rRNA filtering, and sequential mapping to common small RNA classes (miRNAs, tRNAs, snRNAs, and piRNAs) so that results are reliable and reproducible.

Key Use Cases:

  • Small RNA Discovery and Quantification: Identification and quantification of miRNAs, tRNAs, snRNAs, and piRNAs from sequencing data.
  • Sequential Mapping Analysis: Systematic mapping to different small RNA classes to determine the composition of small RNA populations.
  • Quality Control and Preprocessing: Comprehensive QC workflow including adapter removal, trimming, and quality filtering optimized for small RNA libraries.
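The adapter removal and length filtering mentioned above can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: it uses exact string matching only, and the adapter sequence, the `min_overlap` value, and the 15 to 40 nt length window are assumptions chosen for the example. Replace them with the values for your library kit.

```python
# Assumed 3' adapter for illustration; use the sequence from your library kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def clip_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Remove a 3' adapter using exact matching (no mismatches allowed)."""
    # Case 1: the full adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adapter (longest prefix first).
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: return the read unchanged.
    return read


def preprocess(reads, min_len=15, max_len=40):
    """Clip adapters, then keep inserts whose length is within the window."""
    kept = []
    for read in reads:
        insert = clip_adapter(read)
        if min_len <= len(insert) <= max_len:
            kept.append(insert)
    return kept


example_insert = "TGAGGTAGTAGGTTGTATAGTT"  # toy 22 nt insert
raw_reads = [
    example_insert + ADAPTER,      # full adapter present
    example_insert + ADAPTER[:8],  # adapter truncated by read length
    "ACGT" + ADAPTER,              # insert too short, dropped
]
clean_reads = preprocess(raw_reads)
print(clean_reads)
```

Production tools such as Trimmomatic tolerate mismatches and use base qualities, which this sketch deliberately leaves out.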

Features

  • Multiple Aligner Options: Supports the STAR, Bowtie, and Bowtie2 aligners for flexible mapping strategies.
  • Sequential Mapping Workflow: Systematic mapping to common small RNA databases including miRNAs, tRNAs, snRNAs, and piRNAs.
  • Comprehensive Quality Control: Implements FastQC, adapter removal with Trimmomatic/Fastx_clipper, quality filtering, and trimming steps.
  • UMI Support: Optional UMI extraction and deduplication capabilities for enhanced quantification accuracy.
  • Modular Design: Supports customization with configurable parameters for different small RNA analysis workflows.
  • Automated Reporting: Generates detailed summary reports and visualizations for each processing step.
  • rRNA Filtering: Specialized filtering to remove ribosomal RNA contamination common in small RNA libraries.
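The idea behind rRNA filtering followed by sequential mapping can be illustrated with a toy model: each read is tested against one small RNA class at a time, and only reads that fail to match are passed on to the next class. In this sketch an exact set lookup stands in for a real aligner (STAR, Bowtie, or Bowtie2). The class order and all sequences in `toy_databases` are hypothetical placeholders, not the contents of the pipeline's database.

```python
def sequential_map(reads, databases):
    """Assign each read to the first class whose reference set contains it.

    `databases` maps class name -> set of reference sequences. Its
    insertion order defines the mapping order.
    """
    counts = {name: 0 for name in databases}
    remaining = list(reads)
    for name, references in databases.items():
        unmapped = []
        for read in remaining:
            if read in references:
                counts[name] += 1
            else:
                unmapped.append(read)
        # Only reads that did not match are passed to the next class.
        remaining = unmapped
    counts["unmapped"] = len(remaining)
    return counts


# Hypothetical toy references; one sequence is listed under two classes.
toy_databases = {
    "rRNA": {"GGGGCCCCAAAATTTT"},
    "miRNA": {"TGAGGTAGTAGGTTGTATAGTT"},
    "tRNA": {"GCATTGGTGGTTCAGTGG"},
    "snRNA": {"ATACTTACCTGGCAGGGG"},
    "piRNA": {"TGAGGTAGTAGGTTGTATAGTT", "TAGCTTATCAGACTGATGTTGACTG"},
}
toy_reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "GGGGCCCCAAAATTTT",
    "TAGCTTATCAGACTGATGTTGACTG",
    "ACGTACGTACGTACGTAC",
]
summary = sequential_map(toy_reads, toy_databases)

# Print a tab-separated summary, one class per line.
print("class\treads")
for name, n in summary.items():
    print(f"{name}\t{n}")
```

Because mapping is sequential, a read that matches several classes is counted only for the earliest one, so the order of the databases affects the reported composition.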

Input/Output Specification

Inputs

Required Inputs

Sequencing Reads

  • Description: FASTQ files containing small RNA sequencing reads (single-end or paired-end)
  • Format: .fastq.gz
  • Example File Path: /path/to/input/sample.fastq.gz

Reference Genome

  • Description: Reference genome FASTA file for alignment
  • Format: .fasta/.fa
  • Example File Path: /path/to/reference/genome.fa

GTF Annotation File

  • Description: Gene annotation file in GTF format
  • Format: .gtf
  • Example File Path: /path/to/annotation/genes.gtf

Optional Inputs

Sequential Mapping Database

  • Description: Pre-built database containing small RNA sequences for sequential mapping
  • Format: Directory containing indexed sequences
  • Example File Path: /path/to/commondb/

UMI Pattern

  • Description: Pattern for UMI extraction if UMI-based deduplication is required
  • Format: Text pattern in which each "N" marks one UMI base (e.g., "NNNNNNNN" for an 8 bp UMI)
  • Example: NNNNNNNN
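As a rough illustration of how such a pattern could be applied, the sketch below splits each read into a UMI and the remaining sequence, then collapses reads that share both. It assumes the UMI sits at the 5' end of the read and that the pattern consists only of "N" characters; neither is stated in this specification. Deduplicating on (UMI, sequence) is also a simplification, since real tools usually deduplicate on UMI plus mapping position.

```python
def extract_umi(read, pattern="NNNNNNNN"):
    """Split a read into (umi, remainder), assuming the UMI is at the 5' end."""
    if not pattern or set(pattern) != {"N"}:
        raise ValueError("this sketch only handles patterns made of 'N'")
    n = len(pattern)
    if len(read) <= n:
        raise ValueError("read is not longer than the UMI pattern")
    return read[:n], read[n:]


def deduplicate(reads, pattern="NNNNNNNN"):
    """Keep one read per unique (UMI, sequence) pair, preserving input order."""
    seen = set()
    unique = []
    for read in reads:
        key = extract_umi(read, pattern)
        if key not in seen:
            seen.add(key)
            unique.append(key[1])
    return unique


umi_reads = [
    "AAAACCCC" + "TGAGGTAGTAGGTTGTATAGTT",
    "AAAACCCC" + "TGAGGTAGTAGGTTGTATAGTT",  # same UMI and sequence: duplicate
    "GGGGTTTT" + "TGAGGTAGTAGGTTGTATAGTT",  # different UMI: kept
]
deduplicated = deduplicate(umi_reads)
print(len(umi_reads), "reads ->", len(deduplicated), "after deduplication")
```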

Outputs

Reported Outputs

  • Sequential Mapping Summary:
      • Description: Comprehensive mapping statistics showing read distribution across different small RNA classes
      • Format: .tsv
      • Example File Path: /output/sequential_mapping_summary.tsv
      • Visualization App: Excel, R, or custom plotting tools
      • Location: Report Folder

  • Quality Control Reports:
      • Description: FastQC reports before and after preprocessing steps
      • Format: .html, .zip
      • Example File Path: /output/fastqc_reports/
      • Visualization App: Web browser
      • Location: Report Folder

  • Small RNA Count Matrix:
      • Description: Read counts for each small RNA feature across samples
      • Format: .tsv
      • Example File Path: /output/smallRNA_counts.tsv
      • Visualization App: DE Browser, R, Python
      • Location: Report Folder
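For downstream work in Python, a count matrix like the one described above could be loaded and normalized to counts per million (CPM) roughly as follows. The file layout is an assumption: a header row, feature IDs in the first column, and one column of integer counts per sample. The feature and sample names in the demo data are invented; check the actual file before relying on this.

```python
import csv
import io


def read_counts(handle):
    """Parse a tab-separated count matrix into (sample_names, {feature: counts})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    counts = {}
    for row in reader:
        if row:  # skip blank lines
            counts[row[0]] = [int(value) for value in row[1:]]
    return samples, counts


def counts_per_million(samples, counts):
    """Scale each sample's counts so that they sum to one million."""
    totals = [sum(values[i] for values in counts.values())
              for i in range(len(samples))]
    return {
        feature: [c * 1e6 / t if t else 0.0 for c, t in zip(values, totals)]
        for feature, values in counts.items()
    }


# Invented demo data standing in for /output/smallRNA_counts.tsv.
demo_tsv = "feature\tsampleA\tsampleB\nmiR-21\t750\t100\nlet-7a\t250\t300\n"
sample_names, raw_counts = read_counts(io.StringIO(demo_tsv))
cpm = counts_per_million(sample_names, raw_counts)
for feature, values in cpm.items():
    print(feature, values)
```

To read the real file, pass an open file handle, for example `open(path, newline="")`, instead of the `io.StringIO` object.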

Supporting Outputs

  • Aligned BAM Files:
      • Description: Sorted and indexed BAM files from sequential mapping steps
      • Format: .bam, .bai
      • Example File Path: /intermediate/aligned_reads_sorted.bam

  • Processing Log Files:
      • Description: Detailed logs from adapter removal, trimming, and quality filtering steps
      • Format: .log
      • Example File Path: /intermediate/processing_logs/

Associated Processes

References & Additional Documentation