UMInator Pipeline Specification
Pipeline Details
- Name: UMInator
- Pipeline UUID: 2e063e9e160643698093ff2249d089bc
- Version: 1.0.0
- View Pipeline:
Overview
The UMInator pipeline generates consensus sequences from Nanopore reads tagged with Unique Molecular Identifiers (UMIs). It builds a database of high-quality UMIs, selected by UMI structure, length, and the presence of flanking adapters and PCR primers in the reads. It then bins reads into files according to the high-quality UMI they match, collapses the reads assigned to each UMI into a consensus sequence, and finally trims PCR primers from the consensus sequences.
Key Use Cases:
- UMI-based Error Correction: Generating high-accuracy consensus sequences from error-prone Nanopore reads using UMI tagging.
- Amplicon Sequencing Analysis: Processing PCR-amplified samples with UMI tags for improved sequence accuracy.
- Single Molecule Consensus: Creating consensus sequences from multiple reads originating from the same original molecule.
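To illustrate why collapsing reads that share a UMI improves accuracy, here is a toy sketch of a column-wise majority vote. It is not how the pipeline calls consensus (the Features section lists draft consensus generation followed by Racon and Medaka polishing). It assumes the reads are already aligned and of equal length, and the function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Column-wise majority vote over equal-length, pre-aligned reads.

    Toy illustration only: real Nanopore reads contain indels and need
    alignment before any per-position vote.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("toy example expects reads of equal length")

    consensus = []
    for column in zip(*aligned_reads):
        # Pick the most frequent base at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Four reads from the same molecule, two of them carrying one error each.
reads = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
print(majority_consensus(reads))  # prints ACGTACGT
```

Because the two errors fall at different positions, each is outvoted by the three correct reads at that column.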
Features
- UMI Database Construction: Builds high-quality UMI databases based on structure, length, and flanking sequence validation.
- Flexible UMI Design Support: Accommodates double UMI designs with configurable patterns and clustering parameters.
- Multi-stage Consensus Calling: Implements draft consensus generation followed by optional polishing with Racon and Medaka.
- Quality Control Integration: Includes comprehensive QC reporting with NanoPlot visualization and UMI assignment statistics.
- Configurable Filtering: Supports customizable read filtering based on quality scores and length parameters using NanoFilt.
- Primer Trimming: Automated removal of PCR primers and external sequences using Cutadapt.
- Scalable Processing: Parallelized processing with configurable thread allocation for each step.
- Tool Integration: Leverages established bioinformatics tools including BWA, Vsearch, Seqtk, and Samtools.
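The read filtering feature can be pictured with a small pure-Python sketch of a length and mean-quality filter. This is only an approximation of the idea, not NanoFilt's code or the pipeline's configuration: the function names and the default thresholds (`min_q`, `min_len`, `max_len`) are assumptions chosen for illustration, and Phred+33 quality encoding is assumed.

```python
import math


def mean_phred(quality_string, offset=33):
    """Mean read quality, averaged in error-probability space.

    Averaging probabilities (not raw Phred scores) avoids overstating
    the quality of reads with a few very bad bases.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    probs = [10 ** (-(ord(char) - offset) / 10) for char in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filter(sequence, quality_string, min_q=7.0, min_len=200, max_len=5000):
    """Return True if the read is within the length window and above min_q.

    The thresholds are hypothetical defaults, not pipeline settings.
    """
    if not (min_len <= len(sequence) <= max_len):
        return False
    return mean_phred(quality_string) >= min_q


# A 300-base read with uniform Phred 40 ("I") passes; a 100-base read is too short.
print(passes_filter("A" * 300, "I" * 300))  # prints True
print(passes_filter("A" * 100, "I" * 100))  # prints False
```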
Input/Output Specification
Inputs
Required
Nanopore Reads
- Description: FASTQ files containing raw Nanopore sequencing reads tagged with UMIs
- Format: .fastq or .fastq.gz
- Example File Path: /path/to/input/nanopore_reads.fastq.gz
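Since the input may be plain or gzip-compressed FASTQ, a helper that handles both can be sketched as follows. It assumes standard four-line FASTQ records and decides on compression from the `.gz` suffix alone. The function name `read_fastq` is hypothetical and is not part of the pipeline.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (read_id, sequence, quality) from a .fastq or .fastq.gz file.

    Assumes four lines per record; multi-line FASTQ is not handled.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:
                break  # end of file (or a truncated final record)
            header, sequence, _plus, quality = record
            # Drop the leading "@" and any description after the read ID.
            yield header[1:].split()[0], sequence, quality
```

For example, `sum(1 for _ in read_fastq("/path/to/input/nanopore_reads.fastq.gz"))` would count the reads in the example input file.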
UMI Configuration Parameters
- Description: Pipeline parameters defining UMI structure, primers, and adapters
- Required Parameters:
  - UMI length (UMILen)
  - Forward/Reverse primers (FW_primer, RV_primer)
  - Forward/Reverse adapters (FW_adapter, RV_adapter)
  - UMI pattern (UMIPattern)
- Format: Pipeline configuration parameters
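As a rough illustration of how UMILen and UMIPattern could together constrain a candidate UMI, the sketch below checks a sequence against a length and a pattern. It assumes the pattern is written with IUPAC nucleotide codes (for example `NNNYRNNNYRNNNYRNNN`). The syntax the pipeline actually expects for UMIPattern is not stated here and may differ, and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Compile an IUPAC pattern (assumed syntax) into a regular expression."""
    return re.compile("".join(IUPAC[symbol] for symbol in pattern.upper()))


def is_valid_umi(candidate, umi_len, umi_pattern):
    """Check a candidate UMI against the expected length and pattern."""
    if len(candidate) != umi_len:
        return False
    return iupac_to_regex(umi_pattern).fullmatch(candidate.upper()) is not None


# Illustrative values only; not taken from a real pipeline configuration.
umi_len = 18
umi_pattern = "NNNYRNNNYRNNNYRNNN"
print(is_valid_umi("ACGTAACGCGTTTCATGA", umi_len, umi_pattern))  # prints True
print(is_valid_umi("ACGAAACGCGTTTCATGA", umi_len, umi_pattern))  # prints False
```

The second candidate fails because its fourth base is `A`, where the pattern requires a pyrimidine (`Y`, that is `C` or `T`).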
Outputs
Reported Outputs
- Consensus Sequences:
  - Description: High-quality consensus sequences generated from UMI-tagged reads
  - Format: .fasta
  - Example File Path: /output/sample/sample_consensus_polished_primersTrimmed.fasta
  - Visualization App: Any sequence viewer or alignment tool
  - Location: primersTrimming folder
- QC Reports:
  - Description: Quality control plots and statistics for binned and unbinned reads
  - Format: .html (NanoPlot reports)
  - Example File Path: /output/QC/sample/QC_binned_reads/NanoPlot-report.html
  - Visualization App: Web browser
  - Location: QC folder
- UMI Assignment Statistics:
  - Description: Tab-separated file with UMI assignment statistics and read counts
  - Format: .tsv
  - Example File Path: /output/QC/sample/sample_UMI_stats.tsv
  - Visualization App: Spreadsheet software or text editor
  - Location: QC folder
Supporting Outputs
- Filtered Reads:
  - Description: Quality-filtered reads based on length and quality parameters
  - Format: .fastq
  - Example File Path: /intermediate/readsFiltering/sample/sample_filtered.fastq
- UMI Database:
  - Description: High-quality UMI database used for read assignment
  - Format: .fasta
  - Example File Path: /intermediate/candidateUMIsFiltering/sample/UMI_db.fasta
- Binned Reads:
  - Description: Reads assigned to specific UMIs
  - Format: .fastq
  - Example File Path: /intermediate/readsUMIsAssignment/sample/umi_X.fastq
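To see how many reads support each UMI bin, the binned FASTQ files can be counted directly. The sketch below assumes the bins follow the `umi_X.fastq` naming shown in the example path and contain four-line FASTQ records. The function name `reads_per_bin` is hypothetical, and the result is not meant to reproduce the pipeline's own UMI assignment statistics file.

```python
from pathlib import Path


def reads_per_bin(bin_dir):
    """Return {bin_name: read_count} for every umi_*.fastq file in bin_dir.

    Assumes four lines per FASTQ record.
    """
    counts = {}
    for fastq in sorted(Path(bin_dir).glob("umi_*.fastq")):
        with fastq.open() as handle:
            n_lines = sum(1 for _ in handle)
        counts[fastq.stem] = n_lines // 4
    return counts
```

For example, `reads_per_bin("/intermediate/readsUMIsAssignment/sample")` would return one entry per bin, which makes it easy to spot UMIs backed by too few reads for a reliable consensus.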
Associated Processes
- candidateUMIsExtraction
- candidateUMIsFiltering
- consensusPolishing
- draftConsensusCalling
- primersTrimming
- QC
- readsFiltering
- readsUMIsAssignment
References & Additional Documentation
- Related Papers/links: UMI-based error correction methods for long-read sequencing
- Pipeline Repository: Contact pipeline maintainers for repository access
- Workflow Diagram: Available in pipeline description pages