Candidate UMIs Extraction Specifications
Process Details
- Name: candidateUMIsExtraction
- Process UUID: ece48f92958342caaa854ede35b6795b
- Process Group: uminator
Overview
This process extracts candidate Unique Molecular Identifiers (UMIs) from filtered sequencing reads by searching for UMI sequences located between specific adapter and primer sequences. The process implements a double UMI design approach, analyzing both the 5' and 3' ends of reads to identify and extract UMI sequences that meet specified length criteria.
The process is implemented in Bash and relies on bioinformatics tools including seqtk (sequence manipulation) and cutadapt (adapter trimming and UMI extraction).
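To make the search described in the overview concrete, here is a minimal sketch in plain Bash of pulling a UMI from between two flanking sequences and keeping it only if its length falls within a window. The function name `extract_umi`, the flank sequences, and the length bounds are all hypothetical and chosen for illustration; the actual process delegates this step to cutadapt, whose parameters are not given in this specification.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): print the UMI found between
# two flanking sequences when its length lies in [min_len, max_len];
# print nothing otherwise.
extract_umi() {
  local read=$1 left=$2 right=$3 min_len=$4 max_len=$5
  # Extended regex: left flank, then min..max bases, then right flank.
  local pattern="${left}([ACGT]{${min_len},${max_len}})${right}"
  if [[ $read =~ $pattern ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

# Made-up sequences for illustration only.
adapter="AAGCAG"
primer="GGATTC"
read_seq="TTTAAGCAGACGTACGTACGTGGATTCCCC"

extract_umi "$read_seq" "$adapter" "$primer" 10 14   # prints ACGTACGTACGT
```

This sketch matches the flanks exactly. cutadapt additionally tolerates a configurable rate of mismatches in adapter sequences, which matters for error-prone reads.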
Key Functionality
- Reverse Complement Generation: Computes reverse complements of primer and adapter sequences for comprehensive UMI search
- Double UMI Extraction: Extracts UMI sequences from both 5' and 3' ends of reads using exact and approximate length matching
- Sequence Processing: Trims reads to specified lengths and performs quality-controlled UMI candidate identification
- Format Conversion: Converts extracted UMI candidates from FASTQ to FASTA format for downstream analysis
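Two of the steps listed above, reverse complement generation and FASTQ-to-FASTA conversion, can be sketched with standard shell tools alone. The function names `revcomp` and `fastq_to_fasta` and the sample sequences are hypothetical; the sketch assumes uppercase A/C/G/T/N input and strict four-line FASTQ records, and is not the pipeline's own code.

```shell
#!/usr/bin/env bash
# Reverse complement of a DNA string (uppercase A/C/G/T/N only).
revcomp() {
  local seq=$1 out="" i
  # Walk the sequence from the last base to the first.
  for (( i = ${#seq} - 1; i >= 0; i-- )); do
    out+=${seq:i:1}
  done
  # Complement each base of the reversed string.
  printf '%s\n' "$out" | tr 'ACGTN' 'TGCAN'
}

# Convert four-line FASTQ records on stdin to FASTA on stdout.
fastq_to_fasta() {
  awk 'NR % 4 == 1 { print ">" substr($0, 2) }  # header: replace "@" with ">"
       NR % 4 == 2 { print }                    # sequence line, unchanged'
}

# Made-up inputs for illustration only.
revcomp "AAGCAG"   # prints CTGCTT
printf '@umi1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' | fastq_to_fasta
```

Where seqtk is available, `seqtk seq -r` and `seqtk seq -a` perform the same two operations on whole files.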
Input/Output Specification
Inputs
Required Inputs
- outputDir
- Description: Directory containing filtered sequencing reads from previous processing steps
- Format: Directory
Outputs
- outputDir
- Description: Directory containing extracted UMI candidates in both FASTQ and FASTA formats, organized by sample
- Format: Directory
References & Resources
- Tool Documentation: Contact the team for details on the UMI extraction pipeline implementation