Extract Unclassified MetaPhlAn Specifications

Process Details

  • Name: extract_unclassified_metaphlan
  • Process UUID: f931k69tytqjd4rrmpxgam2thh1x1u
  • Process Group: short_read_taxonomy

Overview

This process extracts unclassified reads from MetaPhlAn taxonomic classification results stored in SAM format files. It identifies reads that could not be classified to any known taxonomic group and converts them back to FASTQ format for downstream analysis or alternative classification approaches.

This process is implemented in Bash, utilizing SAMtools for efficient extraction and conversion of unmapped reads.

Key Functionality

  • Unclassified Read Extraction: Identifies and extracts reads whose SAM FLAG field has the unmapped bit set (0x4, decimal 4), representing sequences that could not be taxonomically classified by MetaPhlAn
  • Paired-End Read Separation: Maintains proper pairing structure by separating forward and reverse reads into distinct output files
  • Singleton Read Handling: Captures unpaired reads that lost their mate during the classification process
  • FASTQ Conversion: Converts extracted SAM entries back to compressed FASTQ format for compatibility with downstream tools
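
The steps above could be sketched with a single `samtools fastq` call. This is a minimal sketch under stated assumptions, not the process's actual script: the function names `output_prefix` and `extract_unclassified`, the output directory argument, and the choice to discard reads that carry neither or both mate flags (`-0 /dev/null`) are all hypothetical. It also assumes mates sit next to each other in the SAM file, which `samtools fastq` needs in order to pair them.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: derive a sample prefix from a SAM path
# (/data/sample1.sam -> sample1).
output_prefix() {
    basename "$1" .sam
}

# Hypothetical wrapper: write the unmapped reads of one SAM file as
# gzip-compressed FASTQ. $1 = SAM path, $2 = output directory (default ".").
extract_unclassified() {
    local sam_path="$1"
    local out_dir="${2:-.}"
    local prefix
    prefix="${out_dir}/$(output_prefix "$sam_path")"
    mkdir -p "$out_dir"

    # -f 4  keep only reads with the unmapped FLAG bit set
    # -n    leave read names unchanged (no /1 or /2 suffix)
    # -1/-2 forward and reverse mates of pairs where both mates pass the filter
    # -s    singletons: mates whose partner did not pass the filter
    # -0    reads flagged as neither or both mates (discarded: an assumption)
    # Output files ending in .gz are compressed by samtools itself.
    samtools fastq -f 4 -n \
        -1 "${prefix}_1.fastq.gz" \
        -2 "${prefix}_2.fastq.gz" \
        -s "${prefix}_unp.fastq.gz" \
        -0 /dev/null \
        "$sam_path"
}
```

Under these assumptions, calling `extract_unclassified sample1.sam results` would write `results/sample1_1.fastq.gz`, `results/sample1_2.fastq.gz`, and `results/sample1_unp.fastq.gz`.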

Input/Output Specification

Inputs

Required Inputs

  • SAM Files
    • Description: SAM format files containing MetaPhlAn taxonomic classification results with aligned and unaligned reads
    • Format: SAM
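
To make the unmapped flag concrete, the following sketch counts how many records in a SAM input have FLAG bit 0x4 set, using only `awk` so that the bit test is visible. The function name `count_unmapped` is hypothetical and this check is not described as part of the process itself; `samtools view -c -f 4` should report the same count when SAMtools is available.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: count SAM records whose FLAG (column 2) has the
# unmapped bit (value 4) set. Header lines start with "@" and are skipped.
count_unmapped() {
    awk -F'\t' '
        /^@/ { next }                      # skip header lines
        int($2 / 4) % 2 == 1 { n++ }       # bit with value 4 is set
        END { print n + 0 }                # print 0 when nothing matched
    ' "$1"
}
```

For example, a record with FLAG 77 (paired, unmapped, mate unmapped, first in pair) is counted, while a record with FLAG 0 is not.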

Outputs

  • FASTQ Set
    • Description: Compressed FASTQ files containing unclassified reads separated into forward reads (_1.fastq.gz), reverse reads (_2.fastq.gz), and unpaired reads (_unp.fastq.gz)
    • Format: FASTQ (gzip compressed)
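
A quick sanity check on the FASTQ set is to confirm that the forward and reverse files hold the same number of reads. The sketch below is an illustration only: the function names `count_fastq_reads` and `check_pairing` are hypothetical, and it assumes four-line FASTQ records (no wrapped sequence lines).

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: count reads in a gzip-compressed FASTQ file,
# assuming each record spans exactly four lines.
count_fastq_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Hypothetical helper: compare read counts of <prefix>_1.fastq.gz and
# <prefix>_2.fastq.gz; returns non-zero when they differ.
check_pairing() {
    local fwd rev
    fwd="$(count_fastq_reads "$1_1.fastq.gz")"
    rev="$(count_fastq_reads "$1_2.fastq.gz")"
    if [ "$fwd" != "$rev" ]; then
        echo "pairing mismatch for $1: $fwd forward vs $rev reverse" >&2
        return 1
    fi
    echo "$1: $fwd read pairs"
}
```

The unpaired file (`_unp.fastq.gz`) is deliberately left out of the comparison, since singletons have no mate to match.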

Parameters & Settings

This process uses default SAMtools parameters and does not expose user-configurable options in the Foundry UI.

References & Resources

  • Tool Documentation: SAMtools official documentation (http://www.htslib.org/doc/samtools.html)
  • Related Papers: Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., ... & 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078-2079.