Meta-CAMP Pipeline Specification
Pipeline Details
- Name: Meta-CAMP
- Pipeline UUID: 6d34fe61976640178ed79dd450bd0a64
- Version: 1.1.1
Overview
The Meta-CAMP pipeline is designed for dynamic and educational analyses of metagenomes, bacterial isolates, and microbial communities. The MetaSUB Core Modular Analysis Pipeline (CAMP) is a software toolkit that serves as the primary analytic workflow for the MetaSUB Consortium. Its core philosophy is modularity: every step is a consistently documented and parameterized process, so users can control and understand each stage of their bioinformatic analyses.
Key Use Cases:
- Metagenomic Community Analysis: Comprehensive taxonomic profiling and functional analysis of microbial communities from environmental samples.
- MAG (Metagenome-Assembled Genome) Reconstruction: Binning and quality assessment of metagenome-assembled genomes using multiple algorithms.
- Gene Cataloguing and Functional Annotation: Identification, clustering, and functional annotation of open reading frames across samples.
Features
- Modular Design: Each analytical step is defined as a single, consistently documented and parameterized process, allowing for complete workflow customization.
- Multiple Taxonomic Classification Tools: Integrates MetaPhlAn4, Kraken2/Bracken, and XTree for comprehensive taxonomic profiling with standardized output formats.
- Comprehensive Quality Control: Uses FastQC, MultiQC, and fastp for read quality assessment, AdapterRemoval for adapter removal, and Bowtie2 for host read removal.
- Multi-Algorithm MAG Binning: Utilizes six binning algorithms (MetaBAT2, CONCOCT, SemiBin2, MaxBin2, VAMB, MetaBinner) with DAS Tool ensemble refinement.
- Dual Assembly Options: Supports both MetaSPAdes and MegaHIT assemblers with optional metaviral and plasmid assembly modes.
- Advanced Gene Cataloguing: Uses Bakta for ORF identification, MMSeqs for clustering, and Diamond for functional profiling.
- Interactive Visualization: Generates outputs compatible with microViz and animalcules apps for comparative analysis.
- Error Correction: Implements BayesHammer for sequencing error correction.
Input/Output Specification
Inputs
Required
Reads
- Description: Forward and reverse reads, grouped into a collection, from a single type of source (e.g., mouse, human)
- Format: .fastq.gz
- Example File Path: /path/to/reads/sample_1.fastq.gz, /path/to/reads/sample_2.fastq.gz
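A minimal sketch of how forward and reverse read files could be grouped into pairs before being uploaded as a collection. It assumes the `_1`/`_2` suffix convention seen in the example paths above; the function name `pair_reads` and the regular expression are illustrative and are not part of the pipeline.

```python
import re
from collections import defaultdict

# Assumed naming convention, taken from the example paths:
# <sample>_1.fastq.gz (forward) and <sample>_2.fastq.gz (reverse).
READ_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq\.gz$")


def pair_reads(paths):
    """Group FASTQ paths into {sample: (forward, reverse)}.

    Raises ValueError for unexpected file names or incomplete pairs.
    """
    mates = defaultdict(dict)
    for path in paths:
        filename = path.rsplit("/", 1)[-1]
        match = READ_PATTERN.match(filename)
        if match is None:
            raise ValueError(f"unexpected read file name: {path}")
        mates[match["sample"]][match["mate"]] = path

    pairs = {}
    for sample, found in mates.items():
        if set(found) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} is missing a mate file")
        pairs[sample] = (found["1"], found["2"])
    return pairs


pairs = pair_reads([
    "/path/to/reads/sample_1.fastq.gz",
    "/path/to/reads/sample_2.fastq.gz",
])
```

Checking for incomplete pairs up front is cheaper than discovering a missing mate file partway through a run.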
Host Genome
- Description: Reference genome selection that determines which databases are used in the pipeline for host read removal
- Format: Bowtie2 index files
- Options: Human (GRCh38), Mouse (mm10), or other available reference genomes
Adapter File
- Description: Sequencing adapters for trimming adapter sequences from insert DNA
- Format: .txt
- Example File Path: /path/to/adapters/adapters.txt
Metadata
- Description: Sample metadata for the microViz and animalcules applications. The first column should contain sample names; additional columns can hold sample features (age, sex, disease, etc.)
- Format: Tab-separated values (.tsv)
- Example File Path: /path/to/metadata/sample_metadata.tsv
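The metadata layout described above (sample names in the first column, features in the remaining columns) can be checked before submission. The sketch below is one possible check, not the pipeline's own validation; the function name `read_metadata` and the example column names are assumptions made for illustration.

```python
import csv
import io


def read_metadata(handle):
    """Parse a tab-separated metadata table into {sample_name: {feature: value}}.

    The first column is treated as the sample name, as the specification requires.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    features = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                f"row for {row[0]!r} has {len(row)} fields, expected {len(header)}"
            )
        sample = row[0]
        if sample in table:
            raise ValueError(f"duplicate sample name: {sample!r}")
        table[sample] = dict(zip(features, row[1:]))
    return table


# Hypothetical in-memory example; a real file would be opened with
# open("/path/to/metadata/sample_metadata.tsv", newline="").
example = io.StringIO("sample\tage\tsex\nsample_A\t34\tF\nsample_B\t51\tM\n")
metadata = read_metadata(example)
```

All values are kept as strings here; converting numeric features such as age is left to the downstream application.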
Optional Inputs
Binner Tool Selections
- Description: Selection of at least three of the six available binning tools (MetaBAT2, CONCOCT, SemiBin2, MaxBin2, VAMB, MetaBinner) for accurate bin creation
- Default: All six tools selected
MicroViz Analysis App Selection
- Description: Option to run microViz comparative analysis (requires 3 or more different samples)
- Format: Boolean selection
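The two constraints on the optional inputs (at least three binners, and three or more samples for microViz) can be expressed as a small pre-flight check. This is a sketch under those stated constraints only; `check_optional_inputs` and its parameters are hypothetical names, not part of the pipeline's interface.

```python
AVAILABLE_BINNERS = {"MetaBAT2", "CONCOCT", "SemiBin2", "MaxBin2", "VAMB", "MetaBinner"}
MIN_BINNERS = 3           # "at least three of the six" from the specification
MIN_MICROVIZ_SAMPLES = 3  # microViz requires 3 or more different samples


def check_optional_inputs(binners=None, run_microviz=False, n_samples=0):
    """Validate the optional selections and return the sorted binner list.

    binners=None mirrors the documented default of selecting all six tools.
    """
    selected = set(AVAILABLE_BINNERS if binners is None else binners)
    unknown = selected - AVAILABLE_BINNERS
    if unknown:
        raise ValueError(f"unknown binning tools: {sorted(unknown)}")
    if len(selected) < MIN_BINNERS:
        raise ValueError(f"select at least {MIN_BINNERS} binning tools")
    if run_microviz and n_samples < MIN_MICROVIZ_SAMPLES:
        raise ValueError(
            f"microViz needs at least {MIN_MICROVIZ_SAMPLES} samples, got {n_samples}"
        )
    return sorted(selected)


selected = check_optional_inputs(
    binners=["MetaBAT2", "SemiBin2", "VAMB"], run_microviz=True, n_samples=4
)
```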
Outputs
Reported Outputs
- Short Read Quality Control Reports:
  - Description: Pre and post-processing quality control reports
  - Format: .html (MultiQC reports)
  - Location: summary/fastqc_pre/, summary/fastqc_post/
  - Visualization App: MultiQC
- Taxonomic Profiling Results:
  - Description: Standardized taxonomic abundance tables at multiple taxonomic levels (species, genus, family, order, class, phylum)
  - Format: .csv (XTree, Kraken2/Bracken, MetaPhlAn outputs)
  - Location: final_reports/
  - Visualization App: Pavian, Krona, animalcules
- Assembly Files:
  - Description: Assembled contigs from MetaSPAdes and/or MegaHIT
  - Format: .fasta.gz
  - Location: assembly/
  - Visualization App: MetaQUAST reports (.html)
- MAG Binning Results:
  - Description: Refined metagenome-assembled genomes from DAS Tool consensus binning
  - Format: .fa
  - Location: bins/
  - Visualization App: CheckM2, GTDB-Tk classification reports
- Gene Cataloguing Outputs:
  - Description: ORF cluster tables including relative abundance, counts, sizes, and functional annotations
  - Format: .csv, .tsv
  - Location: final_reports/
  - Files: orf_cluster_sizes.csv, orf_rel_abund.tsv, orf_read_cts.tsv, orf_annotations.tsv
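After a run, it can be useful to confirm that the gene cataloguing tables listed above were actually written. The sketch below checks a `final_reports/` directory for the four file names given in this specification; the helper name `missing_orf_reports` is an assumption, and the temporary directory only stands in for a real output folder.

```python
import tempfile
from pathlib import Path

# File names as listed under "Gene Cataloguing Outputs".
EXPECTED_ORF_REPORTS = (
    "orf_cluster_sizes.csv",
    "orf_rel_abund.tsv",
    "orf_read_cts.tsv",
    "orf_annotations.tsv",
)


def missing_orf_reports(final_reports_dir):
    """Return the expected ORF report names that are absent from the directory."""
    base = Path(final_reports_dir)
    return [name for name in EXPECTED_ORF_REPORTS if not (base / name).is_file()]


# Demonstration against a throwaway directory containing one of the four files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "orf_rel_abund.tsv").write_text("")
    missing = missing_orf_reports(tmp)
```

An empty list from `missing_orf_reports` means all four tables are present; it says nothing about whether their contents are complete.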
Supporting Outputs
- MAG Quality Control Summary:
  - Description: Aggregated quality metrics from GUNC, GTDB-Tk, CheckM2, and QUAST
  - Format: .csv
  - Location: final_reports/mag_qc_summary.csv
- Error Correction Statistics:
  - Description: Statistical properties of reads after error correction
  - Format: .csv
  - Location: summary/
- Assembly Statistics:
  - Description: Contig length and assembly descriptive statistics
  - Format: .csv
  - Location: assembly/stats/
Associated Processes
- AdapterRemoval
- aggregate cov
- aggregate dnadiff
- aggregate quast
- BBMap BBmerge
- Bowtie2
- bracken
- Build Bowtie2 Index
- call orfs
- Check Build Bowtie2 Index
- check Bowtie2 files
- checkm cov
- checkm sh
- checkm2
- cluster orfs
- compute relative abundances
- concat fastqs
- concat statistics
- CONCOCT
- ContigDepthCalc
- ctg name edit
- DAS Output Collector
- DAS Tool
- DAS Tool Prep
- dedup metaphlan
- extract unclassified kraken
- extract unclassified metaphlan
- extract unclassified names
- fastq collect
- FastQC
- FastQC post
- filter gene catalog
- filter host reads
- filter low qual
- filter seq errors bh
- FuncMerger
- gtdbtk get mag refs
- gunc
- HUMAnN ReNormalize
- HUMAnN3
- index gene catalog
- init statistics
- kraken2
- make config
- make xtree input
- mask reads
- MaxBin2
- MegaHIT
- merge bracken
- merge metaphlan
- merge orf seqs
- merge sample orfs
- merge xtree outputs
- MetaBAT2
- MetaBinner
- metaphlan
- multiqc
- parse dnadiff
- prokka ctg
- Quast
- quast
- run alignments
- Samtools
- scrub fastq captions
- SemiBin2
- shiny file process
- SPAdes
- standardize bracken
- standardize metaphlan
- standardize xtree
- summarize gene cts
- summarize reports
- VAMB
- xtree
References & Additional Documentation
- MetaSUB Consortium: International MetaSUB Consortium
- Pipeline Repository: Meta-CAMP GitHub Repository
- Related Publications: Mason, C.E., et al. The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report. Microbiome 4, 24 (2016).