Germline Variant Calling Pipeline from BAM Pipeline Specification
Pipeline Details
- Name: Germline Variant Calling Pipeline from BAM
- Pipeline UUID: xabm2p57zbizx8cf18ojj5xwiyy8yq
- Version: 2.0.1
Overview
The Germline Variant Calling Pipeline from BAM calls variants in clonal samples using GATK4. It is intended for samples from a single individual, where variant allele frequencies are expected to be 1 (haploids or homozygous sites in diploids) or 0.5 (heterozygous sites in diploids). The pipeline applies Base Quality Score Recalibration (BQSR) to reduce the effect of systematic technical errors on base quality scores, which improves the accuracy of variant detection.
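The expected frequencies of 1 and 0.5 can be checked per site from allele read depths. The following is a minimal sketch, not part of the pipeline: the function names and the `tolerance` value are hypothetical and chosen only for illustration.

```python
def allele_fraction(ref_depth: int, alt_depth: int) -> float:
    """Fraction of reads at a site that support the alternate allele."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_depth / total


def expected_class(fraction: float, tolerance: float = 0.15) -> str:
    """Classify an allele fraction against the two expected values.

    The tolerance is an arbitrary illustrative choice, not a pipeline setting.
    """
    if abs(fraction - 1.0) <= tolerance:
        return "haploid_or_homozygous"
    if abs(fraction - 0.5) <= tolerance:
        return "heterozygous"
    return "unexpected"
```

A site with 14 alternate reads out of 30 would fall in the heterozygous class under this tolerance, while a site at 0.2 would be flagged as unexpected for a single-individual sample.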
Key Use Cases:
- Germline Variant Detection: Identification of SNPs and indels from aligned BAM files of individual samples.
- Base Quality Score Recalibration: Systematic correction of base quality scores to improve variant calling accuracy.
- Quality Control and Metrics: Comprehensive alignment metrics collection and coverage analysis for data validation.
Features
- GATK4 Best Practices Implementation: Follows established GATK4 germline variant calling workflow with proven filtering parameters.
- Comprehensive Quality Control: Includes alignment metrics, insert size metrics, depth analysis, and BQSR recalibration reports.
- Dual-Pass BQSR: Implements two-round Base Quality Score Recalibration with before/after comparison reports.
- Separate SNP and Indel Processing: Processes SNPs and indels independently with tailored filtering criteria optimized for each variant type.
- Containerized Execution: Uses standardized Docker containers (GATK4, Picard, SAMtools) ensuring reproducible results across environments.
- Flexible Filtering: Implements hard filtering with established thresholds for QD, FS, MQ, SOR, MQRankSum, and ReadPosRankSum metrics.
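To illustrate how hard filtering on these annotations works, here is a minimal sketch that evaluates one variant's INFO values against per-type thresholds. The threshold values are assumptions of the kind found in GATK hard-filtering guidance; this document does not state the values the pipeline actually uses. The names `SNP_FILTERS`, `INDEL_FILTERS`, and `failed_filters` are hypothetical, and skipping absent annotations is a simplification of this sketch.

```python
import operator

# Assumed illustrative thresholds: (comparison that marks a failure, cutoff).
SNP_FILTERS = {
    "QD": (operator.lt, 2.0),
    "FS": (operator.gt, 60.0),
    "MQ": (operator.lt, 40.0),
    "SOR": (operator.gt, 4.0),
    "MQRankSum": (operator.lt, -12.5),
    "ReadPosRankSum": (operator.lt, -8.0),
}

# Indels are typically filtered with looser strand-bias cutoffs (assumed values).
INDEL_FILTERS = {
    "QD": (operator.lt, 2.0),
    "FS": (operator.gt, 200.0),
    "SOR": (operator.gt, 10.0),
}


def failed_filters(info: dict, filters: dict) -> list:
    """Return the names of the annotations whose failure condition is met."""
    failed = []
    for name, (fails_when, cutoff) in filters.items():
        value = info.get(name)
        # Annotations missing from the record are skipped in this sketch.
        if value is not None and fails_when(value, cutoff):
            failed.append(name)
    return failed
```

Keeping separate tables for SNPs and indels mirrors the independent processing described above: the same record values can pass one table and fail the other.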
Input/Output Specification
Inputs
Required
Mapped Reads (BAM)
- Description: Aligned BAM files containing mapped sequencing reads from individual samples
- Format: .bam
- Example File Path: /path/to/input/sample.bam
Reference Genome
- Description: Reference genome sequence in FASTA format used for variant calling
- Format: .fa/.fasta
- Example File Path: /path/to/reference/genome.fa
Outputs
Reported Outputs
- Recalibrated BAM Files:
  - Description: Base quality score recalibrated alignment files ready for downstream analysis
  - Format: .bam
  - Example File Path: /output/directory/sample_recal.bam
  - Location: Main output folder
- Filtered Variant Calls:
  - Description: High-quality SNPs and indels passing filtering criteria
  - Format: .vcf
  - Example File Path: /output/directory/sample_filtered_variants.vcf
  - Location: Variants folder
Supporting Outputs
- Alignment Metrics:
  - Description: Comprehensive alignment statistics including mapping rates and quality metrics
  - Format: .txt
  - Example File Path: /output/metrics/sample_alignment_metrics.txt
- Insert Size Metrics:
  - Description: Insert size distribution analysis with histogram visualization
  - Format: .txt, .pdf
  - Example File Path: /output/metrics/sample_insert_metrics.txt
- Coverage Analysis:
  - Description: Per-base depth coverage across the genome
  - Format: .txt
  - Example File Path: /output/metrics/sample_depth_out.txt
- BQSR Recalibration Report:
  - Description: Before and after base quality recalibration comparison plots
  - Format: .pdf
  - Example File Path: /output/reports/sample_recalibration_plots.pdf
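The per-base coverage file can be condensed into a few summary numbers for quick validation. The sketch below assumes the file uses the three-column, tab-separated layout produced by `samtools depth` (chromosome, position, depth); this document does not specify the layout, so treat that as an assumption. The function name and the `min_depth` parameter are hypothetical.

```python
from typing import Iterable


def summarize_depth(lines: Iterable[str], min_depth: int = 10) -> dict:
    """Summarize per-base depth records given as 'chrom<TAB>pos<TAB>depth' lines."""
    positions = 0
    total_depth = 0
    covered = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        _chrom, _pos, depth_text = line.split("\t")[:3]
        depth = int(depth_text)
        positions += 1
        total_depth += depth
        if depth >= min_depth:
            covered += 1
    if positions == 0:
        raise ValueError("no depth records found")
    return {
        "positions": positions,
        "mean_depth": total_depth / positions,
        "fraction_at_min_depth": covered / positions,
    }
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `with open("/output/metrics/sample_depth_out.txt") as handle: summarize_depth(handle)`, which avoids loading a genome-wide file into memory.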
Associated Processes
- AnalyzeCovariates
- applyBSQRS
- BaseRecalibrator
- build gatk4 genome dictionary
- getMetrics
- HaplotypeCaller
- markDuplicates
- selectVariants
- VariantFiltration
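The process list above is alphabetical rather than in execution order. As a rough illustration of how the recalibration steps (BaseRecalibrator, applyBSQRS, AnalyzeCovariates) might chain together, the sketch below builds the corresponding GATK4 command lines without running them. The function name, the intermediate table names, and the `known_sites` input are assumptions; this document does not say where the pipeline obtains its known-sites file or how it names intermediates.

```python
from typing import List


def bqsr_commands(reference: str, bam: str, known_sites: str, prefix: str) -> List[List[str]]:
    """Build GATK4 command lines for a two-pass BQSR with a before/after report."""
    table_before = f"{prefix}_recal_before.table"  # hypothetical intermediate name
    table_after = f"{prefix}_recal_after.table"    # hypothetical intermediate name
    recal_bam = f"{prefix}_recal.bam"
    plots = f"{prefix}_recalibration_plots.pdf"
    return [
        # Pass 1: model base quality errors on the input BAM.
        ["gatk", "BaseRecalibrator", "-R", reference, "-I", bam,
         "--known-sites", known_sites, "-O", table_before],
        # Apply the pass-1 model to produce the recalibrated BAM.
        ["gatk", "ApplyBQSR", "-R", reference, "-I", bam,
         "--bqsr-recal-file", table_before, "-O", recal_bam],
        # Pass 2: re-model on the recalibrated BAM for comparison.
        ["gatk", "BaseRecalibrator", "-R", reference, "-I", recal_bam,
         "--known-sites", known_sites, "-O", table_after],
        # Plot the before/after comparison.
        ["gatk", "AnalyzeCovariates", "-before", table_before,
         "-after", table_after, "-plots", plots],
    ]
```

Each returned list can be handed to `subprocess.run(command, check=True)` inside an environment where `gatk` is available. Building the commands as data first makes the step order easy to inspect or log before anything executes.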
References & Additional Documentation
- Related Papers/links: GATK4 Germline Variant Calling Best Practices
- Pipeline Repository: Adapted from NYU Gencore Variant Calling Pipeline (https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/)
- Hard Filtering Documentation: GATK Hard Filtering Guidelines