
Germline Variant Calling Pipeline (GATK) Pipeline Specification

Pipeline Details

  • Name: Germline Variant Calling Pipeline (GATK)
  • Pipeline UUID: f931gy4f2onyfr3bf6bkhuydydt1ej
  • Version: 2.1.2

Overview

The Germline Variant Calling Pipeline (GATK) calls variants in clonal samples, i.e. samples derived from a single individual. It uses HaplotypeCaller to call germline SNPs and indels via local re-assembly of haplotypes, and it applies Base Quality Score Recalibration (BQSR) to reduce the effect of systematic technical error on base quality scores, which improves the accuracy of variant detection.

Key Use Cases:

  • Germline Variant Detection: Identification of SNPs and indels in clonal samples with expected variant frequencies of 1 (for haploids or homozygous diploids) or 0.5 (for heterozygous diploids).
  • Base Quality Score Recalibration: Systematic correction of base quality scores to improve variant calling accuracy.
  • Variant Annotation and Effect Prediction: Functional annotation of identified variants using SnpEff to predict biological effects.
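The expected frequencies of 1 and 0.5 can be checked per site from the reference and alternate read depths. The following is a minimal sketch, not part of the pipeline itself: the function names and the tolerance of 0.15 are assumptions chosen for illustration.

```python
def alt_allele_fraction(ref_depth: int, alt_depth: int) -> float:
    """Fraction of reads at a site that support the alternate allele."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_depth / total


def classify_site(ref_depth: int, alt_depth: int, tolerance: float = 0.15) -> str:
    """Compare the observed fraction with the frequencies expected in a clonal sample.

    The tolerance is an illustrative assumption, not a pipeline parameter.
    """
    fraction = alt_allele_fraction(ref_depth, alt_depth)
    if abs(fraction - 1.0) <= tolerance:
        return "hom_or_haploid"  # expected frequency 1
    if abs(fraction - 0.5) <= tolerance:
        return "het"             # expected frequency 0.5
    return "unexpected"          # inconsistent with a clonal sample


print(classify_site(0, 30))   # all reads support the alternate allele
print(classify_site(14, 16))  # roughly half of the reads support it
print(classify_site(27, 3))   # low fraction, not expected for a clonal sample
```

Sites that fall into the "unexpected" class are candidates for closer inspection, since a germline caller assumes one of the two expected frequencies.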

Features

  • GATK4 Best Practices Implementation: Follows the established GATK4 germline variant calling workflow built around HaplotypeCaller.
  • BWA MEM Alignment: Read alignment with BWA MEM, including the read group assignment that GATK tools require.
  • Duplicate Marking: Automated identification and marking of PCR and optical duplicates using GATK MarkDuplicates.
  • Base Quality Score Recalibration (BQSR): Two-pass BQSR implementation with recalibration report generation.
  • Comprehensive Variant Filtering: Hard filtering of SNPs and indels using GATK recommended parameters (QD, FS, MQ, SOR, MQRankSum, ReadPosRankSum).
  • Variant Annotation: Integration with SnpEff for functional annotation and effect prediction.
  • Quality Control Metrics: Collection of alignment metrics, insert size metrics, and coverage depth analysis.
  • Variant Comparison: Optional multi-sample VCF comparison and intersection analysis.
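To illustrate the hard-filtering feature listed above, the sketch below builds a `gatk VariantFiltration` command line from a table of named filter expressions. The thresholds are assumptions taken from commonly cited GATK hard-filtering starting points; this specification does not state the values the pipeline actually uses, and the file names are placeholders.

```python
import shlex

# Assumed thresholds (commonly cited GATK starting points), not confirmed pipeline settings.
SNP_FILTERS = {
    "QD_filter": "QD < 2.0",
    "FS_filter": "FS > 60.0",
    "MQ_filter": "MQ < 40.0",
    "SOR_filter": "SOR > 3.0",
    "MQRankSum_filter": "MQRankSum < -12.5",
    "ReadPosRankSum_filter": "ReadPosRankSum < -8.0",
}
INDEL_FILTERS = {
    "QD_filter": "QD < 2.0",
    "FS_filter": "FS > 200.0",
    "SOR_filter": "SOR > 10.0",
    "ReadPosRankSum_filter": "ReadPosRankSum < -20.0",
}


def variant_filtration_args(reference, in_vcf, out_vcf, filters):
    """Return the argument list for one VariantFiltration run."""
    args = ["gatk", "VariantFiltration", "-R", reference, "-V", in_vcf, "-O", out_vcf]
    # Each filter name is paired with the expression that follows it.
    for name, expression in filters.items():
        args += ["--filter-name", name, "--filter-expression", expression]
    return args


snp_args = variant_filtration_args(
    "genome.fa", "sample_raw_snps.vcf", "sample_filtered_snps.vcf", SNP_FILTERS
)
print(shlex.join(snp_args))
```

Records that match an expression are kept in the output VCF with the filter name written to the FILTER column, so each failed criterion stays visible downstream.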

Input/Output Specification

Inputs

Required

Sequencing Reads

  • Description: FASTQ files containing raw sequencing reads from clonal samples
  • Format: .fastq or .fastq.gz
  • Example File Path: /path/to/input/sample.fastq.gz
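Because the alignment step must assign a read group to every sample, one possible way to derive it from the FASTQ path is sketched below. The helper names, the reuse of the sample name for the ID, SM and LB tags, and the ILLUMINA platform value are assumptions for illustration; a single FASTQ file is shown, matching the example path above.

```python
from pathlib import Path


def sample_name_from_fastq(fastq_path: str) -> str:
    """Strip the directory and the .fastq or .fastq.gz extension."""
    name = Path(fastq_path).name
    for suffix in (".fastq.gz", ".fastq"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    raise ValueError(f"unsupported input format: {fastq_path}")


def read_group(sample: str, platform: str = "ILLUMINA") -> str:
    """Build the header line for `bwa mem -R`.

    The fields are joined with a literal backslash-t, which bwa converts to tabs.
    """
    fields = ["@RG", f"ID:{sample}", f"SM:{sample}", f"LB:{sample}", f"PL:{platform}"]
    return "\\t".join(fields)


def bwa_mem_args(reference: str, fastq_path: str) -> list:
    """Argument list for aligning one FASTQ file; SAM output goes to stdout."""
    sample = sample_name_from_fastq(fastq_path)
    return ["bwa", "mem", "-R", read_group(sample), reference, fastq_path]


print(bwa_mem_args("/path/to/reference/genome.fa", "/path/to/input/sample.fastq.gz"))
```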

Reference Genome

  • Description: Reference genome in FASTA format for alignment and variant calling
  • Format: .fa or .fasta
  • Example File Path: /path/to/reference/genome.fa

Optional Inputs

BWA Index

  • Description: Pre-built BWA index for the reference genome (will be created if not provided)
  • Format: BWA index directory
  • Example File Path: /path/to/bwa/index/
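The "created if not provided" behaviour could look roughly like the sketch below, which checks an index directory for the five files that `bwa index` writes and returns an indexing command only when some are missing. The function names and the convention of using the reference file name as the index prefix are assumptions.

```python
from pathlib import Path
from typing import List, Optional

# File extensions written by `bwa index`.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(index_prefix: str) -> List[str]:
    """List the expected index files that do not exist for this prefix."""
    candidates = [index_prefix + suffix for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in candidates if not Path(path).is_file()]


def index_command_if_needed(reference: str, index_dir: str) -> Optional[List[str]]:
    """Return a `bwa index` command if the index is incomplete, else None."""
    # Assumed convention: the index prefix is <index_dir>/<reference file name>.
    prefix = str(Path(index_dir) / Path(reference).name)
    if missing_index_files(prefix):
        return ["bwa", "index", "-p", prefix, reference]
    return None


print(index_command_if_needed("/path/to/reference/genome.fa", "/path/to/bwa/index/"))
```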

Known Variants Database

  • Description: Database identifier for SnpEff annotation (e.g., GRCh38.p7.RefSeq for human, GRCm38.75 for mouse)
  • Format: SnpEff database identifier
  • Example: GRCh38.p7.RefSeq
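As a sketch of how the database identifier might be passed to SnpEff, the snippet below assembles an annotation command and redirects the annotated VCF from standard output to a file. The jar location, helper names and file names are hypothetical; only the identifier comes from the example above.

```python
import subprocess


def snpeff_args(snpeff_jar: str, database: str, in_vcf: str, stats_html: str) -> list:
    """Argument list for annotating one VCF; SnpEff writes the result to stdout."""
    return [
        "java", "-jar", snpeff_jar,
        "-v",                  # verbose logging
        "-stats", stats_html,  # HTML summary report
        database,              # e.g. GRCh38.p7.RefSeq
        in_vcf,
    ]


def run_snpeff(args: list, out_vcf: str) -> None:
    """Run SnpEff and capture the annotated VCF from stdout."""
    with open(out_vcf, "w") as handle:
        subprocess.run(args, stdout=handle, check=True)


args = snpeff_args(
    "snpEff.jar",  # hypothetical location of the SnpEff jar
    "GRCh38.p7.RefSeq",
    "sample_filtered_snps.vcf",
    "sample_snpEff_summary.html",
)
print(" ".join(args))
# run_snpeff(args, "sample_filtered_snps.ann.vcf")  # requires Java and SnpEff
```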

Outputs

Reported Outputs

  • Annotated VCF File:

  • Description: Final filtered and annotated VCF file containing SNPs with functional annotations
  • Format: .vcf
  • Example File Path: /output/directory/sample_filtered_snps.ann.vcf
  • Visualization App: IGV, UCSC Genome Browser
  • Location: Results folder

  • SnpEff Summary Report:

  • Description: HTML summary report of variant annotations and effects
  • Format: .html
  • Example File Path: /output/directory/sample_snpEff_summary.html
  • Visualization App: Web browser
  • Location: Results folder

  • Recalibrated BAM File:

  • Description: Base quality score recalibrated alignment file
  • Format: .bam
  • Example File Path: /output/directory/sample_recal.bam
  • Visualization App: IGV, SAMtools
  • Location: Results folder
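The annotated VCF stores SnpEff results in the ANN key of the INFO column, as comma-separated entries whose fields are separated by "|". The sketch below reads the first four fields (allele, annotation, impact, gene name) of each entry; the INFO string is an invented illustrative record, not pipeline output.

```python
# Names for the first four ANN subfields; later subfields are ignored here.
ANN_FIELDS = ("allele", "annotation", "impact", "gene_name")


def parse_ann(info: str) -> list:
    """Return one dict per ANN entry found in a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            annotations = []
            for ann in entry[len("ANN="):].split(","):
                parts = ann.split("|")
                # zip stops after the four named fields.
                annotations.append(dict(zip(ANN_FIELDS, parts)))
            return annotations
    return []  # record carries no ANN key


# Invented example record for illustration only.
example_info = (
    "DP=30;ANN=T|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|"
    "protein_coding|2/5|c.100C>T|p.Arg34Cys|100/900|100/600|34/199||,"
    "T|upstream_gene_variant|MODIFIER|GENE2|GENE2|transcript|TX2|"
    "protein_coding||c.-250C>T|||||250|"
)

for annotation in parse_ann(example_info):
    print(annotation["gene_name"], annotation["annotation"], annotation["impact"])
```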

Supporting Outputs

  • Alignment Metrics:

  • Description: Comprehensive alignment statistics and quality metrics
  • Format: .txt
  • Example File Path: /intermediate/directory/sample_alignment_metrics.txt

  • Insert Size Metrics:

  • Description: Insert size distribution statistics and histogram
  • Format: .txt, .pdf
  • Example File Path: /intermediate/directory/sample_insert_metrics.txt

  • BQSR Recalibration Report:

  • Description: Plots comparing base quality scores before and after recalibration
  • Format: .pdf
  • Example File Path: /intermediate/directory/sample_recalibration_plots.pdf

  • Filtered VCF Files:

  • Description: Intermediate filtered SNP and indel VCF files
  • Format: .vcf
  • Example File Path: /intermediate/directory/sample_filtered_snps_round1.vcf
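The supporting outputs above suggest a two-pass BQSR in which first-round filtered variants serve as known sites. The sketch below lists the four GATK commands such a scheme might run; the bootstrapping order, the function name and the recalibration table names are assumptions, while the BAM, VCF and plot names follow the example paths in this section.

```python
def bqsr_commands(reference: str, dedup_bam: str, known_sites: list, prefix: str) -> list:
    """Return the command lines for a two-pass BQSR with before/after plots."""
    before_table = f"{prefix}_recal_data.table"      # assumed file name
    after_table = f"{prefix}_post_recal_data.table"  # assumed file name
    recal_bam = f"{prefix}_recal.bam"
    plots = f"{prefix}_recalibration_plots.pdf"

    known = []
    for vcf in known_sites:
        known += ["--known-sites", vcf]

    return [
        # Pass 1: model systematic errors in the deduplicated BAM.
        ["gatk", "BaseRecalibrator", "-R", reference, "-I", dedup_bam, *known, "-O", before_table],
        # Apply the model to produce the recalibrated BAM.
        ["gatk", "ApplyBQSR", "-R", reference, "-I", dedup_bam,
         "--bqsr-recal-file", before_table, "-O", recal_bam],
        # Pass 2: rebuild the model on the recalibrated BAM.
        ["gatk", "BaseRecalibrator", "-R", reference, "-I", recal_bam, *known, "-O", after_table],
        # Compare both tables and write the plots.
        ["gatk", "AnalyzeCovariates", "-before", before_table, "-after", after_table, "-plots", plots],
    ]


commands = bqsr_commands(
    "genome.fa",
    "sample_dedup.bam",  # hypothetical duplicate-marked BAM
    ["sample_filtered_snps_round1.vcf"],
    "sample",
)
for command in commands:
    print(" ".join(command))
```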

Associated Processes

References & Additional Documentation