Somatic Variant Calling Pipeline from BAM for RNA-seq Pipeline Specification

Pipeline Details

  • Name: Somatic Variant Calling Pipeline from BAM for RNA-seq
  • Pipeline UUID: f931g3zfpy4n03236h09w0j9temytr
  • Version: 1.1.1

Overview

The Somatic Variant Calling Pipeline from BAM for RNA-seq identifies somatic short variants (SNVs and indels) in one or more tumor samples from a single individual, with or without a matched normal sample. It takes RNA-seq BAM files through read group management, duplicate marking, splice-junction read splitting, base quality score recalibration (BQSR), somatic variant calling, and variant annotation.

Key Use Cases:

  • Cancer Research: Identification of somatic mutations in tumor RNA-seq samples compared to matched normal controls.
  • Comparative Genomics: Detection of variants between different sample conditions or treatment groups in RNA-seq data.
  • Clinical Diagnostics: Discovery of actionable somatic variants in cancer patient samples for precision medicine applications.

Features

  • RNA-seq Specific Processing: Specialized handling of RNA-seq data with SplitNCigarReads for proper junction processing.
  • Comprehensive Quality Control: Implements duplicate marking, base quality score recalibration (BQSR), and read group management.
  • Somatic Variant Detection: Utilizes GATK Mutect2 for accurate somatic SNV and Indel calling with tumor-normal comparison capability.
  • Flexible Sample Comparison: Supports both tumor-only and tumor-normal paired analysis workflows.
  • Variant Annotation: Integrates Ensembl VEP for comprehensive variant effect prediction and annotation.
  • GATK Best Practices: Follows GATK recommended workflows for RNA-seq variant calling with proper reference preparation.
  • Containerized Execution: All processes run in standardized Docker containers ensuring reproducibility across environments.
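The processing steps named in the features above can be sketched as a sequence of GATK command lines. This is a minimal sketch, not the pipeline's actual implementation: the step order, the output file names, and the helper `build_commands` are assumptions for illustration. Read group management (for example with `AddOrReplaceReadGroups`) is left out for brevity, and the normal BAM, when given, is assumed to have been preprocessed already.

```python
from typing import List, Optional


def build_commands(
    tumor_bam: str,
    reference: str,
    known_sites: str,
    out_prefix: str,
    normal_bam: Optional[str] = None,
    normal_sample: Optional[str] = None,
) -> List[List[str]]:
    """Return GATK command lines for one tumor sample (hypothetical helper).

    The step order is an assumption based on the GATK RNA-seq workflow;
    the real pipeline may order or parameterize the steps differently.
    """
    dedup = f"{out_prefix}.dedup.bam"
    split = f"{out_prefix}.split.bam"
    table = f"{out_prefix}.recal_data.txt"
    recal = f"{out_prefix}.recal.bam"

    commands = [
        # Mark duplicate reads and write duplication metrics.
        ["gatk", "MarkDuplicates", "-I", tumor_bam, "-O", dedup,
         "-M", f"{out_prefix}.dedup_metrics.txt"],
        # Split reads that span splice junctions (N in the CIGAR string).
        ["gatk", "SplitNCigarReads", "-R", reference, "-I", dedup, "-O", split],
        # Build the recalibration model from known variant sites.
        ["gatk", "BaseRecalibrator", "-R", reference, "-I", split,
         "--known-sites", known_sites, "-O", table],
        # Apply the model to produce the recalibrated BAM.
        ["gatk", "ApplyBQSR", "-R", reference, "-I", split,
         "--bqsr-recal-file", table, "-O", recal],
    ]

    mutect = ["gatk", "Mutect2", "-R", reference, "-I", recal]
    if normal_bam is not None:
        if normal_sample is None:
            raise ValueError("normal_sample is required with normal_bam")
        # Tumor-normal mode: add the normal BAM and name its sample.
        mutect += ["-I", normal_bam, "-normal", normal_sample]
    mutect += ["-O", f"{out_prefix}.vcf.gz"]
    commands.append(mutect)
    return commands


# Tumor-only run; pass normal_bam and normal_sample for a paired run.
for cmd in build_commands("tumor.bam", "genome.fa", "dbsnp.vcf.gz", "out/tumor"):
    print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps the sketch safe to pass to `subprocess.run` without quoting concerns.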

Input/Output Specification

Inputs

Required Inputs

The pipeline processes BAM files from RNA-seq data along with reference materials and sample metadata for somatic variant calling.

BAM Files

  • Description: Aligned RNA-seq BAM files containing mapped reads from tumor and/or normal samples
  • Format: .bam
  • Example File Path: /path/to/input/sample_aligned.bam

Reference Genome

  • Description: Reference genome sequence in FASTA format for variant calling
  • Format: .fa/.fasta
  • Example File Path: /path/to/reference/genome.fa

Known Variants

  • Description: VCF files containing known SNPs and Indels for base quality score recalibration
  • Format: .vcf/.vcf.gz
  • Example File Path: /path/to/known_sites/dbsnp.vcf.gz
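A quick pre-flight check of the required inputs against the formats listed above might look like the following. It is only a sketch: the helper `check_input_format` and the keys of `ACCEPTED_SUFFIXES` are hypothetical names, and the check looks at file names only, not at file contents.

```python
from pathlib import Path

# Accepted suffixes per required input, taken from the Format fields above.
# The dictionary keys are illustrative names, not pipeline parameters.
ACCEPTED_SUFFIXES = {
    "bam": (".bam",),
    "reference": (".fa", ".fasta"),
    "known_variants": (".vcf", ".vcf.gz"),
}


def check_input_format(kind: str, path: str) -> bool:
    """Return True if the file name ends with a suffix accepted for `kind`."""
    name = Path(path).name.lower()
    return name.endswith(ACCEPTED_SUFFIXES[kind])


print(check_input_format("bam", "/path/to/input/sample_aligned.bam"))
print(check_input_format("known_variants", "/path/to/known_sites/dbsnp.vcf.gz"))
print(check_input_format("reference", "/path/to/reference/genome.txt"))
```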

Optional Inputs

Sample Groups TSV

  • Description: Tab-separated file defining sample groupings for comparison analysis
  • Required Columns: Sample ID, Group assignment
  • Format: Tab-separated values (.tsv)
  • Example File Path: /path/to/metadata/groups.tsv

Comparisons TSV

  • Description: Tab-separated file defining which sample groups to compare for variant calling
  • Required Columns: Control group, Treatment group, Comparison name
  • Format: Tab-separated values (.tsv)
  • Example File Path: /path/to/metadata/comparisons.tsv
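To illustrate how the two metadata files relate, the sketch below reads a groups table and a comparisons table and resolves each comparison into sample lists. The header names (`sample_id`, `group`, `control`, `treatment`, `name`) and the sample data are assumptions made for this example, as is the reading of the control group as normal samples and the treatment group as tumor samples; check the pipeline's own templates for the exact column headers.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def read_groups(handle: TextIO) -> Dict[str, List[str]]:
    """Map each group name to the sample IDs assigned to it."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row["group"]].append(row["sample_id"])
    return dict(groups)


def resolve_comparisons(
    handle: TextIO, groups: Dict[str, List[str]]
) -> List[Dict[str, object]]:
    """Expand each comparison row into its normal and tumor sample lists."""
    resolved = []
    for row in csv.DictReader(handle, delimiter="\t"):
        resolved.append({
            "name": row["name"],
            # Assumption: control group = normal, treatment group = tumor.
            "normal_samples": groups[row["control"]],
            "tumor_samples": groups[row["treatment"]],
        })
    return resolved


# Illustrative file contents with hypothetical headers and sample IDs.
GROUPS_TSV = "sample_id\tgroup\nS1\tnormal\nS2\ttumor\nS3\ttumor\n"
COMPARISONS_TSV = "control\ttreatment\tname\nnormal\ttumor\ttumor_vs_normal\n"

groups = read_groups(io.StringIO(GROUPS_TSV))
plan = resolve_comparisons(io.StringIO(COMPARISONS_TSV), groups)
print(plan)
```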

GTF Annotation

  • Description: Gene annotation file for variant effect prediction
  • Format: .gtf
  • Example File Path: /path/to/annotation/genes.gtf

Outputs

Reported Outputs

Somatic Variants VCF

  • Description: Called somatic variants in VCF format with quality scores and filters
  • Format: .vcf.gz
  • Example File Path: /output/somatic_variants/comparison_name.vcf.gz
  • Visualization App: IGV, UCSC Genome Browser
  • Location: Variant Calls Folder
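For downstream use of the somatic VCF, one common first step is keeping only records that passed all filters. The sketch below does this with the standard library alone. It assumes the FILTER column has been populated (for example by a step such as `FilterMutectCalls`), which this specification does not state explicitly, and `iter_pass_records` is a hypothetical helper name.

```python
import gzip
from typing import Iterator, List


def iter_pass_records(path: str) -> Iterator[List[str]]:
    """Yield the tab-separated fields of every record whose FILTER is PASS."""
    # bgzip-compressed VCFs are valid multi-member gzip files,
    # so the gzip module can read them.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            fields = line.rstrip("\n").split("\t")
            # VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...
            if fields[6] == "PASS":
                yield fields
```

A call such as `sum(1 for _ in iter_pass_records("/output/somatic_variants/comparison_name.vcf.gz"))` would then count the passing variants for one comparison.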

Annotated Variants VCF

  • Description: Somatic variants with functional annotations from Ensembl VEP
  • Format: .vcf
  • Example File Path: /output/annotated/sample_annotated.vcf
  • Visualization App: VEP Web Interface, IGV
  • Location: Annotations Folder
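When VEP writes VCF output, it stores annotations in a `CSQ` INFO field whose sub-field order is declared in the file header. The sketch below shows one way to unpack it. The header and INFO strings are shortened, made-up examples, the helper names are hypothetical, and the actual sub-fields depend on how VEP was configured in the pipeline.

```python
from typing import Dict, List


def csq_fields(header_line: str) -> List[str]:
    """Extract the CSQ sub-field names from the VEP INFO header line."""
    fmt = header_line.split("Format: ", 1)[1]
    # Drop the closing quote and angle bracket of the header entry.
    return fmt.rstrip().rstrip('">').split("|")


def parse_csq(info: str, fields: List[str]) -> List[Dict[str, str]]:
    """Return one dict per annotation found in the CSQ entry of an INFO string."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            # Annotations are comma-separated; sub-fields are pipe-separated.
            return [dict(zip(fields, ann.split("|")))
                    for ann in entry[4:].split(",")]
    return []


# Shortened, illustrative inputs (a real header lists many more sub-fields).
HEADER = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">')
INFO = "DP=42;CSQ=T|missense_variant|MODERATE|TP53,T|upstream_gene_variant|MODIFIER|WRAP53"

annotations = parse_csq(INFO, csq_fields(HEADER))
for ann in annotations:
    print(ann["SYMBOL"], ann["Consequence"], ann["IMPACT"])
```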

Supporting Outputs

Recalibrated BAM Files

  • Description: Quality score recalibrated BAM files ready for variant calling
  • Format: .bam + .bai
  • Example File Path: /intermediate/recalibrated/sample_recal.bam

Duplicate Metrics

  • Description: Statistics on duplicate read identification and removal
  • Format: .txt
  • Example File Path: /intermediate/metrics/sample_dedup_metrics.txt
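If the duplicate metrics file follows the Picard/GATK `MarkDuplicates` metrics layout (an assumption, since the specification does not name the tool that writes it), the duplication rate can be read with a few lines of Python. The sample text below is abbreviated and made up: a real file carries more columns and usually a histogram section after the metrics table.

```python
from typing import Dict, Iterable, List


def parse_picard_metrics(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the rows of the metrics table as dicts keyed by column name."""
    rows: List[Dict[str, str]] = []
    header = None
    in_metrics = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("## METRICS CLASS"):
            in_metrics = True  # the column header follows on the next line
            continue
        if not in_metrics:
            continue
        if not line.strip():
            break  # a blank line ends the metrics table
        if header is None:
            header = line.split("\t")
        else:
            rows.append(dict(zip(header, line.split("\t"))))
    return rows


# Abbreviated, illustrative metrics text.
SAMPLE = (
    "## htsjdk.samtools.metrics.StringHeader\n"
    "# MarkDuplicates INPUT=sample.bam\n"
    "\n"
    "## METRICS CLASS\tpicard.sam.DuplicationMetrics\n"
    "LIBRARY\tREAD_PAIRS_EXAMINED\tREAD_PAIR_DUPLICATES\tPERCENT_DUPLICATION\n"
    "lib1\t1000\t250\t0.25\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Double\n"
)

metrics = parse_picard_metrics(SAMPLE.splitlines(keepends=True))
print(metrics[0]["LIBRARY"], metrics[0]["PERCENT_DUPLICATION"])
```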

Base Recalibration Tables

  • Description: Base quality score recalibration data tables
  • Format: .txt
  • Example File Path: /intermediate/bqsr/sample_recal_data.txt

Associated Processes

References & Additional Documentation

  • Related Papers/links:
  • GATK Best Practices for RNA-seq variant calling: https://gatk.broadinstitute.org/hc/en-us/articles/360035531192
  • MuTect Publication (the original MuTect algorithm, predecessor of Mutect2): https://www.nature.com/articles/nbt.2514
  • Ensembl VEP Documentation: https://useast.ensembl.org/info/docs/tools/vep/index.html
  • Pipeline Repository: Contact ViaFoundry for access to pipeline source code
  • Workflow Diagram: Available in the pipeline description page on ViaFoundry platform