Bam Quantification and DE Pipeline Specification

Pipeline Details

  • Name: Bam Quantification and DE
  • Pipeline UUID: cgke8jfstrfdtuww86zf090fz8as0w
  • Version: 1.0.0

Overview

The Bam Quantification and DE pipeline quantifies gene/feature counts from BAM input files and optionally performs differential expression analysis. It automates read counting from aligned BAM files and runs the differential expression analysis with either DESeq2 or Limma Voom.

Key Use Cases:

  • Gene Expression Quantification: Extract gene- and transcript-level counts from aligned BAM files using featureCounts or Salmon.
  • Differential Expression Analysis: Perform statistical analysis to identify significantly differentially expressed genes between experimental conditions using DESeq2 or Limma Voom.
  • Multi-sample Comparative Studies: Process multiple BAM files simultaneously for comparative analysis across different biological conditions or treatments.

Features

  • Multiple Quantification Methods: Supports both featureCounts and Salmon for read quantification from BAM files with flexible parameter configurations.
  • Dual Statistical Engines: Offers both DESeq2 and Limma Voom for differential expression analysis, allowing users to choose the most appropriate method for their data.
  • Comprehensive Quality Control: Implements extensive QC steps including PCA plots, hierarchical clustering, count distribution analysis, and reproducibility plots.
  • Batch Effect Correction: Built-in batch correction capabilities for PCA and heatmap visualizations to handle experimental batch effects.
  • Custom Genome Integration: Supports addition of custom sequences to reference genomes with automatic GTF file generation and indexing.
  • GSEA Preparation: Automatically prepares ranked gene lists and necessary files for Gene Set Enrichment Analysis.
  • Flexible Filtering Options: Configurable filtering strategies (local vs global) with customizable minimum count and sample thresholds.
  • Automated Library Type Detection: Automatically detects single-end vs paired-end sequencing data from BAM files.
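
The specification does not say how the library type is detected. One common approach, sketched here as an assumption and not as the pipeline's actual implementation, is to inspect the FLAG column of the alignment records: in the SAM format, bit 0x1 marks a read whose template has multiple segments, which means paired-end. The function name `is_paired_end` and the `max_records` cutoff are hypothetical.

```python
PAIRED_FLAG = 0x1  # SAM FLAG bit: template has multiple segments (paired-end)


def is_paired_end(sam_lines, max_records=1000):
    """Guess the library layout from SAM-format text lines.

    Returns True as soon as one record carries the paired flag, and
    False if none of the first `max_records` records do.
    """
    seen = 0
    for line in sam_lines:
        # Skip header lines and blank lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        if flag & PAIRED_FLAG:
            return True
        seen += 1
        if seen >= max_records:
            break
    return False
```

The input lines could come from `samtools view -h sample.bam`, which prints a BAM file as SAM text; the file name is a placeholder.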

Input/Output Specification

Inputs

Required

The pipeline requires BAM files as its primary input. Quantification and differential expression analysis are configured through pipeline parameters rather than through additional file inputs.

BAM Files

  • Description: Aligned BAM files containing mapped RNA-seq reads for quantification
  • Format: .bam
  • Requirements: Must be properly aligned and indexed BAM files from RNA-seq experiments

Optional Inputs

Counts File

  • Description: Pre-computed gene or transcript count matrix (if bypassing quantification steps)
  • Format: Tab-separated values (.tsv)
  • Requirements: First column must contain features (genes/transcripts), header must contain sample names, no hyphens/dashes/spaces in column names
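
The header requirements above can be checked before a counts file is submitted. The following is a minimal sketch, not part of the pipeline; the function name `check_counts_header` is hypothetical, and "dashes" is interpreted here as the Unicode dash characters U+2010 through U+2015 in addition to the ASCII hyphen.

```python
import csv
import io
import re

# Characters the specification forbids in sample column names:
# ASCII hyphen, Unicode dashes (assumed range U+2010..U+2015), whitespace.
FORBIDDEN = re.compile(r"[-\u2010-\u2015\s]")


def check_counts_header(handle):
    """Return a list of problems found in the header of a counts TSV."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    samples = header[1:]  # the first column holds the feature identifiers
    if not samples:
        problems.append("no sample columns found")
    for name in samples:
        if FORBIDDEN.search(name):
            problems.append(f"invalid sample name: {name!r}")
    return problems


# Example with an in-memory file; a real file would be opened with open(path, newline="").
example = io.StringIO("gene\tctrl_1\ttreat-1\nGAPDH\t10\t12\n")
print(check_counts_header(example))
```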

Groups File

  • Description: Sample metadata file containing experimental design information
  • Format: Tab-separated values (.tsv)
  • Required Columns: sample_name, group
  • Additional Columns: Can include additional metadata for batch correction or other experimental factors
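
As an illustration of the layout described above, this sketch reads a groups file and collects the sample names under each group. It is not taken from the pipeline; the function name `read_groups` and the `batch` column in the example are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_name", "group")


def read_groups(handle):
    """Map each group to its sample names, in file order."""
    reader = csv.DictReader(handle, delimiter="\t")
    fields = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fields]
    if missing:
        raise ValueError(f"groups file is missing columns: {', '.join(missing)}")
    groups = {}
    for row in reader:
        groups.setdefault(row["group"], []).append(row["sample_name"])
    return groups


# Example: two required columns plus one extra metadata column.
example = io.StringIO(
    "sample_name\tgroup\tbatch\n"
    "ctrl_1\tcontrol\tb1\n"
    "treat_1\ttreated\tb1\n"
    "ctrl_2\tcontrol\tb2\n"
)
print(read_groups(example))
```

Extra columns such as `batch` are ignored by this reader but remain available in the file for batch correction.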

Comparison File

  • Description: Specifies which groups to compare in differential expression analysis
  • Format: Tab-separated values (.tsv)
  • Required Columns: controls, treats, names
  • Optional Column: grouping_column (overrides default group column)
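
A comparison file with these columns can be parsed as sketched below. This is an illustration under assumptions, not the pipeline's own parser: the function name `read_comparisons` is hypothetical, and the fallback column name `group` is assumed from the Groups File description.

```python
import csv
import io

REQUIRED_COLUMNS = ("controls", "treats", "names")


def read_comparisons(handle, default_column="group"):
    """Return one dict per comparison row."""
    reader = csv.DictReader(handle, delimiter="\t")
    fields = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fields]
    if missing:
        raise ValueError(f"comparison file is missing columns: {', '.join(missing)}")
    comparisons = []
    for row in reader:
        comparisons.append({
            "name": row["names"],
            "control": row["controls"],
            "treatment": row["treats"],
            # An absent or blank grouping_column falls back to the default.
            "column": row.get("grouping_column") or default_column,
        })
    return comparisons


example = io.StringIO(
    "controls\ttreats\tnames\tgrouping_column\n"
    "control\ttreated\ttreated_vs_control\t\n"
    "b1\tb2\tbatch2_vs_batch1\tbatch\n"
)
for comparison in read_comparisons(example):
    print(comparison)
```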

Outputs

Reported Outputs

  • Gene Count Matrix:
      • Description: Normalized gene expression counts from featureCounts or Salmon quantification
      • Format: .tsv
      • Location: Main output directory
      • Visualization App: DE Browser, Excel

  • Differential Expression Results:
      • Description: Statistical results from DESeq2 or Limma Voom analysis including fold changes, p-values, and adjusted p-values
      • Format: .tsv
      • Location: DE_reports folder
      • Visualization App: DE Browser, Volcano Plot Viewer

  • Quality Control Plots:
      • Description: PCA plots, heatmaps, MA plots, and volcano plots for data visualization
      • Format: .png, .pdf
      • Location: DE_reports folder
      • Visualization App: Image viewers, integrated report viewer

Supporting Outputs

  • Quantification Summary Files:
      • Description: Summary statistics from featureCounts including alignment statistics and feature assignments
      • Format: .txt, .summary
      • Location: Intermediate output directory

  • GSEA Prepared Files:
      • Description: Ranked gene lists and configuration files prepared for Gene Set Enrichment Analysis
      • Format: .rnk, .txt
      • Location: GSEA_reports folder
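
The specification does not state which metric is used to rank genes. For orientation, the sketch below builds a ranked list using one widely used choice, the sign of the log2 fold change multiplied by -log10 of the p-value; this metric, the function names, and the input tuple layout are assumptions and not the pipeline's documented behavior. An .rnk file is a two-column, tab-separated list of gene identifier and score.

```python
import math


def rank_genes(results):
    """Rank genes by sign(log2FC) * -log10(p-value), highest score first.

    `results` is an iterable of (gene, log2_fold_change, p_value) tuples.
    Rows with a missing or non-positive p-value are skipped.
    """
    ranked = []
    for gene, log2fc, pvalue in results:
        if pvalue is None or pvalue <= 0:
            continue
        score = math.copysign(-math.log10(pvalue), log2fc)
        ranked.append((gene, score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


def to_rnk_lines(ranked):
    """Format ranked genes as tab-separated .rnk lines."""
    return [f"{gene}\t{score:.6g}" for gene, score in ranked]


# Hypothetical differential expression results.
example = [("GENE_A", 2.0, 0.001), ("GENE_B", -1.0, 0.01), ("GENE_C", 0.5, 0.1)]
for line in to_rnk_lines(rank_genes(example)):
    print(line)
```

Strongly up-regulated genes land at the top of the list and strongly down-regulated genes at the bottom, which is the ordering a pre-ranked GSEA run expects.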

Associated Processes

References & Additional Documentation