Bam Quantification and DE Pipeline Specification
Pipeline Details
- Name: Bam Quantification and DE
- Pipeline UUID: cgke8jfstrfdtuww86zf090fz8as0w
- Version: 1.0.0
- View Pipeline:
Overview
The Bam Quantification and DE pipeline quantifies gene/feature counts from BAM input files and optionally performs differential expression analysis. It automates read counting from aligned BAM files and runs the differential expression analysis with either DESeq2 or Limma Voom, so the same inputs and parameters produce the same results.
Key Use Cases:
- Gene Expression Quantification: Extract gene and transcript-level counts from aligned BAM files using featureCounts or Salmon quantification methods.
- Differential Expression Analysis: Perform statistical analysis to identify significantly differentially expressed genes between experimental conditions using DESeq2 or Limma Voom.
- Multi-sample Comparative Studies: Process multiple BAM files simultaneously for comparative analysis across different biological conditions or treatments.
Features
- Multiple Quantification Methods: Supports both featureCounts and Salmon for read quantification from BAM files with flexible parameter configurations.
- Dual Statistical Engines: Offers both DESeq2 and Limma Voom for differential expression analysis, allowing users to choose the most appropriate method for their data.
- Comprehensive Quality Control: Implements extensive QC steps including PCA plots, hierarchical clustering, count distribution analysis, and reproducibility plots.
- Batch Effect Correction: Built-in batch correction capabilities for PCA and heatmap visualizations to handle experimental batch effects.
- Custom Genome Integration: Supports addition of custom sequences to reference genomes with automatic GTF file generation and indexing.
- GSEA Preparation: Automatically prepares ranked gene lists and necessary files for Gene Set Enrichment Analysis.
- Flexible Filtering Options: Configurable filtering strategies (local vs global) with customizable minimum count and sample thresholds.
- Automated Library Type Detection: Automatically detects single-end vs paired-end sequencing data from BAM files.
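The single-end vs paired-end distinction in the last feature can be illustrated with the SAM FLAG field, where bit 0x1 marks a read that belongs to a pair. The sketch below is not the pipeline's own "detect library type" process. It assumes the FLAG values have already been extracted from a BAM file (for example, column 2 of `samtools view` output); the function name `detect_library_type` and the majority-vote rule are hypothetical.

```python
PAIRED_FLAG = 0x1  # SAM FLAG bit: template has multiple segments (read is paired)


def detect_library_type(flags):
    """Classify a sample as 'paired' or 'single' from SAM FLAG values.

    Hypothetical helper: `flags` is any iterable of integer FLAG values.
    """
    flags = list(flags)
    if not flags:
        raise ValueError("no reads to inspect")
    paired = sum(1 for flag in flags if flag & PAIRED_FLAG)
    # Assumption: a majority vote tolerates a few unexpected records.
    return "paired" if paired * 2 > len(flags) else "single"


print(detect_library_type([99, 147, 83, 163]))  # flags typical of paired-end reads
print(detect_library_type([0, 16, 0, 4]))       # flags typical of single-end reads
```

In practice only the first few thousand records usually need to be inspected, since a library is either paired-end or single-end throughout.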
Input/Output Specification
Inputs
Required
The pipeline requires BAM files as primary input, with quantification and differential expression analysis configured through pipeline parameters rather than direct file inputs.
BAM Files
- Description: Aligned BAM files containing mapped RNA-seq reads for quantification
- Format: .bam
- Requirements: Must be properly aligned and indexed BAM files from RNA-seq experiments
Optional Inputs
Counts File
- Description: Pre-computed gene or transcript count matrix (if bypassing quantification steps)
- Format: Tab-separated values (.tsv)
- Requirements: First column must contain features (genes/transcripts), header must contain sample names, no hyphens/dashes/spaces in column names
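The header requirements above can be checked before a counts file is submitted. The following is a minimal sketch, not part of the pipeline: the function name `check_counts_header` is hypothetical, and it assumes that "hyphens/dashes" covers the ASCII hyphen plus the en and em dash.

```python
import csv
import io
import re

# Characters the requirements forbid in column names:
# ASCII hyphen, whitespace, en dash, em dash (the last two are an assumption).
FORBIDDEN = re.compile(r"[-\s\u2013\u2014]")


def check_counts_header(handle):
    """Return (feature_column, sample_names) or raise ValueError."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if not header or len(header) < 2:
        raise ValueError("expected a feature column plus at least one sample column")
    bad = [name for name in header if FORBIDDEN.search(name)]
    if bad:
        raise ValueError(f"invalid column names: {bad}")
    return header[0], header[1:]


example = io.StringIO("gene\tctrl_1\tctrl_2\ttreat_1\nGAPDH\t10\t12\t30\n")
feature_column, samples = check_counts_header(example)
print(feature_column, samples)
```

A header such as `gene<TAB>ctrl-1` would raise `ValueError` here; renaming the column to `ctrl_1` resolves it.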
Groups File
- Description: Sample metadata file containing experimental design information
- Format: Tab-separated values (.tsv)
- Required Columns: sample_name, group
- Additional Columns: Can include additional metadata for batch correction or other experimental factors
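As an illustration of the layout described above, the sketch below parses a small groups file with the two required columns and one additional `batch` column. The sample names, group labels, `batch` column, and the helper `load_groups` are all made up for the example; only `sample_name` and `group` come from the specification.

```python
import csv
import io

REQUIRED_COLUMNS = {"sample_name", "group"}


def load_groups(handle):
    """Map each sample name to its group; raise ValueError if columns are missing."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"groups file is missing columns: {sorted(missing)}")
    return {row["sample_name"]: row["group"] for row in reader}


# Hypothetical groups file with an extra metadata column for batch correction.
example = io.StringIO(
    "sample_name\tgroup\tbatch\n"
    "ctrl_1\tcontrol\tA\n"
    "ctrl_2\tcontrol\tB\n"
    "treat_1\ttreated\tA\n"
    "treat_2\ttreated\tB\n"
)
groups = load_groups(example)
print(groups)
```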
Comparison File
- Description: Specifies which groups to compare in differential expression analysis
- Format: Tab-separated values (.tsv)
- Required Columns: controls, treats, names
- Optional Column: grouping_column (overrides default group column)
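A comparison file can be sanity-checked against the groups it refers to. The sketch below is one possible check, under two assumptions that the specification does not state: each `controls` and `treats` cell holds a single group label, and rows that set `grouping_column` are skipped because their labels come from a different metadata column. The helper `load_comparisons` and the example labels are hypothetical.

```python
import csv
import io

REQUIRED = ("controls", "treats", "names")


def load_comparisons(handle, known_groups):
    """Return a list of (name, control_group, treated_group) tuples."""
    reader = csv.DictReader(handle, delimiter="\t")
    fields = reader.fieldnames or []
    missing = [column for column in REQUIRED if column not in fields]
    if missing:
        raise ValueError(f"comparison file is missing columns: {missing}")
    comparisons = []
    for row in reader:
        # Only validate labels when the default group column is in use.
        if not row.get("grouping_column"):
            for key in ("controls", "treats"):
                if row[key] not in known_groups:
                    raise ValueError(f"unknown group {row[key]!r} in column {key!r}")
        comparisons.append((row["names"], row["controls"], row["treats"]))
    return comparisons


example = io.StringIO(
    "controls\ttreats\tnames\n"
    "control\ttreated\ttreated_vs_control\n"
)
comparisons = load_comparisons(example, known_groups={"control", "treated"})
print(comparisons)
```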
Outputs
Reported Outputs
- Gene Count Matrix:
  - Description: Normalized gene expression counts from featureCounts or Salmon quantification
  - Format: .tsv
  - Location: Main output directory
  - Visualization App: DE Browser, Excel
- Differential Expression Results:
  - Description: Statistical results from DESeq2 or Limma Voom analysis including fold changes, p-values, and adjusted p-values
  - Format: .tsv
  - Location: DE_reports folder
  - Visualization App: DE Browser, Volcano Plot Viewer
- Quality Control Plots:
  - Description: PCA plots, heatmaps, MA plots, and volcano plots for data visualization
  - Format: .png, .pdf
  - Location: DE_reports folder
  - Visualization App: Image viewers, integrated report viewer
Supporting Outputs
- Quantification Summary Files:
  - Description: Summary statistics from featureCounts including alignment statistics and feature assignments
  - Format: .txt, .summary
  - Location: Intermediate output directory
- GSEA Prepared Files:
  - Description: Ranked gene lists and configuration files prepared for Gene Set Enrichment Analysis
  - Format: .rnk, .txt
  - Location: GSEA_reports folder
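A .rnk file is a two-column, tab-separated list of gene identifiers and ranking scores. The specification does not say which ranking metric the pipeline uses, so the sketch below assumes a common choice: the sign of the log2 fold change multiplied by the magnitude of log10 of the p-value. The function names `rank_genes` and `to_rnk`, the input tuple layout, and the p-value floor are hypothetical.

```python
import math

P_FLOOR = 1e-300  # assumption: clamp p-values so log10 never sees zero


def rank_genes(results):
    """Rank (gene, log2_fold_change, p_value) tuples, most up-regulated first."""
    ranked = []
    for gene, log2_fc, p_value in results:
        sign = (log2_fc > 0) - (log2_fc < 0)
        p_clamped = min(max(p_value, P_FLOOR), 1.0)
        score = sign * abs(math.log10(p_clamped))
        ranked.append((gene, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def to_rnk(ranked):
    """Format ranked genes as the text of a tab-separated .rnk file."""
    return "".join(f"{gene}\t{score:.4f}\n" for gene, score in ranked)


# Made-up differential expression results for illustration only.
ranked = rank_genes([
    ("GENE_A", 2.0, 0.01),
    ("GENE_B", -1.0, 0.001),
    ("GENE_C", 0.5, 0.1),
])
print(to_rnk(ranked), end="")
```

Strongly up-regulated genes land at the top of the list and strongly down-regulated genes at the bottom, which is the ordering GSEA's preranked mode expects.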
Associated Processes
- Add custom seq to genome gtf
- Check BED12
- check files
- Check Genome GTF
- Check chrom sizes and index
- convert gtf attributes
- DE collector
- detect library type
- featureCounts
- featureCounts Prep
- featureCounts summary
- Prepare DESeq2
- Prepare GSEA
- Prepare LimmaVoom
- salmon bam quant
- Salmon Summary
- Salmon transcript to gene count
References & Additional Documentation
- DESeq2 Documentation: Bioconductor DESeq2 Package
- DESeq2 Paper: Love, M.I., Huber, W., Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15, 550
- Limma Voom Documentation: Bioconductor Limma Package
- Limma Voom Paper: Law, C.W., Chen, Y., Shi, W., Smyth, G.K. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15, R29