CHIP-seq Pipeline Specification
Pipeline Details
- Name: CHIP-seq Pipeline
- Pipeline UUID: 16915266cd614e24a3ca183d6d86ab63
- Version: 2.0.0
Overview
CHIP-seq Pipeline is designed for analyzing chromatin immunoprecipitation sequencing (ChIP-seq) data. It automates the workflow from raw reads through quality control, alignment, and deduplication to peak calling and peak quantification, and reports QC metrics at each stage. It supports both histone modification and transcription factor binding studies.
Key Use Cases:
- Histone Modification Analysis: Mapping and quantifying histone marks across the genome to understand chromatin states and gene regulation.
- Transcription Factor Binding Site Discovery: Identifying and characterizing protein-DNA interactions through peak calling and motif analysis.
- Comparative ChIP-seq Studies: Processing multiple samples for differential binding analysis and consensus peak identification.
Features
- Comprehensive Quality Control Pipeline: Implements FastQC, adapter removal, quality filtering, and trimming, with multiple tool options including Trimmomatic and FASTX-Toolkit.
- Flexible Read Mapping: Aligns reads with Bowtie2 and supports sequential mapping, which filters out reads matching common contaminant or repetitive sequences (e.g., ERCC spike-ins, RepeatMasker repeats) before genome alignment.
- Advanced Peak Calling: Uses MACS3 v3.0.1 to call ChIP peaks, optionally against matched input control samples.
- Duplicate Removal Options: Provides both Picard and Samtools-based duplicate removal strategies.
- Consensus Peak Analysis: Merges peaks from all samples into a consensus peak set with Bedtools and counts the reads falling in each consensus peak.
- UMI Support: Extracts unique molecular identifiers (UMIs) from reads and uses them during deduplication to distinguish PCR duplicates from independent molecules that map to the same position.
- Comprehensive Reporting: Generates detailed QC metrics, alignment statistics, and visualization-ready files (TDF, BigWig).
- Scalable Architecture: Processes multiple samples in parallel with modular design for customization.
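The consensus peak step can be sketched in plain Python. This is a minimal illustration, assuming peaks are pooled across samples, sorted, and merged the way `bedtools merge` does with default settings (overlapping and book-ended intervals are joined), and that a read counts toward a peak if it overlaps it by at least one base, as in `bedtools coverage`. The function names `merge_peaks` and `count_matrix` are hypothetical; the pipeline itself runs Bedtools rather than code like this.

```python
from typing import Dict, List, Tuple

# (chrom, start, end) in BED coordinates: 0-based start, exclusive end.
Interval = Tuple[str, int, int]


def merge_peaks(peak_sets: List[List[Interval]]) -> List[Interval]:
    """Pool peaks from every sample and merge overlapping or book-ended ones."""
    pooled = sorted(p for peaks in peak_sets for p in peaks)
    merged: List[Interval] = []
    for chrom, start, end in pooled:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and touching/overlapping: extend the previous peak.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def count_matrix(
    consensus: List[Interval], reads_by_sample: Dict[str, List[Interval]]
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Count reads overlapping each consensus peak (>= 1 bp) for every sample."""
    samples = list(reads_by_sample)
    matrix: Dict[str, List[int]] = {}
    for chrom, start, end in consensus:
        peak_id = f"{chrom}:{start}-{end}"
        matrix[peak_id] = [
            sum(
                1
                for r_chrom, r_start, r_end in reads_by_sample[sample]
                if r_chrom == chrom and r_start < end and r_end > start
            )
            for sample in samples
        ]
    return samples, matrix


if __name__ == "__main__":
    rep1 = [("chr1", 100, 200), ("chr1", 500, 600)]
    rep2 = [("chr1", 180, 250), ("chr2", 10, 50)]
    consensus = merge_peaks([rep1, rep2])
    reads = {
        "exper-rep1": [("chr1", 120, 170), ("chr1", 240, 290)],
        "control-rep1": [("chr1", 550, 600)],
    }
    samples, matrix = count_matrix(consensus, reads)
    # Print a tab-separated matrix: one row per consensus peak, one column per sample.
    print("peak\t" + "\t".join(samples))
    for peak_id, counts in matrix.items():
        print(peak_id + "\t" + "\t".join(map(str, counts)))
```

The overlap loop is quadratic and only meant to make the logic explicit; Bedtools streams over sorted files instead.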
Input/Output Specification
Inputs
Required
Reads
- Description: Raw sequencing reads in FASTQ format from ChIP-seq experiments
- Format: .fastq.gz
- Example File Path: /path/to/input/sample_R1.fastq.gz
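To make the expected input concrete, the sketch below streams a gzip-compressed FASTQ file record by record and applies a simple mean-quality filter. It assumes the standard four-line FASTQ layout and Phred+33 quality encoding; the threshold of 20 and the helper names are hypothetical and do not reproduce the pipeline's Quality Filtering step, which runs dedicated tools with their own settings.

```python
import gzip
from itertools import islice
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    it = (line.rstrip("\r\n") for line in lines)
    while True:
        chunk = list(islice(it, 4))
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("Truncated FASTQ record at end of input")
        header, seq, plus, qual = chunk
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"Malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def read_fastq_gz(path: str) -> Iterator[Record]:
    """Stream records from a .fastq.gz file without decompressing it to disk."""
    with gzip.open(path, "rt") as handle:
        yield from parse_fastq(handle)


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 by default)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_filter(qual: str, min_mean_q: float = 20.0) -> bool:
    """Hypothetical filter: keep reads whose mean quality reaches min_mean_q."""
    return len(qual) > 0 and mean_phred(qual) >= min_mean_q
```

For example, `sum(passes_filter(q) for _, _, q in read_fastq_gz("/path/to/input/sample_R1.fastq.gz"))` would count the reads that pass. Splitting parsing from file opening keeps `parse_fastq` testable on in-memory lines.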
ChIP-prep Section
- Description: Sample definitions for peak calling with MACS3
- Fields: Output-Prefix and Sample-Prefix (required); Input-Prefix (optional, names the input control sample)
- Format: Tab-separated configuration
- Example:

| Output-Prefix | Sample-Prefix | Input-Prefix |
|---------------|---------------|--------------|
| exper-rep1    | exper-rep1    |              |
| control-rep1  | control-rep1  |              |
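A ChIP-prep table can be validated before peak calling. The sketch below assumes the table is plain tab-separated text whose header row contains the column names Output-Prefix, Sample-Prefix, and Input-Prefix. It parses the rows, enforces the required fields, treats an empty Input-Prefix as "no control", and builds a `macs3 callpeak` argument list with the `-t` (treatment), `-c` (control), and `-n` (name) options. The `<prefix>.bam` file naming, the duplicate-prefix check, and all function names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

REQUIRED_COLUMNS = ("Output-Prefix", "Sample-Prefix")


@dataclass
class ChipPrepEntry:
    output_prefix: str
    sample_prefix: str
    input_prefix: Optional[str]  # None: call peaks without an input control


def parse_chip_prep(text: str) -> List[ChipPrepEntry]:
    """Parse a tab-separated ChIP-prep table with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing column(s): {', '.join(missing)}")
    entries: List[ChipPrepEntry] = []
    seen = set()
    for row in reader:
        output = (row.get("Output-Prefix") or "").strip()
        sample = (row.get("Sample-Prefix") or "").strip()
        control = (row.get("Input-Prefix") or "").strip() or None
        if not output or not sample:
            raise ValueError(f"Row is missing a required value: {row}")
        if output in seen:
            # Assumed rule: each Output-Prefix names its own set of peak files.
            raise ValueError(f"Duplicate Output-Prefix: {output}")
        seen.add(output)
        entries.append(ChipPrepEntry(output, sample, control))
    return entries


def macs3_command(entry: ChipPrepEntry, bam_suffix: str = ".bam") -> List[str]:
    """Build a macs3 callpeak argument list; the BAM file naming is hypothetical."""
    cmd = ["macs3", "callpeak", "-t", entry.sample_prefix + bam_suffix, "-n", entry.output_prefix]
    if entry.input_prefix:
        cmd += ["-c", entry.input_prefix + bam_suffix]
    return cmd
```

With the example table above, both rows have an empty Input-Prefix, so each would be called without `-c`. Filling Input-Prefix with `control-rep1` on the `exper-rep1` row would add `-c control-rep1.bam` to that command.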
Outputs
Reported Outputs
- Peak Count Matrix:
- Description: Quantified read counts for each peak region across all samples
- Format: .tsv
- Example File Path: /output/results/peak_counts_matrix.tsv
- Visualization App: DEBrowser
- Location: Results folder
- Individual Peak Files:
- Description: MACS3 peak calls for each sample
- Format: .narrowPeak, .xls
- Example File Path: /output/peaks/sample_peaks.narrowPeak
- Visualization App: IGV, UCSC Genome Browser
- Location: Peaks folder
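The `.narrowPeak` files follow the ENCODE narrowPeak layout (BED6+4): ten tab-separated columns. MACS3 writes the fold enrichment at the summit, the -log10 p-value, and the -log10 q-value in columns 7 to 9, and the summit offset from the peak start in column 10. The parser below is a minimal sketch under that assumption. The class name and the q-value cutoff in the usage example are illustrative, not pipeline settings.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class NarrowPeak:
    chrom: str
    start: int            # 0-based start (BED)
    end: int              # exclusive end (BED)
    name: str
    score: int
    strand: str
    signal_value: float   # fold enrichment at the summit in MACS3 output
    p_value: float        # -log10(p-value); -1 when not reported
    q_value: float        # -log10(q-value); -1 when not reported
    summit_offset: int    # summit position relative to start; -1 when not reported

    @property
    def summit(self) -> int:
        """Absolute summit coordinate (only meaningful if summit_offset >= 0)."""
        return self.start + self.summit_offset


def parse_narrowpeak(lines: Iterable[str]) -> Iterator[NarrowPeak]:
    """Parse narrowPeak lines, skipping blanks and track/browser/comment lines."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\r\n").split("\t")
        if len(f) != 10:
            raise ValueError(f"Expected 10 columns, got {len(f)}: {line!r}")
        yield NarrowPeak(
            f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
            float(f[6]), float(f[7]), float(f[8]), int(f[9]),
        )


if __name__ == "__main__":
    example = ["chr1\t100\t350\texper-rep1_peak_1\t85\t.\t4.5\t10.2\t8.5\t120\n"]
    # Illustrative cutoff: keep peaks with q <= 0.01, i.e. -log10(q) >= 2.
    strong = [p for p in parse_narrowpeak(example) if p.q_value >= 2]
    for p in strong:
        print(p.name, p.chrom, p.summit)
```

Filtering on the -log10 q-value column is one way to tighten a peak set before merging. Summits are useful centers for motif analysis or heatmaps.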
Supporting Outputs
- Alignment Files:
- Description: Deduplicated BAM files with alignment indices
- Format: .bam, .bai
- Example File Path: /output/alignments/sample_dedup.bam
- Quality Control Reports:
- Description: FastQC reports, alignment statistics, and Picard metrics
- Format: .html, .pdf, .tsv
- Example File Path: /output/qc/sample_fastqc.html
- Genome Browser Files:
- Description: Signal tracks for visualization
- Format: .tdf, .bigwig
- Example File Path: /output/tracks/sample_signal.tdf
- Visualization App: IGV, UCSC Genome Browser
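Position-based duplicate removal, which produces the deduplicated BAM files above, rests on one idea. Reads whose alignments share a 5' position and strand are treated as PCR copies of one fragment, and only one is kept. The sketch below illustrates this for single-end reads, with an optional UMI added to the key so that independent molecules sharing a position are retained. It is a simplification: it assumes exact UMI matches and ignores soft clipping and mate information. Picard MarkDuplicates scores candidates by the sum of base qualities, which `base_qual_sum` mirrors here, while UMI-tools also groups UMIs that differ by sequencing errors. The `Read` type and helper names are hypothetical. `umi_from_name` assumes the UMI was appended to the read name after an underscore, the default separator of UMI-tools.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Read:
    name: str
    chrom: str
    start: int            # 0-based leftmost aligned base
    end: int              # exclusive end of the alignment
    reverse: bool         # True if aligned to the reverse strand
    base_qual_sum: int    # sum of base qualities, used to pick the survivor
    umi: Optional[str] = None


def umi_from_name(read_name: str, sep: str = "_") -> str:
    """Return the UMI assumed to be appended to the read name after `sep`."""
    _, found, umi = read_name.rpartition(sep)
    if not found:
        raise ValueError(f"No UMI separator {sep!r} in read name {read_name!r}")
    return umi


def five_prime(read: Read) -> int:
    """5' end of the alignment: leftmost base on +, rightmost base on -."""
    return read.end - 1 if read.reverse else read.start


def deduplicate(reads: Iterable[Read], use_umi: bool = False) -> List[Read]:
    """Keep one read per (chrom, strand, 5' position[, UMI]) group."""
    best: Dict[Tuple, Read] = {}
    for read in reads:
        key = (read.chrom, read.reverse, five_prime(read), read.umi if use_umi else None)
        kept = best.get(key)
        # Keep the highest-scoring read in each group, as a quality-based tie-break.
        if kept is None or read.base_qual_sum > kept.base_qual_sum:
            best[key] = read
    return list(best.values())
```

Adding the UMI to the grouping key is the only difference between the two modes. Without it, two distinct molecules that happen to start at the same position collapse into one read.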
Associated Processes
- Add custom seq to genome gtf
- Adapter Removal
- Adapter Removal Summary
- ATAC CHIP summary
- bed merge
- bedtools coverage
- Bowtie Summary
- Check BED12
- Check Build Bowtie2 Index
- check Bowtie2 files
- check files
- Check Genome GTF
- Check chrom sizes and index
- Check Sequential Mapping Indexes
- ChIP ATAC seq Reporting
- ChIP MACS
- ChIP Prep
- convert gtf attributes
- Deduplication Summary
- deepTools CreateMatrix
- deepTools PlotHeatmap
- Download build sequential mapping indexes
- FastQC
- FastQC after Adapter Removal
- HOMER Annotate Peaks
- HOMER Motif Finder
- IGV BAM2TDF converter
- Map Bowtie2
- Merge Bam
- Merge TSV Files
- MultiQC
- Overall Summary
- Picard
- Picard MarkDuplicates
- Picard Summary
- Quality Filtering
- Quality Filtering Summary
- Remove Multimappers with Samtools
- RSeQC
- Sequential Mapping
- Sequential Mapping Bam count
- Sequential Mapping Summary
- Trimmer
- Trimmer Summary
- UCSC BAM2BigWig converter
- UMIextract
- Umitools Summary
References & Additional Documentation
- Related Papers:
- Yukselen, O., Turkyilmaz, O., Ozturk, A.R. et al. DolphinNext: a distributed data processing platform for high throughput genomics. BMC Genomics 21, 310 (2020). https://doi.org/10.1186/s12864-020-6714-x
- DOI: 10.1186/s12864-020-6714-x
- Program Versions: MACS3 v3.0.1, Bowtie2 v2.3.5, FastQC v0.11.8, Picard v2.18.27, Samtools v1.3, MultiQC v1.7, Trimmomatic v0.39, Bedtools v2.29.2