ATAC-seq Pipeline Specification

Pipeline Details

  • Name: ATAC-seq Pipeline
  • Pipeline UUID: c663r5ndnj9cle5z9uud80fkexijm8
  • Version: 1.2.0

Overview

The ATAC-seq Pipeline analyzes Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) data. It automates preprocessing, quality control, read mapping, peak calling, and read quantification to identify accessible chromatin regions and generate count matrices for differential accessibility analysis.

Key Use Cases:

  • Chromatin Accessibility Analysis: Identifies accessible chromatin regions using MACS2 peak calling with Tn5 transposase cut site positioning.
  • Differential Accessibility Studies: Generates count matrices for peak regions that can be analyzed with DEBrowser for differential accessibility analysis.
  • Multi-sample Peak Consensus: Creates consensus peak calls by merging peaks from multiple samples using Bedtools for comparative analysis.
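To illustrate what merging peaks into a consensus set means, here is a minimal pure-Python sketch. It is not the pipeline's implementation (the pipeline uses Bedtools); the function name `merge_peaks`, the `max_gap` parameter, and the example coordinates are hypothetical. It assumes 0-based, half-open BED-style intervals and merges overlapping or directly adjacent peaks per chromosome.

```python
from collections import defaultdict


def merge_peaks(peaks, max_gap=0):
    """Merge overlapping or adjacent intervals on each chromosome.

    peaks: iterable of (chrom, start, end) tuples, 0-based half-open.
    max_gap: also merge intervals separated by at most this many bases
             (hypothetical parameter, for illustration only).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current = None  # the interval currently being extended
        for start, end in sorted(by_chrom[chrom]):
            if current is None:
                current = [start, end]
            elif start <= current[1] + max_gap:
                # Overlaps or touches the current interval: extend it.
                current[1] = max(current[1], end)
            else:
                merged.append((chrom, current[0], current[1]))
                current = [start, end]
        if current is not None:
            merged.append((chrom, current[0], current[1]))
    return merged


# Hypothetical peaks from two samples.
sample_a = [("chr1", 100, 200), ("chr1", 500, 600)]
sample_b = [("chr1", 150, 250), ("chr2", 10, 50)]

consensus = merge_peaks(sample_a + sample_b)
for chrom, start, end in consensus:
    print(chrom, start, end, sep="\t")
```

In this example the two overlapping chr1 peaks collapse into a single region (100 to 250), while the non-overlapping peaks pass through unchanged.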

Features

  • Support for Multiple Quality Control Options: Includes FastQC for read quality assessment, plus Trimmomatic and Cutadapt for trimming and filtering.
  • Specialized ATAC-seq Processing: Positions each Tn5 transposase cut site at the 9th base upstream of the 5' read end, then extends 29 bases downstream from that position.
  • Dual Mapping Strategy: Uses Bowtie2 twice: first for sequential mapping to filter out common reads (for example ERCC, rmsk), then for genome alignment. Picard removes duplicates from the aligned reads.
  • Comprehensive Peak Analysis: Employs MACS2 for peak calling and Bedtools for consensus peak generation and read quantification across samples.
  • Visualization Support: Generates IGV (TDF) and Genome Browser (BigWig) files for interactive data exploration.
  • Integrated Analysis Platform: Direct integration with DEBrowser for differential accessibility analysis.
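The Tn5 cut site positioning described in the feature list can be sketched as follows. This is one plausible reading of the description, not the pipeline's actual code: the function name `cut_site_window`, the 0-based half-open coordinate convention, and the handling of minus-strand reads (where "upstream" means higher genomic coordinates) are assumptions made for illustration.

```python
def cut_site_window(read_start, read_end, strand, upstream=9, length=29):
    """Return the (start, end) window around a read's 5' end.

    Coordinates are assumed to be 0-based, half-open (BED style).
    The window starts `upstream` bases upstream of the 5' end and
    spans `length` bases in the downstream direction.
    """
    if strand == "+":
        # 5' end is the leftmost base; upstream means lower coordinates.
        start = read_start - upstream
        end = start + length
    elif strand == "-":
        # 5' end is the rightmost base; upstream means higher coordinates.
        end = read_end + upstream
        start = end - length
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Clamp at the chromosome start so coordinates stay non-negative.
    return max(start, 0), end


# Hypothetical reads of length 50.
print(cut_site_window(1000, 1050, "+"))  # (991, 1020)
print(cut_site_window(1950, 2000, "-"))  # (1980, 2009)
```

The exact off-by-one conventions depend on the coordinate system of the upstream tools, so the numbers above should be treated as illustrative only.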

Input/Output Specification

Inputs

Required

Reads

  • Description: FASTQ files containing ATAC-seq sequencing reads from transposase-accessible chromatin regions.
  • Format: .fastq.gz
  • Example File Path: /path/to/input/sample.fastq.gz

ATAC-prep Section

  • Description: Sample definitions required for peak calling with MACS2, configured through the run_ATAC_MACS2 settings.
  • Fields: Output-Prefix (required), Sample-Prefix (required), Input-Prefix (optional)
  • Format: Tab-separated configuration, one sample per row. Example:

    Output-Prefix    Sample-Prefix    Input-Prefix (optional)
    exper-rep1       exper-rep1
    control-rep1     control-rep1
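For readers who keep sample definitions like the ones above in a tab-separated text file before entering them into the run settings, the following sketch shows one way to parse and validate them. The helper `parse_sample_table` and the dictionary keys are hypothetical, and the platform itself may expect these values through its settings form rather than as a file.

```python
import csv
import io


def parse_sample_table(text):
    """Parse tab-separated sample definitions into a list of dicts.

    The first row is treated as a header. Output-Prefix and
    Sample-Prefix are required; Input-Prefix is optional.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    rows = []
    for line_no, fields in enumerate(reader, start=2):
        fields = [f.strip() for f in fields]
        if not any(fields):
            continue  # ignore blank lines
        if len(fields) < 2 or not fields[0] or not fields[1]:
            raise ValueError(f"line {line_no}: Output-Prefix and Sample-Prefix are required")
        rows.append({
            "output_prefix": fields[0],
            "sample_prefix": fields[1],
            # A missing or empty third column means no input/control sample.
            "input_prefix": fields[2] if len(fields) > 2 and fields[2] else None,
        })
    return rows


table = (
    "Output-Prefix\tSample-Prefix\tInput-Prefix\n"
    "exper-rep1\texper-rep1\t\n"
    "control-rep1\tcontrol-rep1\n"
)

samples = parse_sample_table(table)
for sample in samples:
    print(sample)
```

Both example rows omit Input-Prefix, so `input_prefix` is `None` for each.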

Outputs

Reported Outputs

  • Peak Count Matrix:
      • Description: Matrix containing read counts for each peak region across all samples
      • Format: .tsv
      • Example File Path: /output/directory/peak_counts_matrix.tsv
      • Visualization App: DEBrowser
      • Location: Results folder

  • Peak Calls:
      • Description: Individual and consensus peak calls from MACS2 analysis
      • Format: .bed, .narrowPeak
      • Example File Path: /output/directory/consensus_peaks.bed
      • Visualization App: IGV, UCSC Genome Browser
      • Location: Results folder
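As a rough illustration of how a peak count matrix relates to the peak calls, the sketch below counts, for each consensus peak, how many per-sample intervals overlap it and prints the result as tab-separated rows. The pipeline performs this quantification with Bedtools; the function names, the `chrom:start-end` peak identifier, the column layout, and the example data here are all hypothetical, and the real matrix may be laid out differently.

```python
def count_overlaps(peaks, intervals):
    """Count intervals overlapping each peak (0-based, half-open)."""
    counts = []
    for chrom, p_start, p_end in peaks:
        n = sum(
            1
            for c, start, end in intervals
            if c == chrom and start < p_end and end > p_start
        )
        counts.append(n)
    return counts


def build_count_matrix(peaks, samples):
    """Return (header, rows) with one row per peak, one column per sample."""
    names = sorted(samples)
    per_sample = {name: count_overlaps(peaks, samples[name]) for name in names}
    header = ["peak"] + names
    rows = []
    for i, (chrom, start, end) in enumerate(peaks):
        peak_id = f"{chrom}:{start}-{end}"  # hypothetical identifier format
        rows.append([peak_id] + [per_sample[name][i] for name in names])
    return header, rows


# Hypothetical consensus peaks and per-sample read-derived intervals.
peaks = [("chr1", 100, 250), ("chr1", 500, 600)]
samples = {
    "exper-rep1": [("chr1", 120, 149), ("chr1", 240, 269), ("chr1", 590, 619)],
    "control-rep1": [("chr1", 10, 39), ("chr1", 510, 539)],
}

header, rows = build_count_matrix(peaks, samples)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

The nested loop is quadratic and only suitable for small examples; genome-scale data calls for sorted or indexed interval operations such as those Bedtools provides.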

Supporting Outputs

  • Alignment Files:
      • Description: Bowtie2 aligned reads with duplicates removed by Picard
      • Format: .bam, .bai
      • Example File Path: /intermediate/directory/sample_aligned_dedup.bam

  • Quality Control Reports:
      • Description: FastQC quality assessment reports and MultiQC summary
      • Format: .html, .zip
      • Example File Path: /intermediate/directory/sample_fastqc.html

  • Visualization Files:
      • Description: IGV TDF files and BigWig files for genome browser visualization
      • Format: .tdf, .bw
      • Example File Path: /output/directory/sample.tdf

Associated Processes

References & Additional Documentation

  • Related Papers:
      • Yukselen, O., Turkyilmaz, O., Ozturk, A.R. et al. DolphinNext: a distributed data processing platform for high throughput genomics. BMC Genomics 21, 310 (2020). https://doi.org/10.1186/s12864-020-6714-x
      • Buenrostro et al. 2013; Donnard et al. 2018 (ATAC-seq methodology)
      • Zhang et al. 2008 (MACS2); Quinlan and Hall 2010 (Bedtools)
  • Pipeline Repository: Available through DolphinNext platform
  • Workflow Diagram: Available in pipeline description pages