ATAC-Seq and ChIP-Seq pipelines

Via Foundry provides pipelines for processing ChIP-Seq and ATAC-Seq data, two assays widely used in genomic research. The two pipelines share most of their processes and differ only at certain stages. Both reuse the data preparation steps of the RNA-Seq pipeline: read filtering, read quality reporting, and alignment to desired genomic locations.

The key steps involved in the ChIP-Seq and ATAC-Seq pipelines are as follows:

  1. Quality Control: The pipelines run FastQC to assess the quality of the sequencing reads and generate quality control reports. Optional processes can further clean the data: read quality filtering and read quality trimming (both with Trimmomatic) and adapter removal (Cutadapt).
  2. Counting and Filtering: The pipelines use Bowtie2, Bowtie, or STAR to align reads against standard and predefined sets of genomic loci (e.g., rRNAs, miRNAs, tRNAs, piRNAs, snoRNAs, ERCC). Reads that map to these loci can be counted to estimate their abundance, or filtered out before the main alignment.
  3. Read Alignment: The short-read aligner Bowtie2 aligns the sequencing reads to a reference genome (Langmead and Salzberg 2012). For large input files, such as those from ATAC-Seq experiments, the pipeline speeds up alignment by splitting the files into smaller chunks and aligning the chunks in parallel.
  4. PCR Duplicate Removal: The pipelines use the Picard MarkDuplicates function (Broad Institute, n.d.) and Samtools (H. Li et al. 2009) to estimate and remove PCR duplicates. Duplicates are identified on the merged alignments, so reads that were aligned in separate chunks are still compared against each other.
  5. ATAC-Seq-specific Analysis and Peak Calling: For ATAC-Seq data, the pipeline identifies accessible chromatin regions by estimating the Tn5 transposase cut sites. Each read is positioned on the 9th base upstream of its 5' end and extended by 29 bases downstream. Studies (Donnard et al. 2018; Buenrostro et al. 2013) have shown that this extension more accurately reflects the exact positions accessible to the transposase. Peaks are then called with MACS2 (Zhang et al. 2008); this peak-calling step is common to both the ChIP-Seq and ATAC-Seq pipelines.
  6. Consensus Peak Calling and Quantification: When multiple samples are processed together, the ATAC-Seq and ChIP-Seq pipelines can optionally generate consensus peak calls by merging the peaks called individually in each sample using Bedtools (Quinlan and Hall 2010). The pipelines then quantify the number of reads in each peak location using the Bedtools coverage function.
  7. Data Analysis: Both the ATAC-Seq and ChIP-Seq pipelines produce a matrix of count values for each peak region and sample. This matrix can be uploaded directly to the embedded version of DEBrowser (Kucukural et al. 2019) for differential analysis, or downloaded for further analysis with other tools.
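
The logic of steps 5 to 7 can be illustrated with a small Python sketch. This is not the pipeline's implementation, which relies on MACS2 and Bedtools; it only mimics the idea on toy data. The function names, the 0-based half-open coordinate convention, and the exact reading of the "9 bases upstream, 29 bases downstream" rule are assumptions made for this example, and the toy peaks stand in for real MACS2 output.

```python
def cut_site_window(five_prime, strand, upstream=9, length=29):
    """Estimate a Tn5 cut-site window as a 0-based, half-open (start, end).

    Assumption: five_prime is the 0-based coordinate of the read's 5' base.
    The window begins 9 bases upstream of it and spans 29 bases downstream.
    """
    if strand == "+":
        start = five_prime - upstream
        end = start + length
    else:
        # On the minus strand, "upstream" points toward higher coordinates.
        end = five_prime + upstream + 1
        start = end - length
    return max(start, 0), end


def merge_peaks(peaks):
    """Merge overlapping or book-ended (chrom, start, end) intervals.

    Mirrors the default behaviour of a Bedtools-style merge on sorted input.
    """
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(p) for p in merged]


def count_matrix(consensus, windows_by_sample):
    """Count, per sample, the read windows overlapping each consensus peak."""
    matrix = {}
    for peak in consensus:
        chrom, p_start, p_end = peak
        matrix[peak] = {
            sample: sum(
                1 for c, s, e in windows
                if c == chrom and s < p_end and e > p_start
            )
            for sample, windows in windows_by_sample.items()
        }
    return matrix


# Toy per-sample peak calls (hypothetical values).
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks_b = [("chr1", 150, 250), ("chr2", 10, 50)]
consensus = merge_peaks(peaks_a + peaks_b)

# Toy reads given as (chrom, 5' position, strand), converted to windows.
reads = {
    "A": [("chr1", 120, "+"), ("chr1", 560, "-")],
    "B": [("chr1", 240, "+")],
}
windows = {
    sample: [(c, *cut_site_window(pos, strand)) for c, pos, strand in rs]
    for sample, rs in reads.items()
}
matrix = count_matrix(consensus, windows)

for peak, counts in matrix.items():
    print(peak, counts)
```

Each key of `matrix` is a consensus peak and each value maps a sample name to its count, which is the same peak-by-sample layout as the matrix passed to DEBrowser in step 7.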