FastQC Pipeline Specification

Pipeline Details

  • Name: FastQC
  • Pipeline UUID: c6637vqpbs50gz56lmdoa4fghc6swe
  • Version: 1.2.0

Overview

The FastQC pipeline runs quality control checks on raw sequence data from high-throughput sequencing pipelines. It provides a modular set of analyses that give a quick impression of any data quality issues before further analysis.

Key Use Cases:

  • Quality Assessment: Rapid evaluation of raw sequencing data quality before downstream analysis.
  • Data Preprocessing: Identification of potential issues in sequencing data that may affect downstream analyses.
  • Multi-sample Quality Control: Comprehensive quality control reporting across multiple samples with summary visualizations.

Features

  • Flexible Input Options: Supports both samplesheet CSV files and collection-based input for single-end and paired-end reads.
  • Containerized Execution: Uses Biocontainers with FastQC 0.12.1 and MultiQC 1.21 for reproducible results.
  • Memory Optimization: Memory allocation adapts to available resources while staying within the memory range FastQC accepts (100-10000 MB).
  • Multi-threaded Processing: Parallel processing capabilities for faster execution on large datasets.
  • Comprehensive Reporting: Generates both individual FastQC reports and aggregated MultiQC summary reports.
  • Automatic File Handling: Renames and manages files automatically to keep naming conventions consistent.
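
The memory constraint listed above can be illustrated with a short Python sketch. The pipeline's actual allocation logic is not shown in this specification, so the even split of memory across threads and the function name fastqc_memory_mb are assumptions made for illustration; only the 100-10000 MB range comes from this document.

```python
# Bounds taken from the Features list above (100-10000 MB).
FASTQC_MIN_MB = 100
FASTQC_MAX_MB = 10000


def fastqc_memory_mb(available_mb: int, threads: int) -> int:
    """Return a per-thread memory value clamped to FastQC's accepted range.

    Assumption: available memory is divided evenly across threads.
    The real pipeline may use a different rule.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    per_thread = available_mb // threads
    # Clamp the result into the [100, 10000] MB window.
    return max(FASTQC_MIN_MB, min(FASTQC_MAX_MB, per_thread))


if __name__ == "__main__":
    print(fastqc_memory_mb(8192, 4))
```

Clamping rather than failing means a very small or very large node still produces a value inside the accepted range, although a heavily clamped value may be worth logging.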

Input/Output Specification

Inputs

Required Inputs

mate

  • Description: Library layout specification indicating whether reads are single-end or paired-end.
  • Format: String value ("single" or "pair")
  • Example: "single" for single-end reads, "pair" for paired-end reads

Optional Inputs

reads

  • Description: Collection of sequencing read files for quality control analysis.
  • Format: .fastq.gz or .fq.gz files
  • Example File Path: /path/to/reads/sample_R1.fastq.gz
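
As a rough illustration of the format requirement above, the following Python sketch separates a collection of paths into supported and unsupported read files. The helper name split_supported is hypothetical, and the pipeline may validate inputs differently.

```python
# Suffixes listed under "Format" for the reads input.
SUPPORTED_SUFFIXES = (".fastq.gz", ".fq.gz")


def split_supported(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split paths into (supported, rejected) based on their suffix."""
    supported = [p for p in paths if p.endswith(SUPPORTED_SUFFIXES)]
    rejected = [p for p in paths if not p.endswith(SUPPORTED_SUFFIXES)]
    return supported, rejected


if __name__ == "__main__":
    ok, bad = split_supported([
        "/path/to/reads/sample_R1.fastq.gz",
        "/path/to/reads/sample_R1.fastq",  # not gzip-compressed
    ])
    print(ok, bad)
```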

samplesheet_reads

  • Description: CSV file containing sample information and file paths for batch processing.
  • Required Columns: name, file1, file2 (file2 may be left empty for single-end reads)
  • Format: Comma-separated values (.csv)
  • Example:
    name,file1,file2
    control_rep1,gs://mybucket/test_R1.fq.gz,gs://mybucket/test_R2.fq.gz
    control_rep2,gs://mybucket/test.fq.gz,
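
A samplesheet in this layout can be checked before submission with a short Python sketch like the one below. It assumes that an empty file2 marks a single-end sample, as in the example above. The function name parse_samplesheet and the per-row "layout" field are illustrative, not part of the pipeline.

```python
import csv
import io

# file2 is treated as optional, matching the column description above.
REQUIRED_COLUMNS = ("name", "file1")

EXAMPLE = """name,file1,file2
control_rep1,gs://mybucket/test_R1.fq.gz,gs://mybucket/test_R2.fq.gz
control_rep2,gs://mybucket/test.fq.gz,
"""


def parse_samplesheet(text: str) -> list[dict]:
    """Parse samplesheet CSV text into one dictionary per sample."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")

    samples = []
    for row in reader:
        # A short row yields None for file2; an empty cell yields "".
        file2 = (row.get("file2") or "").strip()
        samples.append({
            "name": (row["name"] or "").strip(),
            "file1": (row["file1"] or "").strip(),
            "file2": file2 or None,
            # Assumption: layout is inferred from whether file2 is present.
            "layout": "pair" if file2 else "single",
        })
    return samples


if __name__ == "__main__":
    for sample in parse_samplesheet(EXAMPLE):
        print(sample["name"], sample["layout"])
```

Inferring the layout per row is only a convenience for spotting mixed sheets. The pipeline itself takes the layout from the mate input.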
    

Outputs

Reported Outputs

  • MultiQC Report:
      • Description: Comprehensive HTML report aggregating FastQC results across all samples
      • Format: .html
      • Example File Path: /output/multiqc_report.html
      • Visualization App: Web browser
      • Location: Report folder

Supporting Outputs

  • FastQC Individual Reports:
      • Description: Individual quality control reports for each sample
      • Format: .html and .zip files
      • Example File Path: /output/sample_fastqc.html
      • Location: FastQC output folder
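
To help locate the individual reports for a given input, the Python sketch below derives the report file names from a read file path. It assumes FastQC's usual behaviour of stripping the compression and FASTQ suffixes and appending _fastqc, and it ignores any renaming the pipeline applies beforehand. The function name fastqc_report_names is hypothetical.

```python
from pathlib import PurePosixPath


def fastqc_report_names(read_path: str) -> tuple[str, str]:
    """Return the expected (.html, .zip) report names for one read file."""
    name = PurePosixPath(read_path).name
    # Drop the gzip suffix first, then the FASTQ suffix.
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    for suffix in (".fastq", ".fq"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return f"{name}_fastqc.html", f"{name}_fastqc.zip"


if __name__ == "__main__":
    print(fastqc_report_names("/path/to/reads/sample_R1.fastq.gz"))
```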

Associated Processes

References & Additional Documentation

  • FastQC Documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  • MultiQC Documentation: https://multiqc.info/
  • Container Registry:
      • FastQC: quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0
      • MultiQC: quay.io/biocontainers/multiqc:1.21--pyhdfd78af_0