FastQC Pipeline Specification
Pipeline Details
Overview
The FastQC pipeline performs quality control checks on raw sequence data from high-throughput sequencing pipelines. It provides a modular set of analyses that give a quick impression of any data quality issues before further analysis.
Key use cases:
- Quality Assessment: Rapid evaluation of raw sequencing data quality before downstream analysis.
- Data Preprocessing: Identification of potential issues in sequencing data that may affect downstream analyses.
- Multi-sample Quality Control: Comprehensive quality control reporting across multiple samples with summary visualizations.
Features
- Flexible Input Options: Supports both samplesheet CSV files and collection-based input for single-end and paired-end reads.
- Containerized Execution: Uses Biocontainers with FastQC 0.12.1 and MultiQC 1.21 for reproducible results.
- Memory Optimization: Memory allocation adapts to the available resources while staying within the range FastQC accepts (100-10000 MB).
- Multi-threaded Processing: Parallel processing capabilities for faster execution on large datasets.
- Comprehensive Reporting: Generates both individual FastQC reports and aggregated MultiQC summary reports.
- Automatic File Handling: Files are renamed and managed automatically so that outputs follow a consistent naming convention.
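The memory and threading features above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the helper names `fastqc_memory_mb` and `fastqc_command` are hypothetical, and dividing the available memory evenly across threads is an assumption. Only the 100-10000 MB range comes from this specification; `--threads`, `--memory`, and `--outdir` are FastQC command-line options.

```python
FASTQC_MIN_MB = 100     # lower bound stated in this specification
FASTQC_MAX_MB = 10000   # upper bound stated in this specification


def fastqc_memory_mb(available_mb: int, threads: int) -> int:
    """Return a per-thread memory value clamped to the accepted range.

    Assumption: available memory is split evenly across threads.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    per_thread = available_mb // threads
    return max(FASTQC_MIN_MB, min(FASTQC_MAX_MB, per_thread))


def fastqc_command(files, outdir, available_mb, threads):
    """Build an argument list for one FastQC call (hypothetical helper)."""
    memory = fastqc_memory_mb(available_mb, threads)
    return [
        "fastqc",
        "--threads", str(threads),
        "--memory", str(memory),
        "--outdir", outdir,
        *files,
    ]
```

For example, `fastqc_memory_mb(64000, 2)` is capped at the 10000 MB upper bound, while `fastqc_memory_mb(200, 4)` is raised to the 100 MB lower bound.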
Input/Output Specification
Inputs
Required
mate
- Description: Library layout specification indicating whether reads are single-end or paired-end.
- Format: String value ("single" or "pair")
- Example: "single" for single-end reads, "pair" for paired-end reads
Optional
reads
- Description: Collection of sequencing read files for quality control analysis.
- Format: .fastq.gz or .fq.gz files
- Example File Path: /path/to/reads/sample_R1.fastq.gz
samplesheet_reads
- Description: CSV file containing sample information and file paths for batch processing.
- Columns: name, file1, file2 (file2 may be left empty for single-end reads)
- Format: Comma-separated values (.csv)
- Example:
name,file1,file2
control_rep1,gs://mybucket/test_R1.fq.gz,gs://mybucket/test_R2.fq.gz
control_rep2,gs://mybucket/test.fq.gz,
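A samplesheet in this layout could be read as shown below. This is a sketch under stated assumptions: `parse_samplesheet` is a hypothetical helper, and inferring the layout of each row from whether file2 is filled in is an illustration only, since the specification defines `mate` as a separate pipeline input.

```python
import csv
import io

# Columns every row must provide; file2 may be empty for single-end samples.
REQUIRED_COLUMNS = ("name", "file1")


def parse_samplesheet(text: str):
    """Return one record per row: sample name, read files, inferred layout."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"samplesheet is missing columns: {missing}")

    samples = []
    for row in reader:
        files = [row["file1"].strip()]
        # A missing or empty file2 is treated as a single-end sample.
        file2 = (row.get("file2") or "").strip()
        if file2:
            files.append(file2)
        samples.append({
            "name": row["name"].strip(),
            "files": files,
            "mate": "pair" if file2 else "single",  # assumed inference
        })
    return samples


EXAMPLE = """name,file1,file2
control_rep1,gs://mybucket/test_R1.fq.gz,gs://mybucket/test_R2.fq.gz
control_rep2,gs://mybucket/test.fq.gz,
"""

samples = parse_samplesheet(EXAMPLE)
```

With the example rows, `control_rep1` yields two files and `control_rep2` yields one.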
Outputs
Reported Outputs
- MultiQC Report:
  - Description: Comprehensive HTML report aggregating FastQC results across all samples
  - Format: .html
  - Example File Path: /output/multiqc_report.html
  - Visualization App: Web browser
  - Location: Report folder
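The aggregation step that produces this report might be invoked roughly as follows. The helper `multiqc_command` and the directory names are hypothetical; `--outdir` and `--filename` are MultiQC command-line options, and the default file name matches the example path above.

```python
def multiqc_command(fastqc_dir: str, report_dir: str,
                    filename: str = "multiqc_report.html"):
    """Build an argument list that aggregates FastQC results (hypothetical helper)."""
    return [
        "multiqc", fastqc_dir,     # directory scanned for FastQC outputs
        "--outdir", report_dir,    # where the aggregated report is written
        "--filename", filename,    # name of the HTML report
    ]
```

The returned list can be passed to `subprocess.run` inside an environment where MultiQC is installed.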
Supporting Outputs
- FastQC Individual Reports:
  - Description: Individual quality control reports for each sample
  - Format: .html and .zip files
  - Example File Path: /output/sample_fastqc.html
  - Location: FastQC output folder
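The report names follow FastQC's convention of dropping the read-file extension and appending `_fastqc`. A minimal sketch of that mapping is shown below. `fastqc_outputs` is a hypothetical helper, the `/output` default mirrors the example path above, and the pipeline's own renaming step may produce different names.

```python
import posixpath

# Input extensions accepted by this pipeline.
READ_SUFFIXES = (".fastq.gz", ".fq.gz")


def fastqc_outputs(read_path: str, outdir: str = "/output"):
    """Return the expected (.html, .zip) report paths for one read file."""
    name = posixpath.basename(read_path)
    for suffix in READ_SUFFIXES:
        if name.endswith(suffix):
            stem = name[: -len(suffix)]
            break
    else:
        raise ValueError(f"unsupported read file extension: {name}")
    return (
        posixpath.join(outdir, stem + "_fastqc.html"),
        posixpath.join(outdir, stem + "_fastqc.zip"),
    )
```

For instance, an input named `sample.fq.gz` maps to `/output/sample_fastqc.html` and `/output/sample_fastqc.zip`.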
Associated Processes
References & Additional Documentation
- FastQC Documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- MultiQC Documentation: https://multiqc.info/
- Container Registry:
  - FastQC: quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0
  - MultiQC: quay.io/biocontainers/multiqc:1.21--pyhdfd78af_0