
QC Specifications

Process Details

  • Name: QC
  • Process UUID: e7cd9a480a8d4929b7b614281c40ac00
  • Process Group: uminator

Overview

This process performs quality control on UMI-binned sequencing reads and produces QC reports and statistics for both binned and unbinned reads. It consolidates read chunks that share a UMI, generates quality plots with NanoPlot, and reports statistics on UMI assignment efficiency.
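The consolidation step can be pictured as a per-UMI concatenation across chunks. The sketch below is illustrative only. It assumes a hypothetical layout where each chunk directory holds one `<UMI>.fastq` file per UMI, so files with the same name belong to the same UMI. The function name `consolidate_umi_chunks` is also hypothetical and does not come from the QC script.

```python
from pathlib import Path


def consolidate_umi_chunks(chunk_dirs, out_dir):
    """Concatenate same-named per-UMI FASTQ files from several chunk directories.

    Hypothetical layout: each chunk directory holds one "<UMI>.fastq" file per UMI,
    so files that share a name are assumed to belong to the same UMI.
    Returns a mapping of file name -> number of chunks it was found in.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    merged = {}
    # Process chunks in a stable (sorted) order so the output is reproducible.
    for chunk in sorted(Path(d) for d in chunk_dirs):
        for fastq in sorted(chunk.glob("*.fastq")):
            # Append mode: records from later chunks follow earlier ones,
            # like `cat chunk1/X.fastq chunk2/X.fastq > merged/X.fastq`.
            with open(out / fastq.name, "ab") as dst:
                dst.write(fastq.read_bytes())
            merged[fastq.name] = merged.get(fastq.name, 0) + 1
    return merged
```

Because the sketch appends, it expects an empty output directory. Rerunning it into the same directory would duplicate records.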

This process is implemented as a Bash script that calls NanoPlot for quality visualization and seqtk for sequence processing.
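The script's exact NanoPlot invocation is not documented here. As a rough illustration, the sketch below builds one NanoPlot command per read set (binned and unbinned). It uses only the standard NanoPlot options `--fastq`, `--outdir` and `--threads`. The function name, the thread default and the `binned`/`unbinned` subdirectory names are assumptions.

```python
import shlex


def nanoplot_commands(binned_fastqs, unbinned_fastqs, out_dir, threads=4):
    """Build one NanoPlot command per read set (binned and unbinned).

    Hypothetical: the real QC script's arguments and output subdirectory names
    are not documented; only --fastq, --outdir and --threads are standard
    NanoPlot options.
    """
    cmds = []
    for label, fastqs in (("binned", binned_fastqs), ("unbinned", unbinned_fastqs)):
        if not fastqs:
            continue  # nothing to plot for this read set
        cmds.append([
            "NanoPlot",
            "--threads", str(threads),
            "--outdir", f"{out_dir}/{label}",
            "--fastq", *fastqs,  # --fastq accepts one or more files
        ])
    return cmds


if __name__ == "__main__":
    # Print the commands instead of running them (e.g. via subprocess.run).
    for cmd in nanoplot_commands(["umi_bins/AAA.fastq"], ["unbinned.fastq"], "qc"):
        print(shlex.join(cmd))
```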

Key Functionality

  • Read Consolidation: Concatenates FASTQ files with identical UMIs that were split across different read chunks during processing
  • Quality Control Visualization: Generates comprehensive quality plots for both binned (UMI-assigned) and unbinned reads using NanoPlot
  • UMI Assignment Statistics: Produces detailed TSV files containing read-to-UMI assignment statistics, including read counts and identifiers for each UMI
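The UMI assignment statistics can be sketched as a per-UMI table of read counts and read identifiers, plus the fraction of reads that were binned. This is a minimal sketch. The column names (`umi`, `read_count`, `read_ids`), the comma-joined ID list and both function names are hypothetical. The TSV layout the process actually writes is not specified here.

```python
import csv
import io


def read_ids(fastq_text):
    """Return read identifiers from well-formed FASTQ text.

    Each FASTQ record is four lines; the ID is the first whitespace-separated
    token of the header line, without the leading '@'.
    """
    lines = fastq_text.splitlines()
    return [
        lines[i][1:].split()[0]
        for i in range(0, len(lines) - 3, 4)
        if lines[i].startswith("@")
    ]


def umi_assignment_table(bins, unbinned_count):
    """Build a per-UMI TSV and the fraction of reads assigned to a UMI.

    bins: hypothetical mapping of UMI -> list of read IDs in that bin.
    unbinned_count: number of reads that were not assigned to any UMI.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["umi", "read_count", "read_ids"])
    for umi in sorted(bins):
        ids = bins[umi]
        writer.writerow([umi, len(ids), ",".join(ids)])
    binned = sum(len(ids) for ids in bins.values())
    total = binned + unbinned_count
    # Assignment efficiency: binned reads / all reads (0.0 if there are no reads).
    fraction = binned / total if total else 0.0
    return buf.getvalue(), fraction
```

For example, with three binned reads and one unbinned read, the returned fraction is 0.75.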

Input/Output Specification

Inputs

Required Inputs

  • UMI Data

    • Description: UMI-processed sequencing data containing binned and unbinned reads
    • Format: UMI file type
  • Output Directory

    • Description: Directory where QC results and reports will be stored
    • Format: Directory

Outputs

  • QC Output Directory
    • Description: Directory containing quality control reports, plots, and statistics including NanoPlot visualizations and UMI assignment summary tables
    • Format: Directory

Parameters & Settings

This process runs conditionally when the QC parameter is set to "yes" in the pipeline configuration.
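A minimal sketch of this gate follows. It assumes the pipeline parameters are available as a dictionary. The function name `qc_enabled` is hypothetical. The check is deliberately strict because the source does not say whether values such as "Yes" or "true" are also accepted.

```python
def qc_enabled(params):
    """Return True only when the pipeline's QC parameter is exactly "yes".

    Assumption: parameters arrive as a dict; a missing key disables QC.
    """
    return params.get("QC") == "yes"
```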

References & Resources

  • Tool Documentation: Contact the team for details on the QC analysis implementation
  • Related Papers: De Coster, W., D'Hert, S., Schultz, D.T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669.