Consensus Polishing Specifications

Process Details

  • Name: consensusPolishing
  • Process UUID: a2d4ba3bd47b477e836493a49f1a0fda
  • Process Group: uminator

Overview

The Consensus Polishing process refines draft consensus sequences generated from UMI (Unique Molecular Identifier) data. It corrects errors introduced during initial consensus calling, which matters most in high-accuracy sequencing applications where single-nucleotide precision is critical.

The process is implemented in Groovy and invokes an R script that polishes the consensus sequences with Racon and Medaka.

Key Functionality

  • Conditional Processing: Runs consensus polishing when it is enabled in the user configuration; when it is disabled, skips polishing but keeps the output file structure consistent
  • Multi-tool Polishing Pipeline: Integrates Racon and Medaka polishing algorithms for comprehensive error correction
  • UMI-based Organization: Processes consensus sequences organized by UMI identifiers with proper directory structure management
  • Quality Control Integration: Exposes configurable parameters for the target number of reads used in polishing and for a fast polishing mode
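
As a rough illustration of the conditional, multi-tool flow described above, the sketch below builds the command lines for one UMI cluster without running them. It is not the actual Polish_consensus.R logic. The function name, the file names, the copy-on-skip behaviour, the use of minimap2 to produce overlaps for Racon, and the Racon-then-Medaka order are all assumptions made for the example.

```python
from pathlib import Path


def plan_polishing(enabled, draft, reads, out_dir, medaka_model, threads=4):
    """Return a list of (argv, stdout_path) steps for one UMI cluster.

    Nothing is executed here. stdout_path is None unless the tool's
    standard output should be redirected to a file.
    """
    out_dir = Path(out_dir)

    if not enabled:
        # Assumed skip behaviour: pass the draft through unchanged so that
        # downstream steps find out_dir/consensus.fasta either way.
        return [(["cp", str(draft), str(out_dir / "consensus.fasta")], None)]

    overlaps = out_dir / "overlaps.paf"   # hypothetical intermediate file
    racon_out = out_dir / "racon.fasta"   # hypothetical intermediate file
    t = str(threads)
    return [
        # Racon needs read-to-draft overlaps; minimap2 is one way to get them.
        (["minimap2", "-x", "map-ont", "-t", t, "-o", str(overlaps),
          str(draft), str(reads)], None),
        # Racon prints the polished sequence to stdout, so capture it.
        (["racon", "-t", t, str(reads), str(overlaps), str(draft)],
         str(racon_out)),
        # medaka_consensus is expected to write consensus.fasta into the -o directory.
        (["medaka_consensus", "-i", str(reads), "-d", str(racon_out),
          "-o", str(out_dir), "-t", t, "-m", medaka_model], None),
    ]
```

Each step could then be run with `subprocess.run(argv, check=True)`, opening `stdout_path` for writing when it is not `None`. Returning a plan rather than executing it keeps the skip and polish branches easy to compare.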

Input/Output Specification

Inputs

Required Inputs

  • UMI Data Set

    • Description: Collection of UMI-organized sequencing data containing draft consensus sequences and read assignments
    • Format: UMI file set
  • Output Directory

    • Description: Target directory for storing polished consensus sequences and intermediate files
    • Format: Directory

Outputs

  • Polished UMI Data Set

    • Description: UMI-organized data set containing polished consensus sequences with improved accuracy
    • Format: UMI file set
  • Output Directory

    • Description: Directory containing all polished consensus sequences, organized by sample and UMI identifiers
    • Format: Directory
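
To make "organized by sample and UMI identifiers" concrete, here is a minimal sketch of one possible layout, `<output directory>/<sample>/<UMI id>/consensus.fasta`. The layout, the file name, and both function names are hypothetical; the real directory structure produced by the process may differ.

```python
from pathlib import Path


def polished_consensus_path(out_dir, sample, umi_id):
    """Build the path of one polished consensus under out_dir/sample/umi_id/."""
    for part in (sample, umi_id):
        # Reject empty names and anything that would escape the directory.
        if not part or "/" in part or "\\" in part or part in (".", ".."):
            raise ValueError(f"invalid path component: {part!r}")
    return Path(out_dir) / sample / umi_id / "consensus.fasta"


def index_polished(out_dir):
    """Map (sample, umi_id) to the consensus path for every file found on disk."""
    found = {}
    for path in sorted(Path(out_dir).glob("*/*/consensus.fasta")):
        found[(path.parent.parent.name, path.parent.name)] = path
    return found
```

A downstream step could call `index_polished(out_dir)` to enumerate every polished consensus without knowing the sample or UMI names in advance.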

Parameters & Settings

These parameters can be adjusted in the Foundry UI when running this process.

  • Consensus Polishing

    • Description: Enable or disable consensus sequence polishing
    • Available options: yes, no
  • Target Reads Polishing

    • Description: Target number of reads to use for consensus polishing
    • Default value: Configurable based on workflow requirements
  • Fast Polishing Flag

    • Description: Enable fast polishing mode for reduced processing time
    • Default value: Configurable boolean setting
  • Medaka Model

    • Description: Medaka model to use for consensus polishing
    • Default value: Model selection based on sequencing platform
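
The sketch below shows one way the four settings above could be validated and how a target read count might be applied. The field names, the accepted boolean spellings, and the idea of random subsampling down to the target are assumptions for illustration; the documentation does not state how the process selects reads, and no default values are assumed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PolishingSettings:
    polish: bool        # "Consensus Polishing": yes / no
    target_reads: int   # "Target Reads Polishing"
    fast: bool          # "Fast Polishing Flag"
    medaka_model: str   # "Medaka Model"


def _as_bool(value):
    # Strings such as "false" must not be treated as truthy.
    if isinstance(value, str):
        return value.strip().lower() in ("true", "yes", "1")
    return bool(value)


def parse_settings(consensus_polishing, target_reads, fast_polishing, medaka_model):
    """Validate raw UI values and return a PolishingSettings instance."""
    choice = str(consensus_polishing).strip().lower()
    if choice not in ("yes", "no"):
        raise ValueError("Consensus Polishing must be 'yes' or 'no'")
    target = int(target_reads)
    if target < 1:
        raise ValueError("Target Reads Polishing must be a positive integer")
    return PolishingSettings(choice == "yes", target,
                             _as_bool(fast_polishing), str(medaka_model))


def select_reads(read_ids, target_reads, seed=0):
    """Keep at most target_reads reads, chosen reproducibly at random."""
    read_ids = list(read_ids)
    if len(read_ids) <= target_reads:
        return read_ids
    return random.Random(seed).sample(read_ids, target_reads)
```

Fixing the seed makes the subsample repeatable across runs, which helps when comparing polished outputs between configurations.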

References & Resources

  • Tool Documentation: Contact the team for details on Polish_consensus.R
  • Related Papers:

    • Racon: Vaser, R., Sović, I., Nagarajan, N., & Šikić, M. (2017). Fast and accurate de novo genome assembly from long uncorrected reads. Genome Research, 27(5), 737-746.
    • Medaka: Oxford Nanopore Technologies. Medaka: Sequence correction provided by ONT Research. https://github.com/nanoporetech/medaka