Add Custom Seq To Genome Gtf Specifications

Process Details

  • Name: Add_custom_seq_to_genome_gtf
  • Process UUID: OyfWrL0bBQbwEFcALKSiFt9Itzc4QW
  • Process Group: Misc.

Overview

This process adds custom sequences to a genome FASTA file and updates the corresponding GTF annotation file. It integrates user-provided custom sequences into existing reference genome files, creating modified versions that include both the original genome and custom sequences. The process automatically generates appropriate GTF annotations for the custom sequences and creates indexed files for downstream analysis.

This process is implemented as a Bash script that invokes a Python script, which uses Biopython for FASTA processing and GTF generation.
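As a rough illustration of the integration step, the sketch below appends a custom FASTA file to a genome FASTA file and strips carriage returns from the custom sequences along the way. It is not the process's actual script: it uses only the Python standard library rather than Biopython, and the function name `append_custom_sequences` and its parameters are hypothetical.

```python
def append_custom_sequences(genome_path, custom_path, out_path):
    """Write the genome followed by the custom sequences to out_path.

    Hypothetical helper for illustration only; the real process may
    organise this step differently.
    """
    # newline="\n" on the output prevents any platform newline translation.
    with open(out_path, "w", newline="\n") as out:
        last_line = "\n"
        # newline="" keeps the original line endings of the genome intact.
        with open(genome_path, newline="") as genome:
            for line in genome:
                out.write(line)
                last_line = line
        # Ensure the genome ends with a newline so the first custom
        # header does not run into the last genome sequence line.
        if not last_line.endswith("\n"):
            out.write("\n")
        with open(custom_path, newline="") as custom:
            for line in custom:
                # Drop carriage returns (e.g. Windows line endings).
                out.write(line.rstrip("\r\n") + "\n")
    return out_path
```

Streaming line by line avoids loading a whole reference genome into memory, which matters for large assemblies.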

Key Functionality

  • Custom Sequence Integration: Appends custom FASTA sequences to the original genome file
  • GTF Annotation Generation: Automatically creates GTF entries for custom sequences with gene, transcript, and exon annotations
  • File Indexing: Generates FASTA index files and sorts/indexes GTF files for compatibility with downstream tools
  • Quality Control: Cleans the custom sequence file by removing carriage returns (such as Windows-style line endings)
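The GTF generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the process's actual script: each custom sequence is assumed to be annotated as a single gene, transcript, and exon spanning its full length on the forward strand, with the sequence name reused as `gene_id` and `transcript_id` and `custom` as the source column. The function names are hypothetical, and the sketch uses only the standard library rather than Biopython.

```python
def fasta_lengths(path):
    """Return {sequence_name: length} for each FASTA record, in file order."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                fields = line[1:].split()
                if not fields:
                    raise ValueError("FASTA header without a sequence name")
                # The name is the header text up to the first whitespace.
                name = fields[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def gtf_lines_for(name, length, source="custom"):
    """Build gene, transcript, and exon GTF lines for one sequence.

    Assumption: one feature of each type covering positions 1..length
    on the '+' strand. The real annotations may differ.
    """
    gene_attrs = f'gene_id "{name}";'
    tx_attrs = f'gene_id "{name}"; transcript_id "{name}";'
    rows = [("gene", gene_attrs), ("transcript", tx_attrs), ("exon", tx_attrs)]
    return [
        "\t".join([name, source, feature, "1", str(length), ".", "+", ".", attrs])
        for feature, attrs in rows
    ]


def custom_gtf(custom_fasta_path):
    """Return all GTF lines for the sequences in a custom FASTA file."""
    lines = []
    for name, length in fasta_lengths(custom_fasta_path).items():
        lines.extend(gtf_lines_for(name, length, source="custom"))
    return lines
```

GTF coordinates are 1-based and inclusive, so a sequence of length N is covered by the interval 1 to N.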

Input/Output Specification

Inputs

Required Inputs

  • genome

    • Description: Reference genome FASTA file to which custom sequences will be added
    • Format: FASTA (.fa/.fasta)
  • gtfFile

    • Description: Gene annotation file corresponding to the reference genome
    • Format: GTF (.gtf)

Optional Inputs

  • custom_fasta
    • Description: FASTA file containing custom sequences to be added to the reference genome
    • Format: FASTA (.fa/.fasta)

Outputs

  • genome

    • Description: Modified genome FASTA file containing both the original and custom sequences, accompanied by a FASTA index
    • Format: FASTA (.fa/.fasta)
  • gtfFile

    • Description: Updated GTF annotation file containing annotations for both original and custom sequences, sorted and indexed
    • Format: GTF (.gtf)
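For readers unfamiliar with FASTA indexes, the sketch below computes the five columns of the widely used `.fai` layout (name, length, byte offset of the first base, bases per line, bytes per line) for each sequence. The source does not state which tool the process uses for indexing, so this is only an illustration of what such an index contains. The function name `build_fai_lines` is hypothetical, and the sketch assumes every record wraps its sequence at a consistent line length, which it does not verify.

```python
def build_fai_lines(fasta_path):
    """Return .fai-style index lines for a FASTA file (illustrative only)."""
    records = []
    current = None
    offset = 0  # Number of bytes read so far.
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            offset += len(raw)
            if raw.startswith(b">"):
                fields = raw[1:].split()
                if not fields:
                    raise ValueError("FASTA header without a sequence name")
                # After the header line, offset points at the first base.
                current = {
                    "name": fields[0].decode(),
                    "length": 0,
                    "offset": offset,
                    "linebases": 0,
                    "linewidth": 0,
                }
                records.append(current)
            elif current is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if current["linebases"] == 0:
                    # The first sequence line defines the line layout.
                    current["linebases"] = bases
                    current["linewidth"] = len(raw)
                current["length"] += bases
    return [
        "\t".join(
            str(rec[key])
            for key in ("name", "length", "offset", "linebases", "linewidth")
        )
        for rec in records
    ]
```

Because the custom sequences are appended after the original genome, their index entries would appear last, with byte offsets beyond those of every original chromosome.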

References & Resources

  • Tool Documentation: Contact the team for details on the custom Python script for GTF generation
  • Related Papers: Cock PJ, Antao T, Chang JT, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422-1423. doi:10.1093/bioinformatics/btp163