Convert GTF Attributes Specifications

Process Details

  • Name: convert_gtf_attributes
  • Process UUID: A1d8eyeKKVv67BVkwJLhTt9xJEVhvC
  • Process Group: Misc.

Overview

This process standardizes GTF (Gene Transfer Format) files by replacing the value of each gene_id attribute with the record's gene_name value when one is present, so that gene identifiers follow a consistent naming convention across genomic annotations. It also performs a quality control check that no transcript ID appears on more than one chromosome, since such duplicates could indicate annotation errors.

This process is implemented in Perl.
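
The gene_id replacement described above can be illustrated with a minimal Python sketch. This is not the Perl script itself, whose internals are not documented here. The function name standardize_gene_id and the regular expressions are assumptions made for illustration. The sketch assumes standard nine-column, tab-separated GTF records whose attribute column holds key "value"; pairs, and it leaves comment lines, malformed lines, and records without a gene_name untouched.

```python
import re

# Matches one `key "value"` pair in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')
# Matches the gene_id pair only (not transcript_id or similar keys).
GENE_ID_RE = re.compile(r'\bgene_id "[^"]*"')


def standardize_gene_id(line: str) -> str:
    """Return the GTF line with gene_id set to the gene_name value, if any.

    Hypothetical helper: illustrates the documented behaviour only.
    """
    if line.startswith("#"):
        return line  # header/comment lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line  # not a well-formed GTF record; leave as-is
    attrs = dict(ATTR_RE.findall(fields[8]))
    gene_name = attrs.get("gene_name")
    if gene_name is None or "gene_id" not in attrs:
        return line  # nothing to replace
    # A function replacement avoids backslash-escape processing of gene_name.
    fields[8] = GENE_ID_RE.sub(lambda m: f'gene_id "{gene_name}"', fields[8], count=1)
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

For example, a record whose attribute column reads gene_id "ENSG01"; transcript_id "ENST01"; gene_name "ABC1"; would come back with gene_id "ABC1"; and its other attributes unchanged.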

Key Functionality

  • Gene ID Standardization: Replaces the value of the gene_id attribute with the value of gene_name for every record that carries a gene_name attribute
  • Transcript ID Validation: Checks for and flags transcript IDs that appear on multiple chromosomes, indicating potential annotation inconsistencies
  • Quality Control Filtering: Separates valid entries from problematic entries, outputting clean data for downstream analysis
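
The transcript validation and filtering steps above can be sketched in Python as follows. This is a hedged illustration, not the Perl implementation. The function name split_by_transcript_chromosome is hypothetical, and the choice to return flagged records as a separate list is an assumption, since this page does not say whether problematic entries are discarded or written elsewhere. The sketch holds all lines in memory; a real script for large annotation files would more likely make two passes over the file.

```python
import re
from collections import defaultdict

# Captures the transcript_id value from the GTF attribute column.
TRANSCRIPT_RE = re.compile(r'\btranscript_id "([^"]*)"')


def split_by_transcript_chromosome(lines):
    """Split GTF lines into (valid, flagged).

    A record is flagged when its transcript_id is seen on more than one
    chromosome (column 1) anywhere in the input. Hypothetical helper.
    """
    chroms_by_transcript = defaultdict(set)
    parsed = []  # (transcript_id or None, original line), in input order

    # First pass: record every chromosome on which each transcript appears.
    for line in lines:
        transcript_id = None
        if not line.startswith("#"):
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 9:
                match = TRANSCRIPT_RE.search(fields[8])
                if match:
                    transcript_id = match.group(1)
                    chroms_by_transcript[transcript_id].add(fields[0])
        parsed.append((transcript_id, line))

    # Second pass: separate clean records from multi-chromosome transcripts.
    valid, flagged = [], []
    for transcript_id, line in parsed:
        if transcript_id is not None and len(chroms_by_transcript[transcript_id]) > 1:
            flagged.append(line)
        else:
            valid.append(line)
    return valid, flagged
```

Lines without a transcript_id (comments and gene-level records, for instance) are treated as valid here, which is also an assumption; the original script may handle them differently.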

Input/Output Specification

Inputs

Required Inputs

  • GTF File
    • Description: Input GTF file containing genomic annotations with gene and transcript information
    • Format: GTF

Outputs

  • GTF File
    • Description: Processed GTF file with standardized gene_id attributes and validated transcript assignments
    • Format: GTF

References & Resources

  • Tool Documentation: Contact the team for details on the custom Perl script implementation