Convert GTF Attributes Specifications
Process Details
- Name: convert_gtf_attributes
- Process UUID: A1d8eyeKKVv67BVkwJLhTt9xJEVhvC
- Process Group: Misc.
Overview
This process standardizes GTF (Gene Transfer Format) files by replacing the value of each gene_id attribute with the value of the gene_name attribute when a gene_name is present, so that gene naming is consistent across genomic annotations. It also performs quality control by checking that no transcript ID appears on more than one chromosome, which could indicate an annotation error.
This process is implemented in Perl.
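The Perl script itself is not reproduced here. As a rough illustration of the gene_id replacement described above, the following Python sketch rewrites a single GTF line. It assumes the usual nine tab-separated GTF columns with attributes written as `key "value";`, and it assumes that gene_name is kept in the line after its value is copied into gene_id. The function name `replace_gene_id` and the regular expressions are hypothetical and are not taken from the actual implementation.

```python
import re

# Hypothetical patterns; \b keeps ref_gene_id / ref_gene_name from matching.
GENE_ID_RE = re.compile(r'\bgene_id "([^"]*)"')
GENE_NAME_RE = re.compile(r'\bgene_name "([^"]*)"')


def replace_gene_id(line: str) -> str:
    """Return the GTF line with gene_id set to the gene_name value, if present."""
    # Comment lines and lines without nine columns are passed through unchanged.
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line

    name = GENE_NAME_RE.search(fields[8])
    if name is None:
        # No gene_name available: keep the original gene_id.
        return line

    # Overwrite only the first gene_id value in the attributes column.
    fields[8] = GENE_ID_RE.sub(
        lambda m: f'gene_id "{name.group(1)}"', fields[8], count=1
    )
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

A line without a gene_name attribute comes back unchanged, which matches the "when available" condition in the overview.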
Key Functionality
- Gene ID Standardization: Replaces the value of the gene_id attribute with the value of gene_name on entries where gene_name is present in the GTF file
- Transcript ID Validation: Checks for and flags transcript IDs that appear on multiple chromosomes, indicating potential annotation inconsistencies
- Quality Control Filtering: Separates valid entries from problematic entries, outputting clean data for downstream analysis
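The transcript ID validation and filtering steps listed above could be sketched in Python as a two-pass procedure: first record which chromosomes each transcript_id is seen on, then split the lines into valid and flagged groups. This is only an illustration under stated assumptions. The function name `split_by_transcript_chromosomes` is hypothetical, and how the actual Perl script reports or discards flagged entries is not documented here.

```python
import re
from collections import defaultdict

# Hypothetical pattern for the transcript_id attribute in column 9.
TRANSCRIPT_ID_RE = re.compile(r'\btranscript_id "([^"]*)"')


def split_by_transcript_chromosomes(lines):
    """Split GTF lines into (valid, flagged) lists.

    A line is flagged when its transcript_id is seen on more than one
    chromosome (column 1) anywhere in the input.
    """
    chroms = defaultdict(set)  # transcript_id -> set of chromosomes
    parsed = []                # (original line, transcript_id or None)

    # Pass 1: collect the chromosomes observed for each transcript ID.
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        tid = None
        if not line.startswith("#") and len(fields) == 9:
            match = TRANSCRIPT_ID_RE.search(fields[8])
            if match and match.group(1):
                tid = match.group(1)
                chroms[tid].add(fields[0])
        parsed.append((line, tid))

    # Pass 2: separate clean entries from those with conflicting chromosomes.
    valid, flagged = [], []
    for line, tid in parsed:
        if tid is not None and len(chroms[tid]) > 1:
            flagged.append(line)
        else:
            valid.append(line)
    return valid, flagged
```

Two passes are used so that every entry of an affected transcript is flagged, including entries that appear before the conflicting chromosome is first seen.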
Input/Output Specification
Inputs
Required Inputs
- GTF File
- Description: Input GTF file containing genomic annotations with gene and transcript information
- Format: GTF
Outputs
- GTF File
- Description: Processed GTF file with standardized gene_id attributes and validated transcript assignments
- Format: GTF
References & Resources
- Tool Documentation: Contact the team for details on the custom Perl script implementation