
IRFinder Module Pipeline Specification

Pipeline Details

  • Name: IRFinder Module
  • Pipeline UUID: f931rejcleppv2hit2lo373zwywpyp
  • Version: 1.0.0
  • View Pipeline:

Overview

The IRFinder Module pipeline performs differential intron retention analysis on RNA-sequencing data using IRFinder and DESeq2. It builds the reference database, quantifies intron retention from BAM files, and runs the statistical analysis that identifies differentially retained introns between experimental conditions.

Key Use Cases:

  • Differential Intron Retention Analysis: Identify introns with significantly different retention levels between treatment and control groups from RNA-seq data.
  • Alternative Splicing Research: Study intron retention as a mechanism of post-transcriptional gene regulation in various biological contexts.
  • Disease Biomarker Discovery: Detect intron retention events that may serve as biomarkers for specific diseases or conditions.

Features

  • Comprehensive Reference Building: Automatically constructs IRFinder reference databases from genome FASTA and GTF annotation files.
  • BAM File Processing: Directly processes aligned BAM files to quantify intron retention levels without requiring re-alignment.
  • Statistical Analysis Integration: Incorporates DESeq2 for robust differential analysis with proper handling of biological replicates.
  • Flexible Experimental Design: Supports complex experimental designs through customizable groups and comparison files.
  • Containerized Workflow: Runs in a Docker container (quay.io/viascientific/irfinder:1.0.0) so that results are reproducible across computing environments.
  • Automated File Organization: Organizes input files and prepares the data structures needed for downstream analysis.

Input/Output Specification

Inputs

Required

BAM Files

  • Description: Aligned RNA-seq BAM files containing mapped reads for each sample
  • Format: .bam
  • Example File Path: /path/to/sample1.bam

Genome FASTA File

  • Description: Reference genome sequence in FASTA format used for building IRFinder reference
  • Format: .fa or .fasta
  • Example File Path: /path/to/genome.fa

GTF Annotation File

  • Description: Gene annotation file containing transcript and exon coordinates
  • Format: .gtf
  • Example File Path: /path/to/transcripts.gtf

Groups File

  • Description: Sample metadata file specifying experimental groups and conditions
  • Required Columns: sample_name, group
  • Format: Tab-separated values (.tsv) or comma-separated values (.csv)
  • Example:
    sample_name group
    Sample_01   control
    Sample_02   control
    Sample_03   treatment
    Sample_04   treatment
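
Before launching a run, it can help to confirm that the groups file has the required columns. The following is a minimal sketch, not part of the pipeline itself: the function names read_groups and delimiter_for are hypothetical, and choosing the delimiter from the file extension is an assumption based on the two formats listed above.

```python
import csv
import io
from pathlib import Path

# Column names taken from the "Required Columns" entry above.
REQUIRED_COLUMNS = ("sample_name", "group")


def delimiter_for(path):
    """Pick a delimiter from the extension (assumption: .csv means commas, anything else tabs)."""
    return "," if Path(path).suffix.lower() == ".csv" else "\t"


def read_groups(text, delimiter="\t"):
    """Parse groups-file text into a {sample_name: group} mapping (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"groups file is missing columns: {missing}")
    groups = {}
    for row in reader:
        name = (row["sample_name"] or "").strip()
        if name in groups:
            raise ValueError(f"duplicate sample_name: {name}")
        groups[name] = (row["group"] or "").strip()
    return groups


# Same content as the example above, written with explicit tab separators.
example = (
    "sample_name\tgroup\n"
    "Sample_01\tcontrol\n"
    "Sample_02\tcontrol\n"
    "Sample_03\ttreatment\n"
    "Sample_04\ttreatment\n"
)
groups = read_groups(example)
print(groups)
```

For a file on disk, the text can be read with Path(path).read_text() and passed in together with delimiter_for(path).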
    

Comparisons File

  • Description: Specification file defining which groups to compare in differential analysis
  • Required Columns: controls, treats, names
  • Format: Tab-separated values (.tsv) or comma-separated values (.csv)
  • Example:
    controls    treats      names
    control     treatment   treatment_vs_control
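
A comparison can only run if the groups it names exist in the groups file. The sketch below checks that before submission. It assumes each controls and treats cell holds a single group label, which matches the example above but is not confirmed for every case. The function name check_comparisons is hypothetical.

```python
import csv
import io

# Column names taken from the "Required Columns" entry above.
REQUIRED_COLUMNS = ("controls", "treats", "names")


def check_comparisons(comparisons_text, known_groups, delimiter="\t"):
    """Return a list of problems found in comparisons-file text (hypothetical helper).

    known_groups is the set of group labels used in the groups file.
    """
    reader = csv.DictReader(io.StringIO(comparisons_text), delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"comparisons file is missing columns: {missing}")
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column in ("controls", "treats"):
            value = (row[column] or "").strip()
            if value not in known_groups:
                problems.append(
                    f"line {line_no}: {column} value '{value}' is not a group in the groups file"
                )
    return problems


comparisons = "controls\ttreats\tnames\ncontrol\ttreatment\ttreatment_vs_control\n"
print(check_comparisons(comparisons, {"control", "treatment"}))
```

An empty list means every comparison refers to known groups. Any other result lists the offending lines.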
    

Outputs

Reported Outputs

  • Differential Intron Retention Results:
    • Description: Statistical analysis results identifying significantly differentially retained introns
    • Format: Text files with IRFinder Diff output
    • Example File Path: /output/comparison_diff/
    • Location: Comparison-specific folders

Supporting Outputs

  • IRFinder Reference Database:
    • Description: Built reference database containing genome and annotation information for IRFinder analysis
    • Format: IRFinder reference directory structure
    • Example File Path: /intermediate/IRFinder_Reference/

  • Individual Sample Quantification:
    • Description: Intron retention quantification results for each sample
    • Format: .txt files (IRFinder-IR-nondir.txt format)
    • Example File Path: /intermediate/sample_IR.txt

Associated Processes

References & Additional Documentation