IRFinder Module Pipeline Specification
Pipeline Details
- Name: IRFinder Module
- Pipeline UUID: f931rejcleppv2hit2lo373zwywpyp
- Version: 1.0.0
- View Pipeline:
Overview
The IRFinder Module pipeline performs differential intron retention analysis on RNA-sequencing data using IRFinder and DESeq2. It builds the IRFinder reference database, quantifies intron retention from BAM files, and runs the statistical analysis that identifies differentially retained introns between experimental conditions.
Key use cases:
- Differential Intron Retention Analysis: Identify introns with significantly different retention levels between treatment and control groups from RNA-seq data.
- Alternative Splicing Research: Study intron retention as a mechanism of post-transcriptional gene regulation in various biological contexts.
- Disease Biomarker Discovery: Detect intron retention events that may serve as biomarkers for specific diseases or conditions.
Features
- Comprehensive Reference Building: Automatically constructs IRFinder reference databases from genome FASTA and GTF annotation files.
- BAM File Processing: Directly processes aligned BAM files to quantify intron retention levels without requiring re-alignment.
- Statistical Analysis Integration: Incorporates DESeq2 for robust differential analysis with proper handling of biological replicates.
- Flexible Experimental Design: Supports complex experimental designs through customizable groups and comparison files.
- Containerized Workflow: Runs in a Docker container (quay.io/viascientific/irfinder:1.0.0) so that results are reproducible across computing environments.
- Automated File Organization: Intelligently organizes input files and prepares data structures for downstream analysis.
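The reference-building and quantification steps listed above can be sketched as a list of command lines. This is a minimal sketch, not the pipeline's actual invocation: it assumes the `-m BuildRefProcess` and `-m BAM` modes of the IRFinder 1.x command line and a reference directory that already contains `genome.fa` and `transcripts.gtf`. The function names and directory layout are hypothetical.

```python
from pathlib import Path


def build_reference_cmd(ref_dir):
    # Assumption: IRFinder 1.x builds its reference in place from
    # genome.fa and transcripts.gtf found inside ref_dir.
    return ["IRFinder", "-m", "BuildRefProcess", "-r", str(ref_dir)]


def quantify_cmd(ref_dir, bam, out_dir):
    # Assumption: IRFinder 1.x BAM mode; -d names the per-sample output folder.
    return ["IRFinder", "-m", "BAM", "-r", str(ref_dir), "-d", str(out_dir), str(bam)]


def plan_commands(ref_dir, bams, out_root):
    """Return the commands in run order: one reference build, then one run per BAM."""
    ref_dir, out_root = Path(ref_dir), Path(out_root)
    commands = [build_reference_cmd(ref_dir)]
    for bam in bams:
        bam = Path(bam)
        # One output folder per sample, named after the BAM file stem.
        commands.append(quantify_cmd(ref_dir, bam, out_root / bam.stem))
    return commands


if __name__ == "__main__":
    for cmd in plan_commands("IRFinder_Reference", ["sample1.bam"], "irfinder_out"):
        print(" ".join(cmd))
```

Returning argument lists instead of running anything keeps the sketch free of side effects; each list could be passed to `subprocess.run` inside the container.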
Input/Output Specification
Inputs
Required
BAM Files
- Description: Aligned RNA-seq BAM files containing mapped reads for each sample
- Format: .bam
- Example File Path: /path/to/sample1.bam
Genome FASTA File
- Description: Reference genome sequence in FASTA format used for building IRFinder reference
- Format: .fa or .fasta
- Example File Path: /path/to/genome.fa
GTF Annotation File
- Description: Gene annotation file containing transcript and exon coordinates
- Format: .gtf
- Example File Path: /path/to/transcripts.gtf
Groups File
- Description: Sample metadata file specifying experimental groups and conditions
- Required Columns: sample_name, group
- Format: Tab-separated values (.tsv) or comma-separated values (.csv)
- Example:
sample_name    group
Sample_01      control
Sample_02      control
Sample_03      treatment
Sample_04      treatment
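A groups file in this layout can be checked before a run. The following is a minimal sketch, assuming the delimiter can be inferred from the header line (tab for .tsv, comma for .csv). The `read_groups` helper is hypothetical and not part of the pipeline.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ("sample_name", "group")


def read_groups(text):
    """Parse groups-file text into {group: [sample_name, ...]}."""
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("groups file is empty")
    # Assumption: a tab in the header line means TSV, otherwise CSV.
    delimiter = "\t" if "\t" in lines[0] else ","
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing required column(s): " + ", ".join(missing))

    groups = defaultdict(list)
    seen = set()
    for row in reader:
        name = (row["sample_name"] or "").strip()
        group = (row["group"] or "").strip()
        if not name or not group:
            raise ValueError("row with empty sample_name or group")
        if name in seen:
            raise ValueError("duplicate sample_name: " + name)
        seen.add(name)
        groups[group].append(name)
    return dict(groups)


if __name__ == "__main__":
    example = (
        "sample_name\tgroup\n"
        "Sample_01\tcontrol\nSample_02\tcontrol\n"
        "Sample_03\ttreatment\nSample_04\ttreatment\n"
    )
    print(read_groups(example))
```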
Comparisons File
- Description: Specification file defining which groups to compare in differential analysis
- Required Columns: controls, treats, names
- Format: Tab-separated values (.tsv) or comma-separated values (.csv)
- Example:
controls    treats       names
control     treatment    treatment_vs_control
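Each row of the comparisons file refers to group labels defined in the groups file, so the two files can be cross-checked. This is a minimal sketch, assuming each `controls` and `treats` cell holds exactly one group label. The `check_comparisons` helper is hypothetical, and the pipeline may accept other forms.

```python
def check_comparisons(comparisons, groups):
    """Return a list of problems; an empty list means the rows look consistent.

    comparisons: list of dicts with keys "controls", "treats", "names"
    groups: set of group labels taken from the groups file
    """
    problems = []
    seen_names = set()
    for i, row in enumerate(comparisons, start=1):
        for column in ("controls", "treats"):
            if row[column] not in groups:
                problems.append(
                    f"row {i}: {column}={row[column]!r} is not a group in the groups file"
                )
        if row["controls"] == row["treats"]:
            problems.append(f"row {i}: controls and treats are the same group")
        if row["names"] in seen_names:
            problems.append(f"row {i}: duplicate comparison name {row['names']!r}")
        seen_names.add(row["names"])
    return problems


if __name__ == "__main__":
    rows = [{"controls": "control", "treats": "treatment", "names": "treatment_vs_control"}]
    print(check_comparisons(rows, {"control", "treatment"}))
```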
Outputs
Reported Outputs
- Differential Intron Retention Results:
  - Description: Statistical analysis results identifying significantly differentially retained introns
  - Format: Text files with IRFinder Diff output
  - Example File Path: /output/comparison_diff/
  - Location: Comparison-specific folders
Supporting Outputs
- IRFinder Reference Database:
  - Description: Built reference database containing genome and annotation information for IRFinder analysis
  - Format: IRFinder reference directory structure
  - Example File Path: /intermediate/IRFinder_Reference/
- Individual Sample Quantification:
  - Description: Intron retention quantification results for each sample
  - Format: .txt files (IRFinder-IR-nondir.txt format)
  - Example File Path: /intermediate/sample_IR.txt
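A per-sample quantification file can be inspected with a few lines of Python. This is a minimal sketch, assuming a tab-separated table with a header row that includes `Name`, `IRratio`, and `Warnings` columns, and that `-` in `Warnings` marks an intron without quality flags. These column conventions are recalled from IRFinder's output format and should be checked against an actual file. The 0.1 threshold and the sample rows are purely illustrative.

```python
import csv
import io


def retained_introns(text, min_ratio=0.1):
    """Return (Name, IRratio) pairs for unflagged introns at or above min_ratio."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    hits = []
    for row in reader:
        # Assumption: "-" in the Warnings column means no quality flag was raised.
        if row["Warnings"] != "-":
            continue
        ratio = float(row["IRratio"])
        if ratio >= min_ratio:
            hits.append((row["Name"], ratio))
    return hits


if __name__ == "__main__":
    # Made-up rows with only the columns this sketch reads.
    example = (
        "Name\tIRratio\tWarnings\n"
        "GeneA/intron1\t0.25\t-\n"
        "GeneB/intron2\t0.40\tLowCover\n"
        "GeneC/intron3\t0.02\t-\n"
    )
    print(retained_introns(example))
```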
Associated Processes
References & Additional Documentation
- Related Papers/links: IRFinder: assessing the impact of intron retention on mammalian gene expression
- Pipeline Repository: IRFinder GitHub Repository
- Container Registry: quay.io/viascientific/irfinder:1.0.0