
TE Transcripts Pipeline Specification

Pipeline Details

  • Name: TE Transcripts
  • Pipeline UUID: 7f3qsrd6bvtrpv2l08kz1didgfz04y
  • Version: 1.2.0

Overview

The TE Transcripts pipeline analyzes RNA-seq data with a focus on transposable elements (TEs) and gene expression. It uses TEtranscripts and TEcount to assign reads to both genes and transposable elements, then runs differential expression analysis with DESeq2. The pipeline automates preprocessing, quality control, alignment, and downstream analysis so that transposable element studies produce reliable, reproducible results.
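TEcount produces a per-sample table of gene and TE counts that can later be combined for DESeq2. The sketch below assembles one such TEcount command line. It assumes the `-b`, `--GTF`, `--TE`, `--format`, `--stranded`, `--mode`, and `--project` options documented for the TEtranscripts package; `tecount_command` is a hypothetical helper, and the pipeline's real arguments and defaults may differ.

```python
import shlex
from typing import List

# Values accepted by TEcount's --stranded and --mode options (per the TEtranscripts docs).
VALID_STRANDEDNESS = {"no", "forward", "reverse"}
VALID_MODES = {"uniq", "multi"}


def tecount_command(bam: str, gene_gtf: str, te_gtf: str, project: str,
                    stranded: str = "no", mode: str = "multi") -> List[str]:
    """Hypothetical helper: build a TEcount call for one BAM against gene and TE GTFs."""
    if stranded not in VALID_STRANDEDNESS:
        raise ValueError(f"unsupported --stranded value: {stranded!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported --mode value: {mode!r}")
    return [
        "TEcount",
        "--format", "BAM",
        "--stranded", stranded,
        "--mode", mode,          # "multi" also apportions multi-mapped reads, not only unique ones
        "-b", bam,
        "--GTF", gene_gtf,       # standard gene annotation
        "--TE", te_gtf,          # curated TE annotation
        "--project", project,    # prefix for the output count table
    ]


cmd = tecount_command("sample1.bam", "genes.gtf", "rmsk_TE.gtf", "sample1")
print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps the call safe to pass to `subprocess.run` without shell quoting issues.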

Key Use Cases:

  • Transposable Element Analysis: Quantification and differential expression analysis of both genes and transposable elements from RNA-seq data.
  • Dual Annotation Workflow: Simultaneous analysis of gene expression and transposable element activity in the same samples.
  • Differential Expression Analysis: Comparative analysis between conditions using DESeq2 or Limma Voom with support for complex experimental designs.

Features

  • Specialized TE Annotation: Uses curated GTF files specifically designed for transposable element analysis with TEtranscripts and TEcount tools.
  • Flexible Alignment Options: Supports the STAR aligner with customizable parameters and can build the STAR genome index.
  • Comprehensive Quality Control: Runs FastQC at multiple stages and applies adapter removal, quality filtering, and UMI processing.
  • Modular Preprocessing: Supports various preprocessing steps including adapter removal (Trimmomatic or fastx_clipper), quality filtering, trimming, and UMI extraction.
  • Statistical Analysis Integration: Built-in differential expression analysis using DESeq2 with support for complex experimental designs and batch correction.
  • Automated Reporting: Generates comprehensive summary reports, QC metrics, and visualization outputs for easy interpretation.
  • Scalable Processing: Runs each process in quay.io/viascientific containers for reproducibility and scalability.
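Reads from TEs often align to many genomic copies, and STAR reports reads that exceed its multi-mapping limit as unmapped. The sketch below assembles a STAR alignment command that raises `--outFilterMultimapNmax` and `--winAnchorMultimapNmax`; the value 100 is a setting commonly recommended for TEtranscripts and is assumed here, not read from this pipeline's configuration. `star_align_command` is a hypothetical helper, and coordinate sorting is assumed to match the sorted BAM listed under Supporting Outputs.

```python
from typing import List, Optional


def star_align_command(genome_dir: str, read1: str, read2: Optional[str],
                       out_prefix: str, threads: int = 4,
                       max_multimap: int = 100) -> List[str]:
    """Hypothetical helper: build a STAR call that keeps multi-mapped reads for TE counting."""
    reads = [read1] if read2 is None else [read1, read2]
    cmd = ["STAR", "--runThreadN", str(threads),
           "--genomeDir", genome_dir,
           "--readFilesIn", *reads]
    if read1.endswith(".gz"):
        cmd += ["--readFilesCommand", "zcat"]  # let STAR stream gzipped FASTQ
    cmd += [
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
        # Raise the multi-mapping limits so reads from repetitive TE copies are retained.
        "--outFilterMultimapNmax", str(max_multimap),
        "--winAnchorMultimapNmax", str(max_multimap),
    ]
    return cmd


paired = star_align_command("STARIndex", "s1_R1.fastq.gz", "s1_R2.fastq.gz", "s1_")
single = star_align_command("STARIndex", "s2.fastq", None, "s2_")
```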

Input/Output Specification

Note: This pipeline uses internal data flow between processes. Primary inputs are typically FASTQ files and reference annotations, while outputs include count matrices and differential expression results.

Inputs

Required

RNA-seq FASTQ Files

  • Description: Raw sequencing reads from RNA-seq experiments in FASTQ format
  • Format: .fastq or .fastq.gz (compressed)
  • Example File Path: /path/to/input/sample_R1.fastq.gz
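Each FASTQ record spans four lines: a header starting with `@`, the sequence, a separator starting with `+`, and a quality string of the same length as the sequence. The sketch below reads plain or gzip-compressed FASTQ and performs those basic checks; `open_fastq`, `read_fastq`, and `FastqRecord` are hypothetical names, and the reader assumes the common four-line layout without wrapped sequences.

```python
import gzip
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def open_fastq(path: str) -> TextIO:
    """Open .fastq or .fastq.gz in text mode, choosing gzip by file extension."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records, raising on malformed input."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


# Usage on an in-memory example; with a real file: `with open_fastq(path) as fh: ...`
demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGG\n+\nII\n")
records = list(read_fastq(demo))
```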

Gene Annotation GTF

  • Description: Gene annotation file in GTF format for standard gene quantification
  • Format: .gtf or .gtf.gz
  • Example File Path: /path/to/annotations/genes.gtf

Transposable Element GTF

  • Description: Curated GTF file specifically for transposable element annotation (required for TEtranscripts/TEcount)
  • Format: .gtf or .gtf.gz
  • Example File Path: /path/to/annotations/rmsk_TE.gtf

Reference Genome

  • Description: Reference genome sequence in FASTA format
  • Format: .fa, .fasta, or compressed versions
  • Example File Path: /path/to/genome/genome.fa

Optional Inputs

Sample Metadata

  • Description: Tab-separated file containing sample information for differential expression analysis
  • Required Columns: sample_name, group
  • Format: Tab-separated values (.tsv)
  • Example:

    sample_name    group
    sample1        control
    sample2        treated
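Before differential expression runs, the metadata has to contain the required columns and name each sample once. The sketch below loads the tab-separated file into a sample-to-group mapping; it assumes the header uses the exact lowercase names `sample_name` and `group`, and `load_sample_metadata` is a hypothetical helper rather than part of the pipeline.

```python
import csv
import io
from typing import Dict, TextIO

REQUIRED_METADATA_COLUMNS = ("sample_name", "group")


def load_sample_metadata(handle: TextIO) -> Dict[str, str]:
    """Map each sample_name to its group, rejecting missing columns or duplicate samples."""
    reader = csv.DictReader(handle, delimiter="\t")
    present = reader.fieldnames or []
    missing = [col for col in REQUIRED_METADATA_COLUMNS if col not in present]
    if missing:
        raise ValueError(f"sample metadata is missing required columns: {missing}")
    groups: Dict[str, str] = {}
    for row in reader:
        name = row["sample_name"].strip()
        if name in groups:
            raise ValueError(f"duplicate sample_name: {name!r}")
        groups[name] = row["group"].strip()
    return groups


example = io.StringIO("sample_name\tgroup\nsample1\tcontrol\nsample2\ttreated\n")
sample_groups = load_sample_metadata(example)
```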

Comparison File

  • Description: Specifies which groups to compare in differential expression analysis
  • Required Columns: controls, treats, names
  • Format: Tab-separated values (.tsv)
  • Example:

    controls    treats     names
    control     treated    treated_vs_control
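Each comparison row should refer to groups that actually appear in the sample metadata. The sketch below reads the comparison file and checks both groups against a set of known group names; it assumes one control group and one treatment group per row, and `Comparison` and `load_comparisons` are hypothetical names.

```python
import csv
import io
from typing import List, NamedTuple, Set, TextIO


class Comparison(NamedTuple):
    control: str
    treat: str
    name: str


def load_comparisons(handle: TextIO, known_groups: Set[str]) -> List[Comparison]:
    """Read controls/treats/names rows and check both groups exist in the metadata."""
    reader = csv.DictReader(handle, delimiter="\t")
    present = reader.fieldnames or []
    for col in ("controls", "treats", "names"):
        if col not in present:
            raise ValueError(f"comparison file is missing column: {col}")
    comparisons: List[Comparison] = []
    for row in reader:
        comp = Comparison(row["controls"].strip(), row["treats"].strip(), row["names"].strip())
        unknown = {comp.control, comp.treat} - known_groups
        if unknown:
            raise ValueError(f"{comp.name}: groups not found in metadata: {sorted(unknown)}")
        comparisons.append(comp)
    return comparisons


example = io.StringIO("controls\ttreats\tnames\ncontrol\ttreated\ttreated_vs_control\n")
comparisons = load_comparisons(example, known_groups={"control", "treated"})
```

In practice `known_groups` would come from the values of the sample metadata, so a typo in either file fails before any statistics run.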

Outputs

Reported Outputs

  • TE Count Matrix:
      • Description: Combined count matrix containing both gene and transposable element counts
      • Format: .tsv
      • Example File Path: /output/counts/TE_counts_matrix.tsv
      • Visualization App: DE Browser, R/Bioconductor
      • Location: Counts Folder

  • Differential Expression Results:
      • Description: Statistical results from DESeq2 analysis showing differentially expressed genes and TEs
      • Format: .csv, .tsv
      • Example File Path: /output/DE_analysis/treated_vs_control_DE_results.csv
      • Visualization App: DE Browser, IGV
      • Location: DE Analysis Folder

  • Quality Control Reports:
      • Description: FastQC reports and preprocessing summaries
      • Format: .html, .tsv
      • Example File Path: /output/QC/fastqc_reports.html
      • Visualization App: Web browser
      • Location: QC Folder
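Because genes and TEs share one count matrix and one results table, downstream filtering usually has to tell the two apart. The sketch below selects significant features from a DESeq2-style results file and labels each as a gene or a TE. It assumes the DESeq2 `results()` column names (`log2FoldChange`, `padj`) with feature IDs in the first column, and TEtranscripts-style TE identifiers of the form `name:family:class`; `significant_by_type` and `is_te_feature` are hypothetical helpers, the cutoffs are illustrative, and the rows are made up.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

Hit = Tuple[str, float, float]  # (feature ID, log2 fold change, adjusted p-value)


def is_te_feature(feature_id: str) -> bool:
    # Assumption: TE rows use "name:family:class" IDs; gene IDs contain no colons.
    return feature_id.count(":") == 2


def significant_by_type(handle: TextIO, padj_cutoff: float = 0.05,
                        lfc_cutoff: float = 1.0, delimiter: str = ",") -> Dict[str, List[Hit]]:
    """Return significant genes and TEs, each sorted by adjusted p-value."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    i_lfc, i_padj = header.index("log2FoldChange"), header.index("padj")
    hits: Dict[str, List[Hit]] = {"gene": [], "TE": []}
    for row in reader:
        if not row:
            continue
        lfc_text, padj_text = row[i_lfc], row[i_padj]
        if lfc_text in ("", "NA") or padj_text in ("", "NA"):
            continue  # DESeq2 reports NA for features removed by filtering
        lfc, padj = float(lfc_text), float(padj_text)
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff:
            kind = "TE" if is_te_feature(row[0]) else "gene"
            hits[kind].append((row[0], lfc, padj))
    for kind in hits:
        hits[kind].sort(key=lambda hit: hit[2])
    return hits


# Made-up rows for illustration only.
example = io.StringIO(
    ",baseMean,log2FoldChange,lfcSE,stat,pvalue,padj\n"
    "AluY:Alu:SINE,3500.2,1.8,0.3,6.0,2e-9,1e-7\n"
    "GENE_A,110.4,-1.4,0.25,-5.6,2e-8,3e-6\n"
    "GENE_B,95.0,-0.2,0.25,-0.8,0.42,0.71\n"
    "L1HS:L1:LINE,12.0,2.5,1.1,2.27,0.023,NA\n"
)
hits = significant_by_type(example)
```

Pass `delimiter="\t"` for the .tsv variant of the results.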

Supporting Outputs

  • Aligned BAM Files:
      • Description: STAR-aligned reads in BAM format with indexing
      • Format: .bam, .bai
      • Example File Path: /intermediate/alignments/sample_aligned_sorted.bam

  • Preprocessing Summaries:
      • Description: Summary statistics from adapter removal, quality filtering, and trimming steps
      • Format: .tsv, .log
      • Example File Path: /intermediate/preprocessing/adapter_removal_summary.tsv

  • STAR Index:
      • Description: STAR genome index built for alignment
      • Format: Directory with index files
      • Example File Path: /intermediate/indices/STARIndex/
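A STAR index is built with `--runMode genomeGenerate` from the genome FASTA, optionally adding splice junctions from the gene GTF. The sketch below assembles that command; `star_index_command` is a hypothetical helper, the thread count is an assumed default, and `--sjdbOverhang` follows the STAR manual's recommendation of read length minus 1. STAR expects an uncompressed FASTA here, so a compressed reference would need to be decompressed first.

```python
from typing import List


def star_index_command(genome_fasta: str, gene_gtf: str, index_dir: str,
                       read_length: int, threads: int = 8) -> List[str]:
    """Hypothetical helper: build a STAR genomeGenerate call with annotated splice junctions."""
    if read_length < 2:
        raise ValueError("read_length must be at least 2")
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", index_dir,
        "--genomeFastaFiles", genome_fasta,  # plain (uncompressed) FASTA
        "--sjdbGTFfile", gene_gtf,           # gene GTF supplies known splice junctions
        # STAR's manual recommends read length minus 1 for the junction overhang.
        "--sjdbOverhang", str(read_length - 1),
    ]


index_cmd = star_index_command("genome.fa", "genes.gtf", "STARIndex", read_length=101)
```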

Associated Processes

References & Additional Documentation