
GSEA module Pipeline Specification

Pipeline Details

  • Name: GSEA module
  • Pipeline UUID: mghswveqesuckkjb7drwcu61sooe7q
  • Version: 1.0.2

Overview

The GSEA module pipeline performs Gene Set Enrichment Analysis (GSEA) on a ranked list of genes (e.g., the output of DESeq2 or LimmaVoom). It uses the fgsea R package to run the enrichment analysis and generates both tabular results and enrichment plots for significant gene sets.
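
For orientation, the statistic at the heart of GSEA can be sketched in a few lines. The following is a minimal Python illustration of the classic weighted running-sum enrichment score, not the pipeline's or fgsea's actual code; fgsea additionally estimates p-values, which is omitted here. The function name, the toy gene names, and the scores are hypothetical.

```python
def running_enrichment_score(ranked_scores, gene_set, weight=1.0):
    """Return (ES, running profile) for one gene set.

    ranked_scores: dict mapping gene -> ranking metric (e.g. log2FoldChange).
    gene_set: set of gene names.
    weight: exponent applied to |score| for hits (1.0 is the classic weighting).
    """
    # Order genes from the highest to the lowest ranking metric.
    ordered = sorted(ranked_scores.items(), key=lambda kv: kv[1], reverse=True)
    is_hit = [gene in gene_set for gene, _ in ordered]
    n_miss = is_hit.count(False)
    hit_total = sum(abs(s) ** weight for (_, s), h in zip(ordered, is_hit) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, profile = 0.0, []
    for (_, score), hit in zip(ordered, is_hit):
        if hit:
            # Step up in proportion to the gene's weighted score.
            running += abs(score) ** weight / hit_total
        else:
            # Step down by a constant amount for every gene outside the set.
            running -= 1.0 / n_miss
        profile.append(running)

    # The enrichment score is the largest deviation of the profile from zero.
    es = max(profile, key=abs)
    return es, profile


# Toy data (hypothetical): the gene set sits entirely at the top of the list.
toy_ranks = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0}
es, profile = running_enrichment_score(toy_ranks, {"A", "B"})
print(es, profile)
```

Because both set members rank above every other gene in this toy example, the profile peaks right after the second gene and the score reaches its maximum of 1 (up to floating-point rounding).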

Key Use Cases:

  • Differential Gene Expression Analysis: Analyze ranked gene lists from RNA-seq experiments to identify enriched biological pathways and gene sets.
  • Pathway Enrichment Analysis: Determine which biological pathways are significantly enriched in your experimental conditions.
  • Functional Annotation: Interpret gene expression results by mapping them to known gene sets and biological processes.

Features

  • MSigDB Integration: Currently supports human and mouse, with gene sets provided by MSigDB.
  • Fast Analysis: Uses the fgsea R package for fast gene set enrichment analysis.
  • Comprehensive Output: Generates both detailed results tables and enrichment plots for visualization.
  • Flexible Input: Accepts ranked gene lists from popular differential expression tools like DESeq2 and LimmaVoom.
  • Quality Visualization: Produces publication-ready enrichment plots for each significant gene set.

Input/Output Specification

Inputs

Required

Ranked Gene List

  • Description: A tab-delimited file containing at least two columns: one representing the feature (e.g., gene name) and one containing a quantification of that feature (e.g., log2FoldChange).
  • Format: Tab-delimited text file (.txt, .tsv)
  • Requirements: Feature column and quantification column names must be specified in the "Run GSEA" settings.
  • Example File Path: /input/data/ranked_genes.txt
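
As a rough illustration of the expected input shape, the sketch below reads a tab-delimited table, picks one feature column and one quantification column, and sorts the genes by score. It is written in Python for illustration only; the function name, the sample rows, and the choice to skip rows with non-numeric scores are assumptions, not documented pipeline behavior.

```python
import csv
import io


def load_ranked_genes(handle, feature_col, score_col):
    """Return (gene, score) pairs sorted from highest to lowest score."""
    reader = csv.DictReader(handle, delimiter="\t")
    ranked = []
    for row in reader:
        gene = row[feature_col]
        try:
            score = float(row[score_col])
        except (TypeError, ValueError):
            # Assumption: rows with a missing or non-numeric score ("NA") are skipped.
            continue
        # score == score is False only for NaN, which is skipped as well.
        if gene and score == score:
            ranked.append((gene, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical DESeq2-like table with an extra column that is simply ignored.
example = (
    "gene\tbaseMean\tlog2FoldChange\n"
    "TP53\t100\t2.5\n"
    "MYC\t50\t-1.2\n"
    "GAPDH\t900\tNA\n"
    "EGFR\t75\t0.8\n"
)
ranked = load_ranked_genes(io.StringIO(example), "gene", "log2FoldChange")
print(ranked)
```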

Run GSEA Settings

  • Description: Configuration parameters specifying which columns to use for gene names and ranking metrics.
  • Format: Configuration object
  • Required Parameters: Feature column name, quantification column name
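
One way to picture the settings object is as a pair of column names that must both be present in the input file's header. The Python sketch below is hypothetical: the class name, field names, and validation rules are assumptions chosen for illustration and do not reflect the pipeline's actual configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunGseaSettings:
    # Hypothetical field names; the real setting keys may differ.
    feature_column: str
    quantification_column: str

    def validate(self, header):
        """Raise ValueError if the configured columns cannot be used."""
        if self.feature_column == self.quantification_column:
            raise ValueError("feature and quantification columns must differ")
        missing = [
            col
            for col in (self.feature_column, self.quantification_column)
            if col not in header
        ]
        if missing:
            raise ValueError(f"column(s) not found in input header: {missing}")


# Header of a hypothetical ranked gene list.
header = "gene\tbaseMean\tlog2FoldChange\tpadj".split("\t")

settings = RunGseaSettings(feature_column="gene", quantification_column="log2FoldChange")
settings.validate(header)  # passes silently

try:
    RunGseaSettings("gene", "stat").validate(header)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Checking the column names against the header before the run starts surfaces a misspelled column name as a clear error.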

Optional Inputs

Postfix

  • Description: Optional suffix to append to output file names for better organization and identification.
  • Format: Text string
  • Example: "_treatment_vs_control"
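
To make the effect of the postfix concrete, here is a small Python sketch that inserts the suffix between the base file name and its extension. Where exactly the pipeline places the postfix is an assumption, and the helper name is hypothetical.

```python
from pathlib import PurePosixPath


def with_postfix(path, postfix=""):
    """Insert postfix between the file stem and its extension."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}{postfix}{p.suffix}"))


named = with_postfix("/output/results/gsea_results.tsv", "_treatment_vs_control")
print(named)  # /output/results/gsea_results_treatment_vs_control.tsv
```

With an empty postfix the path is returned unchanged.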

Outputs

Reported Outputs

These outputs are available in the Report tab after a run completes.

  • GSEA Results Table:
      • Description: Comprehensive table containing enrichment statistics for all analyzed gene sets including p-values, FDR, enrichment scores, and leading edge genes.
      • Format: Tab-delimited (.tsv)
      • Example File Path: /output/results/gsea_results.tsv
      • Visualization App: Built-in table viewer
      • Location: Results folder

  • Enrichment Plots:
      • Description: Individual enrichment plots for each significant gene set showing the enrichment profile and running enrichment score.
      • Format: Image files (.png)
      • Example File Path: /output/plots/enrichment_plot_[geneset_name].png
      • Visualization App: Image viewer
      • Location: Plots folder
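
The relationship between the two reported outputs can be sketched as follows: the results table lists every analyzed gene set, while plots exist only for the significant ones. The Python example below filters a results table by adjusted p-value and derives the matching plot file name from the pattern shown above. The column names `pathway` and `padj` follow fgsea's standard result columns, but whether the pipeline's table keeps those names, and the 0.05 cutoff, are assumptions; the sample rows are made up.

```python
import csv
import io


def significant_gene_sets(handle, name_col="pathway", fdr_col="padj", alpha=0.05):
    """Return (gene set, adjusted p-value, plot file name), most significant first."""
    reader = csv.DictReader(handle, delimiter="\t")
    selected = []
    for row in reader:
        padj = float(row[fdr_col])
        if padj < alpha:
            plot_name = f"enrichment_plot_{row[name_col]}.png"
            selected.append((row[name_col], padj, plot_name))
    return sorted(selected, key=lambda item: item[1])


# Hypothetical excerpt of a results table.
example_results = (
    "pathway\tpadj\tNES\n"
    "HALLMARK_E2F_TARGETS\t0.001\t2.1\n"
    "HALLMARK_HYPOXIA\t0.2\t1.1\n"
    "HALLMARK_MYC_TARGETS_V1\t0.03\t1.8\n"
)
significant = significant_gene_sets(io.StringIO(example_results))
for name, padj, plot_name in significant:
    print(name, padj, plot_name)
```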

Supporting Outputs

These outputs are generated at intermediate steps and can be useful for debugging or additional analysis.

  • Prepared Gene List:
      • Description: Processed and formatted gene list ready for GSEA.
      • Format: R data object (.rds)
      • Example File Path: /intermediate/prepared_genelist.rds

  • Analysis Log:
      • Description: Detailed log file containing analysis parameters and processing information.
      • Format: Text file (.log)
      • Example File Path: /intermediate/gsea_analysis.log

Associated Processes

References & Additional Documentation