GSEA module Pipeline Specification
Pipeline Details
- Name: GSEA module
- Pipeline UUID: mghswveqesuckkjb7drwcu61sooe7q
- Version: 1.0.2
- View Pipeline:
Overview
The GSEA module pipeline performs Gene Set Enrichment Analysis (GSEA) on a ranked list of genes (e.g., the output of DESeq2 or LimmaVoom). It uses the fgsea package to run the enrichment analysis and generates both tabular results and enrichment plots for significant gene sets.
Key Use Cases:
- Differential Gene Expression Analysis: Analyze ranked gene lists from RNA-seq experiments to identify enriched biological pathways and gene sets.
- Pathway Enrichment Analysis: Determine which biological pathways are significantly enriched in your experimental conditions.
- Functional Annotation: Interpret gene expression results by mapping them to known gene sets and biological processes.
Features
- MSigDB Integration: Currently supports human and mouse genomes with gene sets provided by MSigDB for comprehensive pathway analysis.
- Fast Analysis: Utilizes the efficient fgsea R package for rapid gene set enrichment analysis.
- Comprehensive Output: Generates both detailed results tables and enrichment plots for visualization.
- Flexible Input: Accepts ranked gene lists from popular differential expression tools like DESeq2 and LimmaVoom.
- Quality Visualization: Produces publication-ready enrichment plots for each significant gene set.
Input/Output Specification
Inputs
Required
Ranked Gene List
- Description: A tab-delimited file containing at least two columns: one representing the feature (e.g., gene name) and one containing a quantification of that feature (e.g., log2FoldChange).
- Format: Tab-delimited text file (.txt, .tsv)
- Requirements: Feature column and quantification column names must be specified in the "Run GSEA" settings.
- Example File Path: /input/data/ranked_genes.txt
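As a minimal sketch of what this input looks like once parsed, the Python snippet below reads a tab-delimited table, extracts a feature column and a quantification column, and sorts the genes by the metric in descending order. The pipeline itself runs fgsea in R, so this is an illustration rather than its actual code. The function name `load_ranked_genes`, the column names `gene` and `log2FoldChange`, and the choice to skip rows with missing or non-numeric values are all assumptions.

```python
import csv
import io
import math


def load_ranked_genes(handle, feature_col, metric_col):
    """Return (feature, score) pairs sorted by score, highest first."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {feature_col, metric_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {sorted(missing)}")

    ranked = []
    for row in reader:
        try:
            score = float(row[metric_col])
        except (TypeError, ValueError):
            continue  # assumption: rows such as "NA" are skipped
        if math.isnan(score):
            continue  # assumption: NaN scores are skipped as well
        ranked.append((row[feature_col], score))

    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical input resembling a differential expression table.
example = io.StringIO(
    "gene\tbaseMean\tlog2FoldChange\n"
    "TP53\t120.5\t-1.8\n"
    "MYC\t300.1\t2.4\n"
    "GAPDH\t900.0\tNA\n"
    "EGFR\t85.2\t0.7\n"
)
ranked_genes = load_ranked_genes(example, "gene", "log2FoldChange")
print(ranked_genes)
```

Extra columns such as `baseMean` are ignored, which matches the requirement that the file contain at least two columns.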
Run GSEA Settings
- Description: Configuration parameters specifying which columns to use for gene names and ranking metrics.
- Format: Configuration object
- Required Parameters: Feature column name, quantification column name
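The exact schema of this configuration object is not given here. Purely as a hypothetical illustration, the two required parameters could be modeled and checked against an input file header as follows. The class name `RunGseaSettings` and the field names `feature_column` and `quantification_column` are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunGseaSettings:
    """Hypothetical container for the two required parameters."""

    feature_column: str
    quantification_column: str

    def validate_against(self, header):
        """Raise ValueError if a configured column is absent from header."""
        missing = [
            name
            for name in (self.feature_column, self.quantification_column)
            if name not in header
        ]
        if missing:
            raise ValueError(f"column(s) not found in input: {missing}")


settings = RunGseaSettings(
    feature_column="gene",
    quantification_column="log2FoldChange",
)
# Hypothetical header line from a ranked gene list file.
header = "gene\tbaseMean\tlog2FoldChange\tpadj".split("\t")
settings.validate_against(header)  # returns None when both columns exist
```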
Optional Inputs
Postfix
- Description: Optional suffix to append to output file names for better organization and identification.
- Format: Text string
- Example: "_treatment_vs_control"
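Where exactly the pipeline places the postfix in a file name is not stated. The sketch below assumes it is inserted between the file stem and the extension. The helper `with_postfix` is hypothetical and only illustrates that convention.

```python
from pathlib import PurePosixPath


def with_postfix(path, postfix=""):
    """Insert postfix between the file stem and its extension."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}{postfix}{p.suffix}"))


named = with_postfix("/output/results/gsea_results.tsv", "_treatment_vs_control")
unnamed = with_postfix("/output/results/gsea_results.tsv")
print(named)
print(unnamed)
```

Under this assumption, an empty postfix leaves the file name unchanged.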
Outputs
Reported Outputs
These outputs are available in the Report tab after a run completes.
- GSEA Results Table:
- Description: Comprehensive table containing enrichment statistics for all analyzed gene sets including p-values, FDR, enrichment scores, and leading edge genes.
- Format: Tab-delimited (.tsv)
- Example File Path: /output/results/gsea_results.tsv
- Visualization App: Built-in table viewer
- Location: Results folder
- Enrichment Plots:
- Description: Individual enrichment plots for each significant gene set showing the enrichment profile and running enrichment score.
- Format: Image files (.png)
- Example File Path: /output/plots/enrichment_plot_[geneset_name].png
- Visualization App: Image viewer
- Location: Plots folder
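To make the "running enrichment score" shown in the plots concrete, here is a minimal Python sketch of the classic GSEA running-sum statistic. It walks down the ranked list, stepping up at genes in the set (weighted by the absolute ranking metric) and stepping down otherwise. This is not fgsea's implementation and it does not estimate p-values. The function name, the `weight` parameter, and the toy data are assumptions for illustration.

```python
def running_enrichment_score(ranked, gene_set, weight=1.0):
    """Return (enrichment_score, running_sum) for one gene set.

    ranked: list of (gene, score) pairs sorted by score, highest first.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    # Each hit contributes |score|^weight, normalized so all hits sum to 1.
    hit_weights = [
        abs(score) ** weight if hit else 0.0
        for (_, score), hit in zip(ranked, hits)
    ]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0:
        raise ValueError("all genes in the set have a zero ranking metric")
    miss_step = 1.0 / n_miss  # each miss steps down by the same amount

    running, current = [], 0.0
    for w, hit in zip(hit_weights, hits):
        current += w / total_hit_weight if hit else -miss_step
        running.append(current)

    # The enrichment score is the largest deviation from zero (sign kept).
    es = max(running, key=abs)
    return es, running


# Toy ranked list and gene set, invented for this example.
toy_ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
es, running = running_enrichment_score(toy_ranked, {"A", "B"})
print(es, running)
```

In this toy case both set members sit at the top of the list, so the running sum peaks right after them and then declines back toward zero, which is the shape an enrichment plot displays for a strongly enriched set.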
Supporting Outputs
These outputs are generated at intermediate steps and can be useful for debugging or additional analysis.
- Prepared Gene List:
- Description: Processed and formatted gene list ready for GSEA.
- Format: R data object (.rds)
- Example File Path: /intermediate/prepared_genelist.rds
- Analysis Log:
- Description: Detailed log file containing analysis parameters and processing information.
- Format: Text file (.log)
- Example File Path: /intermediate/gsea_analysis.log
Associated Processes
References & Additional Documentation
- Related Papers/links: fgsea Bioconductor Package and fgsea Tutorial
- MSigDB Database: Molecular Signatures Database
- Supported Organisms: Currently supports human and mouse genomes