Clustering and Find Markers Specifications
Process Details
- Name: Clustering and Find Markers
- Process UUID: hMnnV5ho8ETt7gbEQNlQOAshZjwLef
- Process Group: SingleCell
Overview
This process generates an HTML report for single-cell RNA-seq clustering analysis using Seurat. It clusters cells across multiple resolutions, identifies cluster-specific marker genes, and produces visualizations and quality control metrics. The report covers sample statistics, PCA results, tSNE and UMAP visualizations, clustering results, cluster marker identification, and cluster quality control, giving insight into cell subpopulations, gene expression patterns, and overall data quality.
This process is implemented in Bash, which invokes a Python script for clustering analysis and marker identification using Seurat.
Key Functionality
- Multi-resolution Clustering: Performs clustering analysis across a range of resolutions to identify optimal clustering parameters
- Dimensionality Reduction Visualization: Generates tSNE and UMAP plots for visualizing cell populations and clustering results
- Marker Gene Discovery: Identifies cluster-specific marker genes to characterize cell subpopulations
- Quality Control Assessment: Provides comprehensive quality metrics and visualizations for clustering validation
- Interactive HTML Reporting: Creates a detailed HTML report with embedded visualizations and statistical summaries
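As an illustration of the multi-resolution sweep, the sketch below builds an evenly spaced grid of resolutions between a minimum and a maximum value. The function name `resolution_grid` and the step size of 0.1 are assumptions made for this example; this document does not state how the process spaces its resolutions.

```python
import math


def resolution_grid(min_res: float = 0.1, max_res: float = 2.0,
                    step: float = 0.1) -> list[float]:
    """Return evenly spaced resolutions from min_res up to max_res.

    The step size is a hypothetical choice, not a documented setting.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if max_res < min_res:
        raise ValueError("max_res must be >= min_res")
    # Count whole steps that fit in the range; the small epsilon guards
    # against floating-point error such as 1.9 / 0.1 == 18.999999...
    n_steps = math.floor((max_res - min_res) / step + 1e-9)
    # Round each value so the grid reads 0.1, 0.2, ... rather than 0.30000000000000004.
    return [round(min_res + i * step, 10) for i in range(n_steps + 1)]


grid = resolution_grid()
print(len(grid), grid[0], grid[-1])
```

Each value in the grid would correspond to one clustering run, so a finer step increases both the number of candidate clusterings and the compute time.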
Input/Output Specification
Inputs
Required Inputs
- Seurat Object
- Description: Preprocessed single-cell RNA-seq data stored as a Seurat object containing normalized expression data and metadata
- Format: RDS
Outputs
- HTML Report
- Description: Comprehensive analysis report containing clustering visualizations, marker gene tables, and quality control metrics
- Format: HTML
- Updated Seurat Object
- Description: Seurat object updated with clustering results, UMAP/tSNE coordinates, and identified marker genes
- Format: RDS
- Marker Genes Table
- Description: Tab-separated file containing cluster marker genes with statistical significance metrics and expression values
- Format: TSV
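The Marker Genes Table can be post-processed with standard TSV tooling. The sketch below picks the top markers per cluster from a small in-memory table. The column names (`gene`, `cluster`, `avg_log2FC`, `p_val_adj`) follow the layout Seurat's marker functions typically produce, but whether this process writes exactly these columns is an assumption, and the rows are toy values for illustration only.

```python
import csv
import io
from collections import defaultdict


def top_markers(tsv_file, n=2, alpha=0.05):
    """Return up to n significant, upregulated genes per cluster.

    Assumes columns named gene, cluster, avg_log2FC and p_val_adj;
    adjust these to match the header of the actual TSV output.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    by_cluster = defaultdict(list)
    for row in reader:
        log_fc = float(row["avg_log2FC"])
        if float(row["p_val_adj"]) < alpha and log_fc > 0:
            by_cluster[row["cluster"]].append((log_fc, row["gene"]))
    # Sort each cluster's hits by fold change, largest first.
    return {
        cluster: [gene for _, gene in sorted(hits, reverse=True)[:n]]
        for cluster, hits in by_cluster.items()
    }


# Toy rows, not real results.
rows = [
    ["gene", "cluster", "avg_log2FC", "p_val_adj"],
    ["CD3E", "0", "2.1", "1e-30"],
    ["IL7R", "0", "1.4", "1e-12"],
    ["ACTB", "0", "0.2", "0.8"],
    ["MS4A1", "1", "3.0", "1e-40"],
]
toy_tsv = "\n".join("\t".join(r) for r in rows)
result = top_markers(io.StringIO(toy_tsv))
print(result)
```

To read the real output, pass an open file handle instead of the `io.StringIO` object.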
Parameters & Settings
These parameters can be adjusted in the Foundry UI when running this process.
- Minimum Resolution
- Description: Minimum resolution for clustering parameter selection
- Default value: 0.1
- Maximum Resolution
- Description: Maximum resolution for clustering parameter selection
- Default value: 2.0
- # of Principal Components
- Description: Number of principal components used to build the UMAP, tSNE, and nearest-neighbor graph. Enter 0 to have the number predicted automatically
- Default value: 0
- Find Markers for All Resolutions
- Description: Whether to find cluster markers for all resolutions. This can significantly increase computational time and cost
- Default value: false
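The parameters above can be sketched as a small validated settings object, with 0 principal components treated as "choose automatically". The class name `ClusteringParams`, the function `choose_n_pcs`, and the cumulative-variance cutoff used for the automatic choice are all assumptions for illustration; this document does not describe how the process predicts the number of components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusteringParams:
    """Hypothetical container mirroring the documented defaults."""
    min_resolution: float = 0.1
    max_resolution: float = 2.0
    n_pcs: int = 0  # 0 means "predict automatically"
    find_markers_all_resolutions: bool = False

    def __post_init__(self):
        if self.min_resolution < 0:
            raise ValueError("min_resolution must be non-negative")
        if self.max_resolution < self.min_resolution:
            raise ValueError("max_resolution must be >= min_resolution")
        if self.n_pcs < 0:
            raise ValueError("n_pcs must be 0 (automatic) or positive")


def choose_n_pcs(params, pc_stdevs, variance_cutoff=0.9):
    """Return the number of PCs to use.

    A positive n_pcs is used as given. For n_pcs == 0 this sketch picks
    the smallest number of PCs whose cumulative variance reaches
    variance_cutoff. That heuristic is an assumption, not the
    process's documented method.
    """
    if params.n_pcs > 0:
        return params.n_pcs
    if not pc_stdevs:
        raise ValueError("pc_stdevs must not be empty")
    variances = [s * s for s in pc_stdevs]
    total = sum(variances)
    running = 0.0
    for i, v in enumerate(variances, start=1):
        running += v
        if running / total >= variance_cutoff:
            return i
    return len(variances)


auto = choose_n_pcs(ClusteringParams(), [3.0, 2.0, 1.0, 1.0])
fixed = choose_n_pcs(ClusteringParams(n_pcs=2), [3.0, 2.0, 1.0, 1.0])
print(auto, fixed)
```

Validating the settings up front keeps an inverted resolution range or a negative component count from surfacing later as a less obvious failure during clustering.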
References & Resources
- Tool Documentation: Contact the team for details on build_clustering_and_find_markers.py
- Related Papers: Hao, Y., Hao, S., Andersen-Nissen, E. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021). https://doi.org/10.1016/j.cell.2021.04.048