Subset Seurat Object Specifications

Process Details

  • Name: subset_seurat_object
  • Process UUID: f931ygcuuwy5kmzn4xag9tvfuu52sd
  • Process Group: SingleCell

Overview

This process subsets a Seurat RDS object based on user-specified metadata column values and criteria. It allows researchers to extract specific cell populations or samples from their single-cell RNA sequencing datasets while maintaining the integrity of the Seurat object structure. The process can optionally rerun downstream analysis steps including normalization, variable feature detection, and data scaling on the subsetted data.

This process is implemented in Bash, which invokes a Python script for Seurat object subsetting and analysis.

Key Functionality

  • Metadata-based Subsetting: Filters cells based on specified metadata column values to create focused datasets
  • Optional Reanalysis Pipeline: Reruns normalization, variable feature detection, and data scaling on subsetted data
  • Cluster Marker Analysis: Identifies differentially expressed genes between clusters with configurable parameters
  • Visualization Generation: Creates feature plots for user-specified genes and generates comprehensive HTML reports
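
The metadata-based subsetting described above can be illustrated with a minimal sketch. It is not the process's actual implementation: the function name `subset_cells`, the barcodes, and the dictionary layout of the metadata are assumptions made for illustration, with a plain Python dictionary standing in for the Seurat object's metadata table.

```python
from typing import Dict, Iterable, List


def subset_cells(
    metadata: Dict[str, Dict[str, str]],
    column: str,
    keep_values: Iterable[str],
) -> List[str]:
    """Return the barcodes of cells whose `column` value is in `keep_values`.

    `metadata` maps a cell barcode to its metadata row (hypothetical layout).
    """
    keep = set(keep_values)
    if not keep:
        raise ValueError("at least one filter value is required")

    # Fail early if the requested column is missing from any cell's metadata.
    for barcode, row in metadata.items():
        if column not in row:
            raise KeyError(f"column {column!r} missing for cell {barcode!r}")

    # Keep only cells whose value in the chosen column matches a filter value.
    return [barcode for barcode, row in metadata.items() if row[column] in keep]


# Hypothetical example metadata: three cells, two metadata columns.
cell_metadata = {
    "AAAC-1": {"sample": "S1", "cell_type": "T cell"},
    "AAAG-1": {"sample": "S1", "cell_type": "B cell"},
    "AACT-1": {"sample": "S2", "cell_type": "T cell"},
}

kept = subset_cells(cell_metadata, "cell_type", ["T cell"])
print(kept)  # ['AAAC-1', 'AACT-1']
```

Here the column argument plays the role of the Metadata Filter Column parameter and the list of values plays the role of Filter Value.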

Input/Output Specification

Inputs

Required Inputs

  • Seurat RDS File
    • Description: A Seurat object stored as an RDS file containing single-cell RNA-seq data with associated metadata
    • Format: RDS

Outputs

  • Subsetted RDS File

    • Description: The filtered Seurat object containing only cells that match the specified subsetting criteria
    • Format: RDS
  • Analysis Report

    • Description: Comprehensive HTML report containing visualizations, quality metrics, and analysis results for the subsetted data
    • Format: HTML

Parameters & Settings

These parameters can be adjusted in the Foundry UI when running this process.

  • Metadata Filter Column

    • Description: Metadata column in the input RDS object whose values are used to select cells
    • Default value: (empty - user must specify)
  • Filter Value

    • Description: Values of the Metadata Filter Column (metadata_filter_column) to keep
    • Default value: (empty - user must specify)
  • Rerun Normalization

    • Description: Reruns normalization after subsetting the data. Enabling this option also forces variable feature detection and data scaling to be rerun
    • Default value: true (checked)
  • Normalization Method

    • Description: Normalization method to use
    • Available options: LogNormalize (default), CLR, RC, SCT
  • Rerun Find Variable Features

    • Description: Reruns variable feature detection after subsetting the data. Enabling this option also forces data scaling to be rerun
    • Default value: true (checked)
  • Rerun Scale Data

    • Description: Reruns scaling after subsetting data
    • Default value: true (checked)
  • Number of Principal Components

    • Description: Number of principal components used to build the UMAP, tSNE, and nearest-neighbor graph. Enter 0 for automated prediction
    • Default value: 25
  • Only include positive cluster markers

    • Description: When checked, only cluster markers with positive fold changes are reported
    • Default value: false (unchecked)
  • Featured Genes (optional)

    • Description: Feature plots will be created for these genes
    • Default value: (empty)
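
The dependency between the three rerun options (normalization forces variable feature detection, which in turn forces scaling) can be sketched as follows. This is an illustration of the rule as described in the parameter list, not the process's actual code; the function name `resolve_rerun_steps` and the step labels are hypothetical.

```python
from typing import List


def resolve_rerun_steps(
    normalize: bool = True,
    find_variable_features: bool = True,
    scale_data: bool = True,
) -> List[str]:
    """Return the reanalysis steps that run, in order, for the given checkboxes.

    Defaults mirror the documented defaults (all three options checked).
    """
    # Rerunning normalization forces variable feature detection.
    run_variable = find_variable_features or normalize
    # Rerunning variable feature detection forces scaling.
    run_scale = scale_data or run_variable

    steps = []
    if normalize:
        steps.append("normalize")
    if run_variable:
        steps.append("find_variable_features")
    if run_scale:
        steps.append("scale_data")
    return steps


# Only normalization is checked, yet all three steps run.
print(resolve_rerun_steps(True, False, False))
# ['normalize', 'find_variable_features', 'scale_data']

# Only variable feature detection is checked, so scaling is forced as well.
print(resolve_rerun_steps(False, True, False))
# ['find_variable_features', 'scale_data']
```

Under this reading, unchecking a downstream option has no effect while an upstream option remains checked.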

References & Resources

  • Tool Documentation: Contact the team for details on build_subcluster_report.py
  • Related Papers: Butler, A., Hoffman, P., Smibert, P., Papalexi, E., & Satija, R. (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology, 36(5), 411-420. https://doi.org/10.1038/nbt.4096