Souporcell Specifications

Process Details

  • Name: Souporcell
  • Process UUID: f9311fxghmtmx5apx7gfdds0gvgbxr
  • Process Group: SingleCell

Overview

Souporcell clusters the cells of a mixed-genotype single-cell RNA sequencing (scRNA-seq) experiment by individual. This process demultiplexes pooled single-cell samples by assigning each cell to its donor of origin, which is particularly valuable when samples from multiple individuals are processed together in a single scRNA-seq experiment.

This process is implemented in Bash, which invokes the Souporcell Python pipeline for genotype-based cell clustering and demultiplexing.

Key Functionality

  • Genotype-based Clustering: Groups cells by individual based on genetic variants detected in the scRNA-seq data
  • Cell Demultiplexing: Separates pooled single-cell samples into individual donor contributions
  • Variant Analysis: Analyzes genetic variants across cells to determine sample identity and detect doublets

Input/Output Specification

Inputs

Required Inputs

  • BAM File Set

    • Description: Aligned single-cell RNA sequencing reads containing cellular barcodes
    • Format: BAM
  • Input File TSV

    • Description: Tab-separated file containing cell barcodes and associated metadata
    • Format: TSV
  • Reference FASTA

    • Description: Reference genome sequence used for variant calling and genotyping
    • Format: FASTA
  • Run Souporcell

    • Description: Control parameter to execute the Souporcell analysis
    • Format: run_souporcell

Optional Inputs

  • VCF File
    • Description: Variant call format file containing known common variants to guide the clustering process
    • Format: VCF

Outputs

  • Input Directory
    • Description: Output directory containing Souporcell results including cluster assignments, genotype calls, and quality metrics
    • Format: Directory
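
As a first look at the cluster assignments in the output directory, the sketch below tallies cells per status and per donor cluster. It assumes the directory contains the upstream Souporcell `clusters.tsv` file, with tab-separated `barcode`, `status`, and `assignment` columns and status values such as `singlet`, `doublet`, and `unassigned`. This page does not confirm that layout, so check the file produced by your run before relying on it. The function name `summarize_clusters` is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, Tuple


def summarize_clusters(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count cells per status and singlets per assigned cluster.

    `lines` is any iterable of text lines, e.g. an open file handle on
    clusters.tsv (assumed layout: tab-separated with a header row that
    includes 'status' and 'assignment' columns).
    """
    reader = csv.DictReader(lines, delimiter="\t")
    status_counts: Counter = Counter()
    singlets_per_cluster: Counter = Counter()
    for row in reader:
        status_counts[row["status"]] += 1
        # Only singlets map to exactly one donor cluster.
        if row["status"] == "singlet":
            singlets_per_cluster[row["assignment"]] += 1
    return status_counts, singlets_per_cluster


if __name__ == "__main__":
    import sys

    # Usage: python summarize_clusters.py path/to/clusters.tsv
    with open(sys.argv[1], newline="") as handle:
        statuses, clusters = summarize_clusters(handle)
    print("cells per status:", dict(statuses))
    print("singlets per cluster:", dict(clusters))
```

A large share of `doublet` or `unassigned` cells, or a cluster with very few singlets, may be a hint that the Clusters setting does not match the number of donors in the pool.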

Parameters & Settings

These parameters can be adjusted in the Foundry UI when running this process.

  • Clusters

    • Description: Number of clusters, i.e. the expected number of individuals (genotypes) in the pooled sample.
    • Default value: 4
  • Ploidy

    • Description: Ploidy of the samples. Must be 1 (haploid) or 2 (diploid, default).
    • Available options: 1, 2 (default)
  • Minimal Alternative

    • Description: Minimum alternative-allele support required for a locus to be used (default=10).
    • Default value: 10
  • Minimal Reference

    • Description: Minimum reference-allele support required for a locus to be used (default=10).
    • Default value: 10
  • Max Loci

    • Description: Maximum number of loci per cell. Affects speed (default=2048).
    • Default value: 2048
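
The inputs and settings above can be pictured as arguments to the upstream `souporcell_pipeline.py` script. The sketch below is an assumption about how they might map, not the wrapper this process actually runs: the flag names (`-i`, `-b`, `-f`, `-t`, `-o`, `-k`, `--ploidy`, `--min_alt`, `--min_ref`, `--max_loci`, `--common_variants`) follow the upstream Souporcell command line, while the function name `build_souporcell_command` and the `threads` argument are hypothetical and do not appear on this page.

```python
from typing import List, Optional


def build_souporcell_command(
    bam: str,
    barcodes_tsv: str,
    fasta: str,
    out_dir: str,
    clusters: int = 4,
    ploidy: int = 2,
    min_alt: int = 10,
    min_ref: int = 10,
    max_loci: int = 2048,
    common_variants_vcf: Optional[str] = None,
    threads: int = 1,  # assumption: not listed among the settings above
) -> List[str]:
    """Assemble an argument list for souporcell_pipeline.py (illustrative)."""
    if ploidy not in (1, 2):
        raise ValueError("ploidy must be 1 or 2")
    if clusters < 1:
        raise ValueError("clusters must be a positive integer")

    cmd = [
        "souporcell_pipeline.py",
        "-i", bam,               # BAM File Set
        "-b", barcodes_tsv,      # Input File TSV (cell barcodes)
        "-f", fasta,             # Reference FASTA
        "-t", str(threads),
        "-o", out_dir,           # output directory
        "-k", str(clusters),     # Clusters
        "--ploidy", str(ploidy),
        "--min_alt", str(min_alt),
        "--min_ref", str(min_ref),
        "--max_loci", str(max_loci),
    ]
    # The optional VCF of known common variants is passed only when supplied.
    if common_variants_vcf is not None:
        cmd += ["--common_variants", common_variants_vcf]
    return cmd


if __name__ == "__main__":
    # Print the command that the default settings would produce.
    print(" ".join(build_souporcell_command(
        "possorted.bam", "barcodes.tsv", "genome.fa", "souporcell_out")))
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without shell quoting concerns.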

References & Resources

  • Tool Documentation: Contact the team for details on souporcell_pipeline.py
  • Related Papers: Heaton, H. et al. Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes. Nature Methods 17, 615–620 (2020). https://doi.org/10.1038/s41592-020-0820-1