Souporcell Specifications
Process Details
- Name: Souporcell
- Process UUID: f9311fxghmtmx5apx7gfdds0gvgbxr
- Process Group: SingleCell
Overview
Souporcell is a specialized tool designed to cluster mixed-genotype single-cell RNA sequencing experiments by individual. This process enables researchers to demultiplex pooled single-cell samples by identifying which cells belong to which individual donor, making it particularly valuable for studies involving multiple samples or individuals processed together in a single scRNA-seq experiment.
This process is implemented in Bash, which invokes the Souporcell Python pipeline for genotype-based cell clustering and demultiplexing.
Key Functionality
- Genotype-based Clustering: Groups cells by individual based on genetic variants detected in the scRNA-seq data
- Cell Demultiplexing: Separates pooled single-cell samples into individual donor contributions
- Variant Analysis: Analyzes genetic variants across cells to determine sample identity and detect doublets
Input/Output Specification
Inputs
Required Inputs
- BAM File Set
- Description: Aligned single-cell RNA sequencing reads containing cellular barcodes
- Format: BAM
- Input File TSV
- Description: Tab-separated file containing cell barcodes and associated metadata
- Format: TSV
- Reference FASTA
- Description: Reference genome sequence used for variant calling and genotyping
- Format: FASTA
- Run Souporcell
- Description: Control parameter to execute the Souporcell analysis
- Format: run_souporcell
Optional Inputs
- VCF File
- Description: Variant call format file containing known common variants to guide the clustering process
- Format: VCF
Outputs
- Input Directory
- Description: Output directory containing Souporcell results including cluster assignments, genotype calls, and quality metrics
- Format: Directory
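As a rough illustration of how the cluster assignments might be consumed downstream, the sketch below tallies cells per status and cluster. It assumes the output directory follows the upstream Souporcell convention of a tab-separated `clusters.tsv` file with `barcode`, `status`, and `assignment` columns; whether this process keeps that exact file name and layout is an assumption, and `summarize_clusters` is a hypothetical helper, not part of the process.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize_clusters(output_dir):
    """Count cells per (status, assignment) pair.

    Assumption: output_dir contains a tab-separated clusters.tsv with at
    least 'barcode', 'status' and 'assignment' columns, as written by
    upstream Souporcell.
    """
    path = Path(output_dir) / "clusters.tsv"
    counts = Counter()
    with path.open(newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            counts[(row["status"], row["assignment"])] += 1
    return counts
```

Calling `summarize_clusters("path/to/output")` returns a `Counter` keyed by `(status, assignment)`, which makes it easy to check how many cells were assigned to each donor cluster and how many were flagged with a non-singlet status.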
Parameters & Settings
These parameters can be adjusted in the Foundry UI when running this process.
- Clusters
- Description: Number of clusters to fit, typically the number of individuals pooled in the experiment.
- Default value: 4
- Ploidy
- Description: Ploidy of the samples. Must be 1 or 2.
- Available options: 1, 2 (default)
- Minimal Alternative
- Description: Minimum alternate-allele support required for a locus to be used.
- Default value: 10
- Minimal Reference
- Description: Minimum reference-allele support required for a locus to be used.
- Default value: 10
- Max Loci
- Description: Maximum number of loci per cell. Affects speed.
- Default value: 2048
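To make the parameters above concrete, here is a minimal sketch of how they could be validated and turned into a `souporcell_pipeline.py` command line. The flags used (`-i`, `-b`, `-f`, `-t`, `-o`, `-k`, `-p`, `--min_alt`, `--min_ref`, `--max_loci`, `--common_variants`) come from the upstream Souporcell pipeline. How this process's Bash wrapper actually maps the UI settings and inputs to those flags is an assumption, as are the `threads` argument (not a documented setting here) and the hypothetical helper name `build_souporcell_command`.

```python
import shlex


def build_souporcell_command(bam, barcodes, fasta, out_dir, threads=8,
                             clusters=4, ploidy=2, min_alt=10, min_ref=10,
                             max_loci=2048, common_variants=None):
    """Return an argument list for souporcell_pipeline.py.

    Defaults mirror the documented parameter defaults. The mapping of
    inputs to flags is assumed, not taken from the process itself.
    """
    if ploidy not in (1, 2):
        raise ValueError("ploidy must be 1 or 2")
    if clusters < 1:
        raise ValueError("clusters must be at least 1")
    if min(min_alt, min_ref) < 0 or max_loci < 1:
        raise ValueError("min_alt/min_ref must be >= 0 and max_loci >= 1")

    cmd = [
        "souporcell_pipeline.py",
        "-i", bam,            # BAM File Set
        "-b", barcodes,       # Input File TSV (assumed to hold cell barcodes)
        "-f", fasta,          # Reference FASTA
        "-t", str(threads),   # assumed; not a documented UI setting
        "-o", out_dir,
        "-k", str(clusters),
        "-p", str(ploidy),
        "--min_alt", str(min_alt),
        "--min_ref", str(min_ref),
        "--max_loci", str(max_loci),
    ]
    if common_variants is not None:
        # Optional VCF of known common variants
        cmd += ["--common_variants", common_variants]
    return cmd


if __name__ == "__main__":
    args = build_souporcell_command("pool.bam", "barcodes.tsv", "genome.fa", "out")
    print(shlex.join(args))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues; `shlex.join` is used only to display the command.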
References & Resources
- Tool Documentation: Contact the team for details on souporcell_pipeline.py
- Related Papers: Heaton, H. et al. Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes. Nature Methods 17, 615–620 (2020). https://doi.org/10.1038/s41592-020-0820-1