Souporcell module Pipeline Specification
Pipeline Details
- Name: Souporcell module
- Pipeline UUID: f931yb5soth8epasg60vc6j5bcg8b5
- Version: 1.0.0
Overview
The Souporcell module pipeline clusters cells from mixed-genotype single-cell RNA-sequencing (scRNA-seq) experiments by individual of origin. It automates demultiplexing of pooled samples, identification of individual donors, and detection of doublets, so that each cell is assigned accurately before downstream analysis.
Key Use Cases:
- Sample Demultiplexing: Assign cells from pooled scRNA-seq experiments back to their individual donors of origin.
- Doublet Detection: Identify and flag doublet cells that contain genetic material from multiple individuals.
- Genotype-based Cell Clustering: Group cells based on their genetic variants to distinguish between different individuals in mixed samples.
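The idea behind genotype-based demultiplexing and doublet detection can be illustrated with a toy likelihood model. This is a minimal sketch, not souporcell's or troublet's actual implementation: it assumes each cluster's alternative-allele fraction per locus is already known (the hypothetical `cluster_afs` argument), scores a cell's alt/ref read counts with a binomial likelihood under each single donor, and flags the cell as a doublet when an equal mixture of two donors explains the reads better than any single donor by a hypothetical `doublet_margin`.

```python
import math
from itertools import combinations
from typing import Optional, Sequence, Tuple


def log_lik(alt: int, ref: int, p: float) -> float:
    """Binomial log-likelihood of alt/ref read counts given alt-allele fraction p (constant term dropped)."""
    p = min(max(p, 1e-3), 1 - 1e-3)  # clamp so homozygous loci never give log(0)
    return alt * math.log(p) + ref * math.log(1 - p)


def classify_cell(
    counts: Sequence[Tuple[int, int]],
    cluster_afs: Sequence[Sequence[float]],
    doublet_margin: float = 2.0,
) -> Tuple[str, Optional[int]]:
    """Return ("singlet", cluster_index) or ("doublet", None) for one cell (toy model)."""
    # Score the cell's reads under each single-donor genotype.
    singlet = [
        sum(log_lik(a, r, afs[i]) for i, (a, r) in enumerate(counts))
        for afs in cluster_afs
    ]
    best = max(range(len(singlet)), key=singlet.__getitem__)
    # Score the reads under an equal mixture of every pair of donors.
    doublet = max(
        (
            sum(log_lik(a, r, (x[i] + y[i]) / 2) for i, (a, r) in enumerate(counts))
            for x, y in combinations(cluster_afs, 2)
        ),
        default=-math.inf,
    )
    if doublet - singlet[best] > doublet_margin:
        return ("doublet", None)
    return ("singlet", best)
```

With `cluster_afs = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]`, a cell with counts `[(0, 5), (5, 0), (0, 5)]` is called a singlet of cluster 0, while balanced counts `[(3, 3)] * 3` are called a doublet. Souporcell itself learns the cluster genotypes from the data and also models ambient RNA; this toy takes the genotypes as given.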
Features
- Comprehensive Variant Analysis: Integrates minimap2 for remapping, freebayes for variant calling, and vartrix for cell allele counting.
- Flexible Clustering Options: Supports a customizable number of clusters and a ploidy setting of 1 or 2 to accommodate different experimental designs.
- Quality Control Parameters: Applies configurable minimum alternative-allele and reference-allele read thresholds that a locus must meet before it is used, so that clustering relies on well-supported variants.
- Doublet Detection: Includes the troublet algorithm to identify cells containing genetic material from multiple individuals.
- Ambient RNA Inference: Estimates and accounts for ambient RNA contamination in the clustering analysis.
- Performance Optimization: A configurable maximum number of loci per cell trades accuracy against computational speed.
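The clustering, quality-control and performance parameters above can be pictured as a small validated configuration plus a locus filter. This is a minimal sketch under stated assumptions: the field names mirror the upstream `souporcell_pipeline.py` options (`-k`, `-p`, `--min_alt`, `--min_ref`, `--max_loci`) and the defaults follow our reading of the upstream README, but `SouporcellParams`, `select_loci` and `cap_cell_loci` are hypothetical helpers. Souporcell's actual counting unit for the thresholds and its rule for choosing which loci to keep per cell may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SouporcellParams:
    # Hypothetical config object; field names mirror upstream souporcell options.
    clusters: int         # -k: number of clusters (expected donors)
    ploidy: int = 2       # -p: must be 1 or 2
    min_alt: int = 10     # --min_alt: minimum alternative-allele support per locus
    min_ref: int = 10     # --min_ref: minimum reference-allele support per locus
    max_loci: int = 2048  # --max_loci: cap on loci used per cell (speed vs accuracy)

    def __post_init__(self) -> None:
        if self.clusters < 1:
            raise ValueError("clusters must be at least 1")
        if self.ploidy not in (1, 2):
            raise ValueError("ploidy must be 1 or 2")
        if min(self.min_alt, self.min_ref) < 0 or self.max_loci < 1:
            raise ValueError("thresholds must be non-negative and max_loci positive")


def select_loci(locus_counts: Dict[str, Tuple[int, int]], params: SouporcellParams) -> List[str]:
    """Keep loci whose (alt, ref) support meets both thresholds (simplified rule)."""
    return [
        locus
        for locus, (alt, ref) in locus_counts.items()
        if alt >= params.min_alt and ref >= params.min_ref
    ]


def cap_cell_loci(cell_loci: List[str], params: SouporcellParams) -> List[str]:
    """Truncate one cell's loci to max_loci; keeping the first ones is a hypothetical policy."""
    return cell_loci[: params.max_loci]
```

Validating ploidy at construction time rejects unsupported values such as 3 before any expensive remapping or variant calling starts.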
Input/Output Specification
Inputs
Required
bamFile
- Description: Sorted BAM file containing aligned single-cell RNA-seq reads
- Format: .bam
- Example File Path: /path/to/input/sorted_reads.bam
inputFileTsv
- Description: barcodes.tsv file output by Cell Ranger, listing the cell barcodes to analyze
- Format: .tsv
- Example File Path: /path/to/input/barcodes.tsv
fasta
- Description: Reference genome FASTA file used for variant calling and genotype clustering
- Format: .fasta or .fa
- Example File Path: /path/to/reference/genome.fasta
run_souporcell
- Description: Control parameter to enable or disable the souporcell analysis
- Format: String value ("yes" or "no")
Optional
vcfFile
- Description: VCF file containing known common variants to improve clustering accuracy
- Format: .vcf or .vcf.gz
- Example File Path: /path/to/variants/common_variants.vcf
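The inputs above map naturally onto a single pipeline invocation. The sketch below assumes the module wraps the upstream `souporcell_pipeline.py` script and its `-i`, `-b`, `-f`, `-t`, `-o`, `-k` and `--common_variants` options; the function name, the `out_dir`, `clusters` and `threads` arguments, and the default thread count are hypothetical, since this specification does not list them as module inputs.

```python
from typing import List, Optional


def build_souporcell_command(
    bam_file: str,
    barcodes_tsv: str,
    fasta: str,
    run_souporcell: str,
    out_dir: str,
    clusters: int,
    threads: int = 8,  # hypothetical default
    vcf_file: Optional[str] = None,
) -> Optional[List[str]]:
    """Build the argv list for souporcell_pipeline.py, or return None when disabled."""
    flag = run_souporcell.strip().lower()
    if flag not in ("yes", "no"):
        raise ValueError('run_souporcell must be "yes" or "no"')
    if flag == "no":
        return None  # analysis switched off by the control parameter
    cmd = [
        "souporcell_pipeline.py",
        "-i", bam_file,       # bamFile
        "-b", barcodes_tsv,   # inputFileTsv
        "-f", fasta,          # fasta
        "-t", str(threads),
        "-o", out_dir,
        "-k", str(clusters),
    ]
    if vcf_file:
        # Optional vcfFile: restrict calling to known common variants.
        cmd += ["--common_variants", vcf_file]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building an argument list rather than a shell string avoids quoting problems with unusual file paths.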
Outputs
Reported Outputs
inputDir
- Description: Output directory containing all souporcell results, including cluster assignments, genotype calls, and doublet predictions
- Format: Directory containing multiple output files
- Example File Path: /output/directory/souporcell_results/
- Location: Output folder
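A quick summary of the cluster assignments in the output directory can be produced with the standard library. This sketch assumes the directory follows the upstream souporcell layout, where `clusters.tsv` holds one tab-separated row per barcode with `status` (for example singlet, doublet or unassigned) and `assignment` columns; verify the file and column names against your souporcell version. `summarize_clusters` and `summarize_output_dir` are hypothetical helper names.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable


def summarize_clusters(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count cells per status and singlets per assigned cluster from clusters.tsv content."""
    reader = csv.DictReader(lines, delimiter="\t")
    status_counts: Counter = Counter()
    singlet_assignments: Counter = Counter()
    for row in reader:
        status_counts[row["status"]] += 1
        if row["status"] == "singlet":
            singlet_assignments[row["assignment"]] += 1
    return {"status": status_counts, "singlet_assignment": singlet_assignments}


def summarize_output_dir(out_dir: str) -> Dict[str, Counter]:
    """Read clusters.tsv from a souporcell output directory (assumed file name)."""
    with open(Path(out_dir) / "clusters.tsv", newline="") as fh:
        return summarize_clusters(fh)
```

A high doublet fraction or a cluster with very few singlets is a useful first hint that the chosen cluster number does not match the number of pooled donors.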
Associated Processes
References & Additional Documentation
- Related Papers/Links: Heaton et al., "souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes", Nature Methods (2020)
- Pipeline Repository: Souporcell GitHub Repository