
Souporcell module Pipeline Specification

Pipeline Details

  • Name: Souporcell module
  • Pipeline UUID: f931yb5soth8epasg60vc6j5bcg8b5
  • Version: 1.0.0

Overview

The Souporcell module pipeline clusters cells from mixed-genotype single-cell RNA-sequencing (scRNA-seq) experiments by individual. It automates demultiplexing of pooled samples, identification of individual donors, and doublet detection, so that each cell is assigned to the correct donor before downstream analysis.

Key Use Cases:

  • Sample Demultiplexing: Assign cells from pooled scRNA-seq experiments back to their donors of origin.
  • Doublet Detection: Identify and flag doublet cells that contain genetic material from multiple individuals.
  • Genotype-based Cell Clustering: Group cells based on their genetic variants to distinguish between different individuals in mixed samples.
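To make the idea behind genotype-based clustering and doublet detection concrete, the toy sketch below scores one cell's reference/alternative read counts against known donor genotypes with a binomial likelihood, and also tests every 50/50 donor pair as a possible doublet. This is a simplified illustration under stated assumptions, not souporcell's actual mixture model or the troublet implementation: the donor alt-allele fractions, the `EPS` error floor, and all function names are hypothetical.

```python
import math

# Hypothetical error floor so alt fractions of exactly 0 or 1 never produce log(0).
EPS = 0.01


def locus_loglik(ref: int, alt: int, alt_frac: float) -> float:
    """Binomial log-likelihood (constant coefficient dropped) of ref/alt counts at one locus."""
    p = min(max(alt_frac, EPS), 1 - EPS)
    return alt * math.log(p) + ref * math.log(1 - p)


def cell_loglik(counts, genotype) -> float:
    """Sum per-locus log-likelihoods; counts is a list of (ref, alt) pairs."""
    return sum(locus_loglik(r, a, g) for (r, a), g in zip(counts, genotype))


def assign_cell(counts, donors: dict) -> tuple:
    """Return (label, log-likelihood) of the best singlet or doublet explanation.

    donors maps a donor name to its expected alt-allele fraction per locus
    (0.0 = hom-ref, 0.5 = het, 1.0 = hom-alt). This is an assumption of the toy model.
    """
    best = max(((name, cell_loglik(counts, g)) for name, g in donors.items()),
               key=lambda item: item[1])
    names = sorted(donors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Model a doublet of donors a and b as an equal mix of their allele fractions.
            mix = [(x + y) / 2 for x, y in zip(donors[a], donors[b])]
            ll = cell_loglik(counts, mix)
            if ll > best[1]:
                best = (f"{a}/{b}", ll)
    return best
```

For example, with `donors = {"A": [0.0, 1.0, 0.5], "B": [1.0, 0.0, 0.5]}`, a cell with counts `[(10, 0), (0, 10), (5, 5)]` is assigned to `"A"`, while a cell with balanced counts at every locus is best explained as the doublet `"A/B"`. A real caller also has to handle sparse coverage and penalize the extra flexibility of the doublet hypothesis, which this toy ignores.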

Features

  • Comprehensive Variant Analysis: Integrates minimap2 for read remapping, freebayes for variant calling, and vartrix for counting alleles per cell.
  • Flexible Clustering Options: Supports a user-defined number of clusters and a ploidy setting (1 or 2) to accommodate different experimental designs.
  • Quality Control Parameters: Implements configurable minimum alternative- and reference-read thresholds that a locus must meet before it is used, making variant calling more robust.
  • Doublet Detection: Includes the troublet algorithm to identify cells that contain genetic material from more than one individual.
  • Ambient RNA Inference: Estimates and accounts for ambient RNA contamination in the clustering analysis.
  • Performance Optimization: Configurable maximum loci per cell parameter to balance accuracy and computational speed.
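The parameters listed above correspond to options of souporcell's `souporcell_pipeline.py` command. The sketch below shows how a wrapper might assemble that command line, assuming this module calls `souporcell_pipeline.py` directly. The flag names and the default values (`--min_alt 10`, `--min_ref 10`, `--max_loci 2048`, ploidy 2) follow the souporcell README as we understand it and should be checked against the installed version; the Python function and its argument names are hypothetical.

```python
from typing import List, Optional


def build_souporcell_cmd(
    bam: str,
    barcodes: str,
    fasta: str,
    out_dir: str,
    clusters: int,
    threads: int = 8,
    ploidy: int = 2,
    min_alt: int = 10,
    min_ref: int = 10,
    max_loci: int = 2048,
    common_variants: Optional[str] = None,
) -> List[str]:
    """Assemble a souporcell_pipeline.py argument list (hypothetical wrapper; nothing is executed)."""
    if ploidy not in (1, 2):
        raise ValueError("ploidy must be 1 or 2")
    if clusters < 1:
        raise ValueError("clusters must be a positive integer")
    cmd = [
        "souporcell_pipeline.py",
        "-i", bam,             # sorted BAM (bamFile)
        "-b", barcodes,        # barcodes.tsv (inputFileTsv)
        "-f", fasta,           # reference genome (fasta)
        "-t", str(threads),    # worker threads
        "-o", out_dir,         # output directory
        "-k", str(clusters),   # expected number of individuals
        "--ploidy", str(ploidy),
        "--min_alt", str(min_alt),
        "--min_ref", str(min_ref),
        "--max_loci", str(max_loci),
    ]
    if common_variants is not None:
        # Optional VCF of known common variants (vcfFile).
        cmd += ["--common_variants", common_variants]
    return cmd
```

Returning an argument list rather than a shell string avoids quoting problems; the list can be passed unchanged to `subprocess.run(cmd, check=True)`.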

Input/Output Specification

Inputs

Required

bamFile

  • Description: Sorted BAM file containing aligned single-cell RNA-seq reads
  • Format: .bam
  • Example File Path: /path/to/input/sorted_reads.bam

inputFileTsv

  • Description: barcodes.tsv file produced by Cell Ranger, listing the cell barcodes to analyze
  • Format: .tsv
  • Example File Path: /path/to/input/barcodes.tsv

fasta

  • Description: Reference genome FASTA file used for variant calling and genotype clustering
  • Format: .fasta or .fa
  • Example File Path: /path/to/reference/genome.fasta

run_souporcell

  • Description: Control parameter to enable or disable the souporcell analysis
  • Format: String value ("yes" or "no")

Optional Inputs

vcfFile

  • Description: VCF file containing known common variants to improve clustering accuracy
  • Format: .vcf or .vcf.gz
  • Example File Path: /path/to/variants/common_variants.vcf
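Before launching a run, it can help to check that the inputs described above have the expected formats. The sketch below is a hypothetical pre-flight check that only inspects file-name extensions and the `run_souporcell` value; it does not open the files, and the function name and its error messages are illustrative assumptions rather than part of the pipeline.

```python
from typing import List, Optional

FASTA_SUFFIXES = (".fasta", ".fa")
VCF_SUFFIXES = (".vcf", ".vcf.gz")


def validate_inputs(bam_file: str, input_file_tsv: str, fasta: str,
                    run_souporcell: str, vcf_file: Optional[str] = None) -> List[str]:
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []
    if not bam_file.endswith(".bam"):
        problems.append(f"bamFile must be a .bam file: {bam_file}")
    if not input_file_tsv.endswith(".tsv"):
        problems.append(f"inputFileTsv must be a .tsv file: {input_file_tsv}")
    if not fasta.endswith(FASTA_SUFFIXES):
        problems.append(f"fasta must end in .fasta or .fa: {fasta}")
    if run_souporcell not in ("yes", "no"):
        problems.append(f'run_souporcell must be "yes" or "no": {run_souporcell!r}')
    # vcfFile is optional, so it is only checked when provided.
    if vcf_file is not None and not vcf_file.endswith(VCF_SUFFIXES):
        problems.append(f"vcfFile must end in .vcf or .vcf.gz: {vcf_file}")
    return problems
```

Collecting every problem instead of stopping at the first one lets a user fix all misconfigured inputs in a single pass.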

Outputs

Reported Outputs

inputDir

  • Description: Output directory containing all souporcell results, including cluster assignments, genotype calls, and doublet predictions
  • Format: Directory containing multiple output files
  • Example File Path: /output/directory/souporcell_results/
  • Location: Output folder
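A common first step after a run is to count singlets, doublets, and unassigned cells, and to see how many singlets fall into each cluster. The sketch below assumes the directory contains souporcell's tab-separated `clusters.tsv` with a header that includes `status` (values such as `singlet`, `doublet`, `unassigned`) and `assignment` columns; verify these names against your own output before relying on them. The function name is hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def summarise_clusters(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count cells by status, and singlets by cluster, from clusters.tsv lines."""
    reader = csv.DictReader(lines, delimiter="\t")
    status_counts: Counter = Counter()
    singlets_per_cluster: Counter = Counter()
    for row in reader:
        status_counts[row["status"]] += 1
        if row["status"] == "singlet":
            # For singlets, the assignment column holds a single cluster id.
            singlets_per_cluster[row["assignment"]] += 1
    return {"status": status_counts, "singlets_per_cluster": singlets_per_cluster}
```

Because the function accepts any iterable of lines, it works with an open file handle, for example `with open(Path(input_dir) / "clusters.tsv") as fh: summarise_clusters(fh)`, or with an in-memory list when testing.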

Associated Processes

References & Additional Documentation