Cellranger Mkref Specifications

Process Details

  • Name: cellranger_mkref
  • Process UUID: 6NUiKl3NanuHv37YPhEQ6BEcrDOLFZ
  • Process Group: SingleCell

Overview

This process creates a Cell Ranger reference genome from a genome FASTA file and a GTF annotation file. It detects the GTF file type and filters the annotation accordingly. For GENCODE GTF files, it applies specialized filtering: it filters by biotype, removes readthrough transcripts, and excludes pseudoautosomal regions. The filtered GTF file is then passed to Cell Ranger's mkref command, which generates a reference genome directory suitable for downstream single-cell RNA sequencing analysis.

This process is implemented in Bash, which invokes Cell Ranger tools for reference genome generation and GTF filtering.
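The branching on GTF type described above can be sketched as follows. This is only an illustration: the function name detect_gtf_type is hypothetical, and the detection rule (looking for "GENCODE" in the leading "#" header lines, such as "##provider: GENCODE") is an assumption, since this page does not say how the process recognizes a GENCODE file.

```shell
#!/usr/bin/env bash
# Sketch only: classify a GTF file as "gencode" or "other".
# Assumption: GENCODE files carry the word GENCODE in their leading
# comment lines; the real process may use a different test.

# Prints "gencode" or "other" for the GTF file given as $1.
detect_gtf_type() {
    awk '
        /^#/ { if (tolower($0) ~ /gencode/) found = 1; next }  # scan header lines
        { exit }                                               # stop at first record
        END { print (found ? "gencode" : "other") }
    ' "$1"
}
```

Only the header is read, so the check stays cheap even for large annotation files.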

Key Functionality

  • GTF Type Detection and Filtering: Automatically detects GENCODE GTF files and applies specialized filtering to retain only protein-coding genes, long non-coding RNAs, and immunoglobulin/T-cell receptor genes while excluding problematic transcript types
  • Reference Genome Generation: Uses Cell Ranger's mkref command to create an indexed reference genome directory from the input FASTA and filtered GTF files
  • Biotype-Based Gene Selection: For non-GENCODE GTF files (Ensembl and NCBI), retains protein-coding genes using Cell Ranger's built-in attribute filter (--attribute=gene_biotype:protein_coding)
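To make the GENCODE filtering concrete, here is a simplified, line-level sketch. It is not the custom script the process uses. The function name filter_gencode_gtf is hypothetical, and the sketch assumes the standard GENCODE attributes (gene_type, tag "readthrough_transcript", tag "PAR"). A production filter would typically also drop the parent gene record of a removed transcript, which this version does not attempt.

```shell
#!/usr/bin/env bash
# Sketch only: keep records whose gene_type is protein_coding, lncRNA,
# or an IG/TR gene, and drop readthrough and pseudoautosomal entries.
# Assumption: attributes are in column 9, as in standard GTF files.

# Writes the filtered GTF for input file $1 to standard output.
filter_gencode_gtf() {
    awk -F '\t' '
        /^#/ { print; next }                          # keep header lines
        $9 ~ /tag "readthrough_transcript"/ { next }  # drop readthrough transcripts
        $9 ~ /tag "PAR"/ { next }                     # drop pseudoautosomal entries
        $9 ~ /gene_type "(protein_coding|lncRNA|(IG|TR)_[A-Z]+_gene)"/ { print }
    ' "$1"
}
```

Usage would look like `filter_gencode_gtf annotation.gtf > filtered.gtf`, where both file names are placeholders.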

Input/Output Specification

Inputs

Required Inputs

  • Genome

    • Description: Reference genome sequence in FASTA format
    • Format: FASTA (.fasta, .fa)
  • GTF File

    • Description: Gene annotation file containing transcript and gene feature information
    • Format: GTF (.gtf)

Outputs

  • Reference
    • Description: Complete Cell Ranger reference genome directory containing indexed genome sequences, gene annotations, and metadata files required for single-cell RNA-seq analysis
    • Format: Directory
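
As a rough illustration of how the two inputs map to the output directory, the sketch below assembles a cellranger mkref command line and prints it rather than running it. The helper name mkref_command and the example values are hypothetical. The --genome, --fasta, and --genes options are standard mkref options, with --genome naming the output directory, but the exact options this process passes are not documented here.

```shell
#!/usr/bin/env bash
# Sketch only: print (do not execute) a mkref command line.
# $1 = genome name (hypothetical label; mkref names its output directory after it)
# $2 = genome FASTA file, $3 = filtered GTF file
mkref_command() {
    printf 'cellranger mkref --genome=%s --fasta=%s --genes=%s\n' "$1" "$2" "$3"
}

# Example with placeholder names:
mkref_command my_reference genome.fa filtered.gtf
```

Printing the command first makes it easy to review the arguments before committing to a long-running indexing job.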

Parameters & Settings

These parameters can be adjusted in the Foundry UI when running this process.

  • Optional Mkgtf Filtering Parameters
    • Description: By default, Ensembl and NCBI GTF files are filtered with --attribute=gene_biotype:protein_coding, and GENCODE GTF files are filtered with a custom script. Use this field to supply additional attributes for filtering.
    • Default value: (empty)
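
One way to picture how this field could combine with the default filter is sketched below. The helper build_mkgtf_args is hypothetical. The sketch assumes that the field holds whitespace-separated mkgtf options and that they are appended to the default, which this page implies but does not state outright.

```shell
#!/usr/bin/env bash
# Sketch only: combine the default mkgtf filter with user-supplied extras.
# Assumption: extras are whitespace-separated options appended to the default.

# Prints one mkgtf argument per line: the default first, then each extra.
build_mkgtf_args() {
    echo "--attribute=gene_biotype:protein_coding"
    # Intentionally unquoted so the free-text field splits on whitespace;
    # an empty field adds nothing.
    for arg in $1; do
        echo "$arg"
    done
}

# Example with a placeholder value for the field:
build_mkgtf_args "--attribute=gene_biotype:lncRNA"
```

The resulting arguments would follow the input and output paths in a call such as `cellranger mkgtf input.gtf filtered.gtf <args>`, where the file names are placeholders.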

References & Resources

  • Tool Documentation: Contact the team for details on Cell Ranger mkref and mkgtf commands
  • Related Papers: Zheng, G.X.Y., Terry, J.M., Belgrader, P. et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun 8, 14049 (2017). https://doi.org/10.1038/ncomms14049