LAAVA: Long-read AAV Analysis Pipeline Specification

Pipeline Details

  • Name: LAAVA: Long-read AAV Analysis Pipeline
  • Pipeline UUID: f931n4azd26mr59xjzryqfsymvqfsymvqiy5
  • Version: 1.3.2

Overview

LAAVA (Long-read AAV Analysis Pipeline) automates the analysis of long-read sequencing data from adeno-associated virus (AAV) products. It provides standardized nomenclature, quality control, and reporting so that results are comparable across production runs and can inform vector design and QC decisions.

Key Use Cases:

  • Vector Genome Integrity Assessment: Quantify proportions of full-length, partial, and truncated ssAAV/scAAV genomes to evaluate functional payload yield.
  • Contaminant Detection & Quantification: Detect and report host-cell DNA, RepCap/helper plasmid carry-over, backbone fragments, and chimeric reads.
  • Flip/Flop Configuration Analysis: Determine the distribution of ITR orientations ("flip" vs. "flop") via local alignment (Parasail) to assess ITR integrity.

Features

  • Multi-Reference Alignment & Classification: Maps HiFi reads with minimap2 to assign each read a "type" (ssAAV, scAAV, host, RepCap, helper, chimeric, etc.) and "subtype" (full, left-partial, right-partial, vector+backbone, snapback) based on CIGAR patterns.
  • Flip/Flop & Structural Variant Detection: Reports size distributions and variant hotspots (insertions, deletions) across the vector.
  • Scalability: Can process thousands of samples in parallel using cloud-based execution and containerized workflows.
  • Reproducibility: Fully containerized via Docker, ensuring consistent results across environments.
  • Comprehensive QC & Integrity Checks: Automated MultiQC-style summaries are integrated into the final report.
  • Automated Reporting: Generates HTML/PDF reports with interactive plots and supporting tabular data.

Input/Output Specification

Inputs

Required

inputs

  • Description: PacBio AAV sequencing read set, as HiFi/CCS reads in FASTQ or unaligned BAM format. The PacBio instrument should be run in AAV mode.
  • Format: .fastq.gz / .bam
  • Example File Path: /samples/ss.subsample005.bam

vector_fasta

  • Description: Vector plasmid, as a single-record FASTA.
  • Format: .fasta
  • Example File Path: /samples/ss.construct.fasta

vector_bed

  • Description: Annotated vector construct region coordinates in 4-column UCSC BED format. This file must indicate the transgene/payload region via either two labeled Inverted Terminal Repeat (ITR) regions or, as a legacy mode, one region with the label 'vector', spanning both ITRs (inclusive).
  • Format: .bed
  • Example File Path: /samples/ss.construct.bed
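
The two accepted annotation layouts can be checked before a run. The following is a minimal sketch, not LAAVA's own parser: the function names `parse_bed4` and `payload_span`, the labels `ITR-L` and `ITR-R`, and all coordinates are hypothetical and chosen only for illustration. It assumes standard BED coordinates (0-based start, exclusive end) and treats the payload span as running from the start of the first ITR to the end of the second.

```python
from typing import List, NamedTuple, Tuple


class BedRegion(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str


def parse_bed4(text: str) -> List[BedRegion]:
    """Parse 4-column BED text, skipping blank, comment, and track lines."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 columns, got: {line!r}")
        chrom, start, end, name = fields[:4]
        regions.append(BedRegion(chrom, int(start), int(end), name))
    return regions


def payload_span(regions: List[BedRegion], itr_label_1: str, itr_label_2: str) -> Tuple[int, int]:
    """Return (start, end) spanning both ITRs, or the legacy 'vector' region."""
    by_name = {r.name: r for r in regions}  # exact, case-sensitive names
    if itr_label_1 in by_name and itr_label_2 in by_name:
        first, second = by_name[itr_label_1], by_name[itr_label_2]
        return min(first.start, second.start), max(first.end, second.end)
    if "vector" in by_name:  # legacy mode: one region covering both ITRs
        return by_name["vector"].start, by_name["vector"].end
    raise ValueError("BED has neither both ITR labels nor a 'vector' region")


# Hypothetical annotation; labels and coordinates are illustrative only.
example_bed = (
    "myVector\t10\t155\tITR-L\n"
    "myVector\t160\t2000\ttransgene\n"
    "myVector\t2000\t2145\tITR-R\n"
)
print(payload_span(parse_bed4(example_bed), "ITR-L", "ITR-R"))  # (10, 2145)
```

Because the lookup is an exact dictionary match, a label such as `itr-l` would not be found, which mirrors the case-sensitive matching this specification requires.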

Optional Inputs

packaging_fa

  • Description: Packaging sequences -- helper and rep/cap plasmids, and other sequences of interest (e.g. Lambda), as a multi-record FASTA.
  • Format: .fa / .fasta

host_fa

  • Description: Host genome (recommended), as a multi-record FASTA. It is best to include only the canonical chromosomes and omit the alternative contigs.
  • Format: .fa / .fasta
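
One way to restrict a host FASTA to canonical chromosomes is to filter records by name before supplying the file. This is a sketch under stated assumptions: it assumes UCSC-style human sequence names (`chr1`, `chrX`, `chrM`, and so on), and `keep_canonical` is a hypothetical helper, not part of the pipeline. Other hosts or naming schemes would need a different pattern.

```python
import re
from typing import Iterable, Iterator

# Assumption: UCSC-style names. Numbered chromosomes plus chrX, chrY, chrM.
CANONICAL_NAME = re.compile(r"chr(\d+|X|Y|M)")


def keep_canonical(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the FASTA lines that belong to canonically named records."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            keep = bool(tokens) and CANONICAL_NAME.fullmatch(tokens[0]) is not None
        if keep:
            yield line


example = [">chr1\n", "ACGT\n", ">chr1_KI270706v1_random\n", "GGGG\n", ">chrM\n", "TT\n"]
print("".join(keep_canonical(example)), end="")
```

The function works line by line, so it can be applied to an open file handle and written out with `writelines` without loading a whole genome into memory.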

flipflop_fa

  • Description: Flip/flop ITR sequences. AAV2 sequences are built in and available by default; provide custom sequences here to use another serotype.
  • Format: .fa / .fasta

ITR Labels

  • itr_label_1, itr_label_2, mitr_label: Case-sensitive labels that must exactly match the region names in the vector annotation BED file. For scAAV vector constructs, the mutant ITR (mITR) should be specified with mitr_label.

Sequence IDs

  • repcap_name, helper_name, lambda_name: Case-sensitive sequence IDs that must exactly match the record IDs used in the packaging FASTA file.
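
Since both the ITR labels and the sequence IDs are matched exactly and case-sensitively, a quick pre-flight check can catch mismatches early. The sketch below is illustrative only: `fasta_ids` and `unmatched` are hypothetical helpers, the plasmid names are made up, and it assumes the usual FASTA convention that a record ID is the first whitespace-delimited token after `>`.

```python
from typing import Dict, Iterable, List


def fasta_ids(text: str) -> List[str]:
    """Return record IDs: the first whitespace-delimited token after '>'."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def unmatched(params: Dict[str, str], available: Iterable[str]) -> Dict[str, str]:
    """Return the parameters whose value has no exact, case-sensitive match."""
    known = set(available)
    return {key: value for key, value in params.items() if value not in known}


# Hypothetical packaging FASTA and parameter values.
packaging_fasta = ">pRepCap9 example rep/cap plasmid\nACGT\n>pHelper\nGGCC\n"
params = {"repcap_name": "pRepCap9", "helper_name": "phelper"}
print(unmatched(params, fasta_ids(packaging_fasta)))  # {'helper_name': 'phelper'}
```

The same `unmatched` check applies to the ITR labels if the fourth BED column is passed as `available`.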

Clair3 Parameters

  • Platform: Select the sequencing platform of the input. Possible options: {ont, hifi, ilmn}.
  • model_name: Selects a pre-trained model to use in variant calling (e.g., r941_prom_sup_g5014, hifi_revio, hifi_sequel2).
  • opt_parameters: Optional parameters to use in Clair3.
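
The three parameters above can be validated as a group before submission. This is a minimal sketch, assuming only what this specification lists: the `Clair3Params` class is hypothetical, the allowed platforms are the three options named above, and model names are not checked against any list because the full set of available models is not given here.

```python
from dataclasses import dataclass

# Platform options as listed in this specification.
ALLOWED_PLATFORMS = ("ont", "hifi", "ilmn")


@dataclass(frozen=True)
class Clair3Params:
    platform: str
    model_name: str
    opt_parameters: str = ""  # passed through unchanged; not validated here

    def __post_init__(self) -> None:
        if self.platform not in ALLOWED_PLATFORMS:
            raise ValueError(
                f"platform must be one of {ALLOWED_PLATFORMS}, got {self.platform!r}"
            )
        if not self.model_name:
            raise ValueError("model_name must not be empty")


params = Clair3Params(platform="hifi", model_name="hifi_revio")
print(params.platform, params.model_name)  # hifi hifi_revio
```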

Outputs

Reported Outputs

  • html_report:

  • Description: HTML report with various plots such as read-type proportions, read-length histograms, quality metrics and analysis results.
  • Format: .html
  • Example File Path: /make_report/ss_report.html
  • Visualization App: HTML browser
  • Location: make_reports

  • pdf_report:

  • Description: PDF report with various plots such as read-type proportions, read-length histograms, quality metrics and analysis results.
  • Format: .pdf
  • Example File Path: /make_report/ss_report.pdf
  • Visualization App: PDF reader
  • Location: make_reports

  • variants:

  • Description: Variants in VCF version 4.2 format.
  • Format: .vcf.gz
  • Example File Path: /variants/ss.vcf.gz
  • Location: Clair3
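
The variants output can be inspected with the Python standard library alone. The sketch below counts variant records per reference sequence; `count_variants_by_contig` is a hypothetical helper, and it relies only on the VCF convention that header lines start with `#` and that the first tab-separated column of each record is the reference sequence name.

```python
import gzip
from typing import Dict


def count_variants_by_contig(path: str) -> Dict[str, int]:
    """Count VCF data records per reference sequence (first column)."""
    # Gzip-compressed and plain-text VCF files are both read in text mode.
    opener = gzip.open if path.endswith(".gz") else open
    counts: Dict[str, int] = {}
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip meta-information, column header, and blank lines
            chrom = line.split("\t", 1)[0]
            counts[chrom] = counts.get(chrom, 0) + 1
    return counts
```

Calling it on a downloaded copy of the variants file, for example `count_variants_by_contig("ss.vcf.gz")`, returns a dictionary mapping each reference name to its number of variant records.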

Supporting Outputs

  • Aligned BAM files:

  • Description: Aligned reads from minimap2 mapping to AAV, packaging, and host reference sequences.
  • Format: .bam
  • Example File Path: /map_reads/aligned_reads.bam

  • BAM index files:

  • Description: Index files for aligned BAM files.
  • Format: .bai
  • Example File Path: /map_reads/aligned_reads.bam.bai

  • TSV summary files:

  • Description: Tabular data supporting the analysis results and classifications.
  • Format: .tsv
  • Example File Path: /make_report/analysis_summary.tsv

References & Additional Documentation

  • Related Papers: Travers et al. (2010) - Circular consensus sequencing; Li (2018) - Minimap2 alignment
  • Container Registry: quay.io/viascientific/laava:1.0.0, hkubal/clair3:v1.0.11
  • Clair3 Documentation: GitHub Repository