LAAVA: Long-read AAV Analysis Pipeline Specification
Pipeline Details
- Name: LAAVA: Long-read AAV Analysis Pipeline
- Pipeline UUID: f931n4azd26mr59xjzryqfsymvqfsymvqiy5
- Version: 1.3.2
Overview
LAAVA (Long-read AAV Analysis Pipeline) automates the analysis of long-read sequencing data from adeno-associated virus (AAV) products. It provides standardized nomenclature, rigorous quality control, and comprehensive reporting so that results are comparable across production runs and can inform vector design and quality control decisions.
Key Use Cases:
- Vector Genome Integrity Assessment: Quantify proportions of full-length, partial, and truncated ssAAV/scAAV genomes to evaluate functional payload yield.
- Contaminant Detection & Quantification: Detect and report host-cell DNA, RepCap/helper plasmid carry-over, backbone fragments, and chimeric reads.
- Flip/Flop Configuration Analysis: Determine the distribution of ITR orientations ("flip" vs. "flop") via local alignment (Parasail) as a quality-control check on ITR integrity.
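The idea behind the flip/flop call can be sketched in a few lines: score the end of a read against a flip reference and a flop reference with local alignment, then report the higher-scoring orientation. The sketch below is only an illustration. It uses a plain Python Smith-Waterman score with a linear gap penalty rather than Parasail, and the function names, scoring values, `min_score` threshold, and toy sequences are all hypothetical (they are not real ITR sequences or LAAVA settings).

```python
def local_alignment_score(query, target, match=2, mismatch=-3, gap=-4):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def call_flip_flop(read_end, flip_seq, flop_seq, min_score=20):
    """Label a read end as 'flip', 'flop', 'ambiguous', or 'unclassified'."""
    flip = local_alignment_score(read_end, flip_seq)
    flop = local_alignment_score(read_end, flop_seq)
    if max(flip, flop) < min_score:
        return "unclassified"  # neither reference aligns well enough
    if flip == flop:
        return "ambiguous"
    return "flip" if flip > flop else "flop"


# Toy sequences for illustration only; NOT real AAV2 ITR sequences.
TOY_FLIP = "ACGTTGCAAGGCT"
TOY_FLOP = "AGCCTTGCAACGT"

print(call_flip_flop("TT" + TOY_FLIP + "GG", TOY_FLIP, TOY_FLOP))
```

A production implementation would normally use a vectorized aligner with affine gap penalties; the pure Python version here is meant to be read, not to be fast.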
Features
- Multi-Reference Alignment & Classification: Maps HiFi reads with minimap2 to assign each read a "type" (ssAAV, scAAV, host, RepCap, helper, chimeric, etc.) and "subtype" (full, left-partial, right-partial, vector+backbone, snapback) based on CIGAR patterns.
- Flip/Flop & Structural Variant Detection: Reports size distributions and variant hotspots (insertions, deletions) across the vector.
- Scalability: Can process thousands of samples in parallel using cloud-based execution and containerized workflows.
- Reproducibility: Fully containerized via Docker, ensuring consistent results across environments.
- Comprehensive QC & Integrity Checks: Automated MultiQC-style summaries are integrated into the final report.
- Automated Reporting: Generates HTML/PDF reports with interactive plots and supporting tabular data.
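To make the subtype labels listed above more concrete, here is a simplified sketch that assigns a subtype from alignment coordinates alone, relative to the annotated ITR-to-ITR region. It is an assumption-laden illustration, not LAAVA's classification logic (which, as noted, is based on CIGAR patterns): the function name, the `tolerance` parameter, and the fallback label `other` are hypothetical, and snapback detection is not covered.

```python
def classify_subtype(aln_start, aln_end, region_start, region_end, tolerance=10):
    """Assign a simplified subtype to one alignment on the vector reference.

    Coordinates are 0-based, half-open. `tolerance` (hypothetical) is the
    number of bases of slack allowed around each region boundary.
    """
    # Alignment extends clearly outside the ITR-to-ITR region.
    if aln_start < region_start - tolerance or aln_end > region_end + tolerance:
        return "vector+backbone"
    reaches_left = aln_start <= region_start + tolerance
    reaches_right = aln_end >= region_end - tolerance
    if reaches_left and reaches_right:
        return "full"
    if reaches_left:
        return "left-partial"
    if reaches_right:
        return "right-partial"
    # Placeholder for reads touching neither boundary; not a LAAVA label.
    return "other"


for span in [(100, 4500), (100, 2000), (2000, 4500), (20, 4500)]:
    print(span, classify_subtype(*span, region_start=100, region_end=4500))
```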
Input/Output Specification
Inputs
Required Inputs
inputs
- Description: PacBio AAV sequencing read set, as HiFi/CCS reads in FASTQ or unaligned BAM format. The PacBio instrument should be run in AAV mode.
- Format: .fastq.gz / .bam
- Example File Path: /samples/ss.subsample005.bam
vector_fasta
- Description: Vector plasmid, as a single-record FASTA.
- Format: .fasta
- Example File Path: /samples/ss.construct.fasta
vector_bed
- Description: Annotated vector construct region coordinates in 4-column UCSC BED format. This file must indicate the transgene/payload region via either two labeled Inverted Terminal Repeat (ITR) regions or, as a legacy mode, one region with the label 'vector', spanning both ITRs (inclusive).
- Format: .bed
- Example File Path: /samples/ss.construct.bed
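Because the pipeline depends on the BED annotation naming either two ITR regions or a single legacy 'vector' region, it can help to check the file before a run. The following is a minimal sketch of such a check, assuming whitespace-separated 4-column BED; the function names, the contig name `example_vector`, and the labels `ITR-L` and `ITR-R` are hypothetical placeholders, not values required by LAAVA.

```python
def read_bed4(text):
    """Parse 4-column BED text into (chrom, start, end, name) tuples."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        start, end = int(fields[1]), int(fields[2])
        if start >= end:
            raise ValueError(f"start must be less than end: {line!r}")
        regions.append((fields[0], start, end, fields[3]))
    return regions


def payload_span(regions, itr_labels=("ITR-L", "ITR-R"), legacy_label="vector"):
    """Return the (start, end) span covering both ITRs, ITRs included."""
    by_name = {name: (start, end) for _, start, end, name in regions}
    if all(label in by_name for label in itr_labels):
        spans = [by_name[label] for label in itr_labels]
        return min(s for s, _ in spans), max(e for _, e in spans)
    if legacy_label in by_name:  # legacy mode: one region spanning both ITRs
        return by_name[legacy_label]
    raise ValueError("no labeled ITR pair or legacy 'vector' region found")


example = "example_vector\t100\t245\tITR-L\nexample_vector\t4300\t4445\tITR-R\n"
print(payload_span(read_bed4(example)))
```

Label lookup is an exact, case-sensitive dictionary match, which mirrors the requirement that labels match the BED file exactly.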
Optional Inputs
packaging_fa
- Description: Packaging sequences -- helper and rep/cap plasmids, and other sequences of interest (e.g. Lambda), as a multi-record FASTA.
- Format: .fa / .fasta
host_fa
- Description: Host genome (recommended), as a multi-record FASTA. It is best to include only the canonical chromosomes and omit the alternative contigs.
- Format: .fa / .fasta
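One way to prepare a host FASTA that contains only canonical chromosomes is to filter records by sequence ID. The sketch below assumes a human host genome with UCSC-style names (chr1 to chr22, chrX, chrY, chrM); both the naming scheme and the function name are assumptions, so the pattern would need adjusting for other hosts or naming conventions.

```python
import re

# Assumption: UCSC-style human chromosome names.
CANONICAL = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")


def filter_canonical(fasta_lines):
    """Yield only the lines of records whose ID is a canonical chromosome."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            keep = bool(CANONICAL.match(seq_id))
        if keep:
            yield line


example = [
    ">chr1 primary\n", "ACGT\n",
    ">chr1_KI270706v1_random\n", "GGGG\n",
    ">chrX\n", "TTTT\n",
]
print("".join(filter_canonical(example)), end="")
```

Because the function works on any iterable of lines, it can be applied to an open file handle without loading the whole genome into memory.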
flipflop_fa
- Description: Flip/flop ITR sequences. AAV2 sequences are built in and available by default; provide custom sequences here to use another serotype.
- Format: .fa / .fasta
ITR Labels
- itr_label_1, itr_label_2, mitr_label: Case-sensitive labels that must exactly match the region names in the vector annotation BED file. For scAAV vector constructs, specify the mutant ITR (mITR) with mitr_label.
Sequence IDs
- repcap_name, helper_name, lambda_name: Case-sensitive sequence IDs that must exactly match the record IDs in the packaging FASTA file.
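Since the sequence IDs are matched exactly and case-sensitively, a quick pre-flight check against the packaging FASTA can catch typos before a run. This is a minimal sketch; the helper names and the example IDs `pRepCap` and `pHelper` are hypothetical, and the record ID is assumed to be the first whitespace-delimited token of each header line.

```python
def fasta_ids(fasta_text):
    """Return the record IDs (first token of each '>' header line)."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def missing_ids(fasta_text, expected):
    """Return the {parameter: sequence ID} pairs not found in the FASTA.

    The comparison is exact and case-sensitive.
    """
    present = set(fasta_ids(fasta_text))
    return {param: seq_id for param, seq_id in expected.items()
            if seq_id not in present}


packaging = ">pRepCap example rep/cap plasmid\nACGT\n>pHelper\nGGCC\n"
print(missing_ids(packaging, {"repcap_name": "pRepCap", "helper_name": "phelper"}))
```

In this example the lowercase `phelper` is reported as missing because it differs in case from the FASTA record `pHelper`.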
Clair3 Parameters
- Platform: Select the sequencing platform of the input. Possible options: {ont, hifi, ilmn}.
- model_name: Selects a pre-trained model to use in variant calling (e.g., r941_prom_sup_g5014, hifi_revio, hifi_sequel2).
- opt_parameters: Optional parameters to use in Clair3.
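The Clair3 settings above can be validated before submission with a small check like the one below. It is a sketch only: the function name and the returned dictionary layout are hypothetical, the allowed platform values are taken from the list above, and no attempt is made to verify that a given model name actually exists.

```python
ALLOWED_PLATFORMS = {"ont", "hifi", "ilmn"}


def check_clair3_params(platform, model_name, opt_parameters=""):
    """Validate the Clair3 settings and return them as a dictionary."""
    if platform not in ALLOWED_PLATFORMS:
        raise ValueError(
            f"platform must be one of {sorted(ALLOWED_PLATFORMS)}, got {platform!r}"
        )
    if not model_name:
        raise ValueError("model_name must not be empty")
    return {
        "platform": platform,
        "model_name": model_name,
        "opt_parameters": opt_parameters,
    }


print(check_clair3_params("hifi", "hifi_revio"))
```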
Outputs
Reported Outputs
- html_report:
- Description: HTML report with various plots such as read-type proportions, read-length histograms, quality metrics and analysis results.
- Format: .html
- Example File Path: /make_report/ss_report.html
- Visualization App: HTML browser
- Location: make_reports
- pdf_report:
- Description: PDF report with various plots such as read-type proportions, read-length histograms, quality metrics and analysis results.
- Format: .pdf
- Example File Path: /make_report/ss_report.pdf
- Visualization App: PDF reader
- Location: make_reports
- variants:
- Description: Variants in VCF version 4.2 format.
- Format: .vcf / .vcf.gz
- Example File Path: /variants/ss.vcf.gz
- Location: Clair3
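For downstream use, the variants file can be read with the Python standard library alone. The sketch below assumes a standard tab-delimited VCF with the eight fixed columns and transparently handles gzip-compressed files; the function name is hypothetical, and INFO and per-sample fields are ignored for brevity.

```python
import gzip


def read_vcf_records(path):
    """Yield (chrom, pos, ref, alts, filter) for each record in a VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):  # skip meta-information and header lines
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
            yield chrom, int(pos), ref, alt.split(","), filt
```

For example, `sum(1 for rec in read_vcf_records("ss.vcf.gz") if rec[4] == "PASS")` would count the records that passed filtering, where the file name is only a placeholder.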
Supporting Outputs
- Aligned BAM files:
- Description: Aligned reads from minimap2 mapping to AAV, packaging, and host reference sequences.
- Format: .bam
- Example File Path: /map_reads/aligned_reads.bam
- BAM index files:
- Description: Index files for aligned BAM files.
- Format: .bai
- Example File Path: /map_reads/aligned_reads.bam.bai
- TSV summary files:
- Description: Tabular data supporting the analysis results and classifications.
- Format: .tsv
- Example File Path: /make_report/analysis_summary.tsv
Associated Processes
References & Additional Documentation
- Related Papers: Travers et al. (2010) - Circular consensus sequencing; Li (2018) - Minimap2 alignment
- Container Registry: quay.io/viascientific/laava:1.0.0, hkubal/clair3:v1.0.11
- Clair3 Documentation: GitHub Repository