Learning Outcomes
  • Use Foundry pipelines
  • Create new pipelines
  • Use the Pipeline builder
  • Familiarize yourself with Nextflow and common analysis packages

Building Pipelines

Via Foundry is an easy-to-use platform for creating, deploying, and executing complex Nextflow pipelines for high-throughput data processing. It provides:

  1. A drag-and-drop user interface to build Nextflow pipelines
  2. Reproducible pipelines with version tracking
  3. Seamless portability to different computing environments with containerization
  4. Simplified pipeline sharing using GitHub (github.com)
  5. Support for continuous integration and tests (travis-ci.org)
  6. Easy re-execution of pipelines by copying the settings of previous runs
  7. Integrated data analysis and reporting interface with R markdown support

Our aims are:

  1. Reusability
  2. Reproducibility
  3. Shareability
  4. Easy execution
  5. Easy monitoring
  6. Easy reporting

Before you start

Please go to Via Foundry and log in to your account. If you have trouble logging in, please contact support@viascientific.com and we will provide assistance.

Part 1: Creating a New Pipeline

  1. To access the pipeline builder page, click the Pipelines tab and then click the Create Pipeline button.

    Image showing create pipeline

  2. In the Create New Pipeline pop-up, assign the new pipeline to a menu group, give the pipeline a name, and optionally provide a description. When finished, click Create.

    Image showing create pipeline pop-up

You can provide additional details about the pipeline in the Description tab, develop your pipeline in the Workflow tab, add parameters and attach extra files in the Advanced tab, and configure other settings in the Settings tab. Let's look at the pipeline elements in more detail.

Image showing pipeline page

Part 2: Creating Processes

Process Overview

A "Process" is the basic programming element in Nextflow for running user scripts. See the Nextflow documentation on processes to learn more.

A process usually has input, output, and script sections. In this tutorial, each process is introduced with the information needed to define it, as shown in the top left of the picture below. Use that information to fill in the "Add new process" form shown in the bottom right. Foundry then converts this information into a Nextflow process, shown in the bottom left. Once a process is created, it can be used in the pipeline builder; the top right of the picture shows how it appears there. The colored rectangles show how the sections map onto one another.

Image showing process overview

Processes created in this exercise

  1. FastQC process

  2. Hisat2 process

  3. RSeQC process

1. FastQC process

Navigate to the Workflow tab. You’ll notice several buttons in the left menu. To create a new process, click the New process button Image showing create process icon.

Image showing workflow tab

a. Click the New process button Image showing create process icon in the left menu to open the "Create New Process" window.

b. On the "Overview" tab, enter FastQC for the process name and define a new "Menu Group" (e.g. Developer Tutorial).

Image showing FastQC overview

c. On the "Parameters" tab, define an Input

  • Input Parameter: reads (fastq, set)

  • Input Name: val(name), file(reads)

Image showing FastQC parameters

d. On the "Parameters" tab, define an Output

  • Output Parameter: outputFileHTML (html, file)

  • Output Name: "*.html"

Image showing FastQC parameters

e. On the "Scripts" tab, attach the following in the Script section:

fastqc ${reads}

Image showing FastQC scripts

f. Click Create Process at the bottom of the pop-up.

FastQC process summary

Name: "FastQC"

Menu Group: "Developer Tutorial"

Inputs: 
  reads (fastq, set) name: val(name), file(reads)

Outputs: 
  outputFileHTML (html, file) name: "*.html"

Script:
  fastqc ${reads}
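
The output name "*.html" is a file-name pattern, so it helps to see which files it would pick up. The sketch below is only an illustration under stated assumptions: the temporary directory and the `sample1` file names are placeholders, and the empty files stand in for what FastQC writes for an input called `sample1.fastq` (an HTML report and a zip archive).

```shell
# Sketch: which files the output pattern "*.html" would capture.
# The placeholder files below mimic a FastQC run on sample1.fastq;
# no real FastQC output is produced here.
work_dir=$(mktemp -d)
: > "${work_dir}/sample1.fastq"
: > "${work_dir}/sample1_fastqc.html"
: > "${work_dir}/sample1_fastqc.zip"

# Expand the same glob that is used as the Output Name.
matched=$(cd "${work_dir}" && printf '%s\n' *.html)
echo "captured by *.html: ${matched}"

# Clean up the placeholder directory.
rm -rf "${work_dir}"
```

Only the HTML report matches the pattern. If the zip archive were also wanted, it would need its own output with a matching pattern, which this tutorial does not define.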

2. Hisat2 process

a. Click the New process button Image showing create process icon in the left menu to open the "Create New Process" window.

b. On the "Overview" tab, enter Hisat2 for the process name and define the "Menu Group" (e.g. Developer Tutorial).

Image showing Hisat overview

c. On the "Parameters" tab, define two Inputs

  • Input 1

    • Parameter: reads (fastq, set)

    • Name: val(name), file(reads)

  • Input 2

    • Parameter: hisat2Index (index, file)

    • Name: hisat2Index

Image showing Hisat parameters

d. On the "Parameters" tab, define two Outputs

  • Output 1

    • Parameter: mapped_reads (bam, set)

    • Name: val(name), file("${name}.bam")

  • Output 2

    • Parameter: outputFileTxt (txt, file)

    • Name: "${name}.align_summary.txt"

Image showing Hisat parameters

e. On the "Scripts" tab, attach the following in the Script section:

basename=\$(basename ${hisat2Index}/*.8.ht2 | cut -d. -f1)
hisat2 -x ${hisat2Index}/\${basename} -U ${reads} -S ${name}.sam &> ${name}.align_summary.txt
samtools view -bS ${name}.sam > ${name}.bam 

Image showing Hisat scripts

f. Click Create Process at the bottom of the pop-up.

Hisat2 process summary

Name: "Hisat2"

Menu Group: "Developer Tutorial"

Inputs: 
  reads (fastq, set) name: val(name), file(reads)
  hisat2Index (index, file) name: hisat2Index

Outputs: 
  mapped_reads (bam, set) name: val(name), file("${name}.bam")
  outputFileTxt (txt, file) name: "${name}.align_summary.txt"

Script:
  basename=\$(basename ${hisat2Index}/*.8.ht2 | cut -d. -f1)
  hisat2 -x ${hisat2Index}/\${basename} -U ${reads} -S ${name}.sam &> ${name}.align_summary.txt
  samtools view -bS ${name}.sam > ${name}.bam 
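
The first line of the Hisat2 script works out the index prefix that `hisat2 -x` expects. In the process script, `${hisat2Index}`, `${reads}` and `${name}` are filled in by Nextflow, while the backslash in `\$(...)` and `\${basename}` leaves those expressions for the shell to evaluate at run time. The following is a minimal sketch of the shell side only, assuming an index directory whose files are named `genome.1.ht2` to `genome.8.ht2`; the directory and the `genome` prefix are placeholders, not real index data.

```shell
# Sketch: how the Hisat2 script derives the index prefix.
# The directory and file names are placeholders, not a real index.
index_dir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
    : > "${index_dir}/genome.${i}.ht2"
done

# Same logic as the process script, without the Nextflow escaping:
# take the file ending in .8.ht2 and keep the text before the first dot.
prefix=$(basename "${index_dir}"/*.8.ht2 | cut -d. -f1)

echo "index prefix: ${prefix}"
echo "hisat2 -x argument: ${index_dir}/${prefix}"

# Clean up the placeholder directory.
rm -rf "${index_dir}"
```

Because `cut -d. -f1` keeps only the text before the first dot, an index whose prefix itself contains a dot (for example a hypothetical `GRCh38.p13.8.ht2`) would be shortened to `GRCh38`, so such an index would need a different way of stripping the suffix.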

3. RSeQC process

a. Click the New process button Image showing create process icon in the left menu to open the "Create New Process" window.

b. On the "Overview" tab, enter RSeQC for the process name and define the "Menu Group" (e.g. Developer Tutorial).

Image showing RSeQC overview

c. On the "Parameters" tab, define two Inputs

  • Input 1

    • Parameter: mapped_reads (bam, set)

    • Name: val(name), file(bam)

  • Input 2

    • Parameter: bedFile (bed, file)

    • Name: bed

Image showing RSeQC input parameters

d. On the "Parameters" tab, define an Output

  • Parameter: outputFileTxt (txt, file)

  • Name: "RSeQC.${name}.txt"

Image showing RSeQC output parameters

e. On the "Scripts" tab, attach the following in the Script section:

read_distribution.py -i ${bam} -r ${bed} > RSeQC.${name}.txt

Image showing RSeQC scripts

f. Click Create Process at the bottom of the pop-up.

RSeQC process summary

Name: "RSeQC"

Menu Group: "Developer Tutorial"

Inputs:
  mapped_reads (bam, set) name: val(name), file(bam)
  bedFile (bed, file) name: bed

Outputs: 
  outputFileTxt (txt, file) name: "RSeQC.${name}.txt"

Script:
  read_distribution.py -i ${bam} -r ${bed} > RSeQC.${name}.txt
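
To check that the script writes a file whose name matches the declared output "RSeQC.${name}.txt", it can help to look at the command line the shell would receive for a single sample. The sketch below only prints that command; it does not run RSeQC. The helper `render_rseqc_command` and the values `sample1`, `sample1.bam` and `genes.bed` are hypothetical and used purely for illustration.

```shell
# Sketch: the command the shell would see for one sample once the
# name, bam and bed values are substituted. All values are placeholders.
render_rseqc_command() {
    local name=$1 bam=$2 bed=$3
    printf 'read_distribution.py -i %s -r %s > RSeQC.%s.txt\n' "$bam" "$bed" "$name"
}

# Print the command for a hypothetical sample.
render_rseqc_command sample1 sample1.bam genes.bed
```

The redirect target and the output name are built from the same `name` value, which is what lets the declared output pattern find the file for each sample.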

Part 3: Building Pipeline

Navigate to the pipeline page.

Image showing pipeline page

a. At the top of the page, you’ll notice the Pipeline Name box. You can rename your pipeline by clicking the pencil icon.

b. Drag and drop three Input parameters (yellow circles) onto the workflow.

Image showing pipeline page

c. To change the name of an input, click on the grey part of the circle and then click the pencil icon.

Image showing pipeline page

Change their names to:

  • Input_Reads

  • Hisat2_Index

  • bedFile

d. Connect the inputs to their processes by clicking on the yellow circle and dragging to the red circle.

Image showing pipeline page

e. Connect your Hisat2 process to the RSeQC process using the mapped_reads parameter that both share. Two processes can be connected only through input and output parameters whose types match.

Image showing pipeline page

f. Drag and drop three output parameters from the sidebar. Name them:

  • FastQC_output

  • Hisat2_summary

  • RSeQC_output

While naming, choose the output format that matches the output type of the process ("HTML" for FastQC_output, "Text" for the others). Note: published outputs will show up on the report page when a run is complete. Unpublished outputs will still be available in the report folder, but they won't be accessible for visualization on the report page. Typically, large files should not be published.

Image showing pipeline page

g. Connect them to their corresponding processes. The overall pipeline should look like the image below.

Image showing pipeline page