Creating Datasets
Datasets serve as the starting point for managing and processing your data within the platform.
Add Data to Datasets
- Navigate to the Data tab in the left menu.
- Create a Dataset: Click Create Dataset, enter a name, and create the dataset. Foundry will take you to the new dataset page.
- Add a File: Click Add, then select Add New File.
- Select a Data Source: Select the data source where your run data are located. If your data are on your local desktop, first transfer those files to the target location; see the Upload Local Data section for details. Depending on your selection, you may need to provide credentials (e.g., S3, GCS, BaseSpace).

  - S3: If your Foundry already has built-in access to S3 buckets, select Account Default. To access additional buckets that Foundry cannot reach by default, create credentials under Profile → Credentials.
  - GCS: If your Foundry already has built-in access to Google Cloud Storage (GCS), select Account Default. To access additional buckets that Foundry cannot reach by default, create credentials under Profile → Credentials.
  - SSH: If your data are located on an HPC/cluster, you can access them via an SSH connection. You must have a run environment configured for this connection (Profile → Run Environments).
  - BaseSpace: If your data are in BaseSpace, add your BaseSpace credentials under Profile → Credentials. See the BaseSpace Guide for details.
  - GEO/NCBI: No credentials are required. Enter one or more GEO IDs, and Foundry will retrieve all associated files and sample names.
  - URL: No credentials are required. You can reference any public resource by URL.
- Manage Samples: After you connect to your data source, Foundry lists the available files and folders for selection. You can either:
  - select individual files and click Add to cart, or
  - select folders and click Add to cart.

  At each step, Foundry provides built-in filtering to help you include only files that match specific name patterns.
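The exact filter syntax is Foundry's own; if it behaves like ordinary glob-style name matching (an assumption, with made-up filenames), the effect can be sketched as:

```python
from fnmatch import fnmatch

files = ["sampleA.R1.fastq.gz", "sampleA.R2.fastq.gz",
         "sampleB.R1.fastq.gz", "notes.txt"]

# Keep only files whose names match a glob-style pattern.
selected = [f for f in files if fnmatch(f, "*.fastq.gz")]
print(selected)
```

Here the pattern `*.fastq.gz` keeps the three read files and drops `notes.txt`.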

- Dataset Type: After choosing your files, organize them so that each defined sample corresponds to the files that will run together in pipelines. The available dataset types are Single list, Paired list, and Triple list.
  - Single list: Use this for single-end reads or any file type that runs independently (e.g., BAM, TSV, CSV, TXT, H5).
  - Paired list: Use this for paired-end reads. Set the Dataset type to Paired list, enter the R1 and R2 patterns (e.g., `.R1` and `.R2`), and then click Add all files.
  - Triple list: Use this when the pipeline requires three related files per sample (for example, paired reads plus an additional index file). Enable Triple list and provide three patterns.
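Foundry resolves the R1/R2 patterns internally; as a rough sketch of the idea, assuming the patterns `.R1` and `.R2` and hypothetical filenames, stripping the pattern from a filename yields the key that joins a pair:

```python
def pair_reads(files, r1_pattern=".R1", r2_pattern=".R2"):
    """Group files into R1/R2 pairs: removing the read-pair pattern
    from a filename recovers the shared sample key."""
    pairs = {}
    for f in sorted(files):
        if r1_pattern in f:
            pairs.setdefault(f.replace(r1_pattern, ""), {})["R1"] = f
        elif r2_pattern in f:
            pairs.setdefault(f.replace(r2_pattern, ""), {})["R2"] = f
    return pairs

files = ["control.R1.fastq.gz", "control.R2.fastq.gz",
         "exper.R1.fastq.gz", "exper.R2.fastq.gz"]
for sample, reads in pair_reads(files).items():
    print(sample, reads)
```

Each sample key ends up with one R1 and one R2 file, which is what a Paired list dataset feeds to a pipeline per sample.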
- Merging Files (Optional): If you need to merge files so that, for example, all technical replicates run together as a single sample, select the files to be merged and click the Merge selected files button.
- Auto-merging: To avoid manually selecting files and clicking Merge selected files, expand the Auto-merging Pattern section. Enter a glob pattern containing a `{name}` placeholder; files matching the same `{name}` will be merged automatically. Click Merge Files Using Pattern to apply the auto-merge. Example patterns:
  - `{name}_L*_R?_001.fastq.gz`
  - `*-*-{name}_L*_R?_001.fastq.gz`
  - `*-{name}-{name}_L*_R?_001.fastq.gz`

  For example, to merge all replicates shown in the image below, you can use the pattern `{name}_rep?*.{1,2}.fastq.gz`. Foundry will merge all `control` replicates to create `control.R1.fastq.gz` and `control.R2.fastq.gz`, and similarly merge all `exper` replicates to create `exper.R1.fastq.gz` and `exper.R2.fastq.gz`.
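Foundry applies the pattern on the server side; the sketch below is an illustrative reimplementation of the `{name}` semantics described above (`*` and `?` wildcards, `{a,b}` alternatives, with hypothetical replicate filenames). Note that, per the description above, Foundry additionally keeps R1 and R2 files separate when it concatenates each group.

```python
import re
from collections import defaultdict

def merge_groups(files, pattern):
    """Group files by the sample name captured by the {name} placeholder
    in a glob-style pattern ('*' and '?' wildcards, '{a,b}' alternatives).
    A repeated {name} must match the same text each time."""
    regex, seen_name = "", False
    for token in re.split(r"(\{name\}|\{[^}]*\}|\*|\?)", pattern):
        if token == "{name}":
            # First occurrence captures the name; repeats must match it.
            regex += r"(?P=name)" if seen_name else r"(?P<name>.+?)"
            seen_name = True
        elif token == "*":
            regex += ".*"
        elif token == "?":
            regex += "."
        elif token.startswith("{") and token.endswith("}"):
            # {a,b} becomes a regex alternation (a|b).
            regex += "(?:" + "|".join(map(re.escape, token[1:-1].split(","))) + ")"
        else:
            regex += re.escape(token)
    groups = defaultdict(list)
    for f in files:
        m = re.fullmatch(regex, f)
        if m:
            groups[m.group("name")].append(f)
    return dict(groups)

files = ["control_rep1.1.fastq.gz", "control_rep1.2.fastq.gz",
         "control_rep2.1.fastq.gz", "control_rep2.2.fastq.gz",
         "exper_rep1.1.fastq.gz", "exper_rep1.2.fastq.gz"]
print(merge_groups(files, "{name}_rep?*.{1,2}.fastq.gz"))
```

With the document's example pattern, the `control` group collects all four `control_rep*` files and the `exper` group both `exper_rep*` files, mirroring the merge groups Foundry would form.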
- Backup to Another Location (Optional): If you need to save the final merged/renamed files to another location, use this section. You can back up your data to S3 or GCS by providing paths.
- Change Sample Tracker (Optional): By default, Foundry saves your files to the `default` Sample Tracker. If you want to organize these files under a different tracker (for example, to use custom metadata fields), use the dropdown to create or select a Sample Tracker and add the fields you need. Otherwise, keep `default` selected. For details, see the Sample Tracker Guide.
- Save Files: Click Save Files to save the selected files (and their metadata) to the selected Sample Tracker, and link them to your current dataset.
Upload Local Data
There are multiple approaches to transfer your local files to the target location. Please choose the tool based on your target run environment.
- GCP Environment: Before you begin, contact Via Scientific support at support@viascientific.com to learn the GCS bucket to which you can upload data.
  - gcloud / gsutil (recommended command-line tool)
  - Cyberduck
- AWS Environment: Before you begin, contact Via Scientific support at support@viascientific.com to learn the S3 bucket to which you can upload data.
  - AWS CLI (recommended command-line tool; see the AWS CLI guide)
  - Cyberduck (see the Cyberduck guide)
- HPC/Cluster Environment: You can use various tools to transfer your files from your local machine to the HPC environment.
  - For reproducible, scriptable transfers, we recommend `scp` or `rsync`.
  - For users who prefer a graphical interface, we suggest:
    - Cyberduck (macOS/Windows)
    - FileZilla (macOS/Windows)
    - WinSCP (Windows only)
Uploading Data to GCP-Cloud (Foundry)
In the GCP-Cloud environment, you can upload data to Foundry in several ways, depending on your needs and your organization’s setup.
1. Upload via Foundry UI (Datasets)
Users can upload files directly through the Datasets section in Foundry.
Use this option if:
- You prefer a graphical interface.
- You don’t need automation or scripting.
2. Access via GCP Tools (gsutil / Cyberduck)
If you prefer using command-line or external GUI tools, Foundry supports access via gsutil (CLI) and Cyberduck (GUI). The exact setup may vary based on:
- Your role in the organization.
- Your company’s preferences and security policies.
Prerequisites
To use gsutil or Cyberduck with Foundry, you must:
- Have a Google Cloud Platform (GCP) account in the same GCP environment. You can create a GCP account here: https://cloud.google.com/free
- After your GCP account is created, use one of the following options to connect to Foundry storage.
A. Organization Default Bucket (Managed by Foundry)
Once your organization is registered in Foundry:
- Foundry creates a default GCS bucket for your organization.
- Users are responsible for all charges incurred through bucket usage and access (e.g., storage, operations, egress).
- Upon request, the Via Support team can:
  - provision access for you to this bucket, and
  - enable login via Google OAuth so you can connect using gsutil or Cyberduck.
B. Project Buckets (Self-Managed by Selected Users)
Based on a request from your organization admins, we can enable the Project Bucket feature for selected users.
When this feature is enabled:
- These users can create their own GCS buckets.
- They can grant access to other GCP users directly.
- Each project in Foundry can have its own dedicated project bucket, created and managed by the project owner.
- Users are responsible for all charges incurred through bucket usage and access (e.g., storage, operations, egress).
Use this option if:
- You need per-project isolation.
- Your team prefers to self-manage GCP buckets and permissions.
C. Use an Existing GCP Bucket (Your Own Storage)
If you already have:
- A GCP account, and
- An existing GCS bucket managed by your IT or cloud team,
then:
- Your IT team can grant read access on that bucket to a Foundry IAM role.
- Once access is granted, Foundry can read your data directly from your existing bucket.
Use this option if:
- Your organization already has established GCP storage.
- You want Foundry to consume data from your existing infrastructure instead of moving data.
Uploading Data to AWS-Cloud (Foundry)
In the AWS-Cloud environment, you can upload data to Foundry in several ways, depending on your needs and your organization’s setup.
1. Upload via Foundry UI (Datasets)
Users can upload files directly through the Datasets section in Foundry.
Use this option if:
- You prefer a graphical interface.
- You don’t need automation or scripting.
2. Access via AWS Tools (AWS CLI / Cyberduck)
If you prefer command-line or external GUI tools, Foundry supports access via:
- AWS CLI
- Cyberduck (GUI)
The exact setup may vary based on:
- Your role in the organization.
- Your company’s security policies and AWS setup.
A. Organization Default Bucket (Managed by Foundry)
After your organization is registered in Foundry:
- Foundry creates a default S3 bucket for your organization.
- Upon request, the Via Support team can:
  - provide access keys for this S3 bucket, and
  - allow you to configure AWS CLI or Cyberduck to connect using these credentials.
B. Use an Existing S3 Bucket (Your Own Storage)
If you already have:
- An AWS account, and
- An existing S3 bucket managed by your IT or cloud team,
then:
- Your IT team can grant read access on that bucket to a Foundry IAM role.
- Once access is granted, Foundry can read your data directly from your existing S3 bucket.
Use this option if:
- Your organization already uses S3 for data storage.
- You want Foundry to consume data from your existing AWS infrastructure instead of duplicating data.