Using GCP Tools (gsutil / Cyberduck)
Tip
When uploading data from your local machine to the cloud, it's crucial to organize your data effectively. This will make it much easier to locate your files when you need them.
Once your organization is registered in Via Foundry, a default Google Cloud Storage (GCS) bucket is created for you. You can access this bucket directly within Via Foundry, or upload to it with external tools such as gsutil or Cyberduck.
Prerequisites
Contact Support
Before you begin, contact Via Scientific support at support@viascientific.com to obtain:
- Organization Default Bucket Name (e.g., gs://your-org-bucket)
- GCP Project ID associated with your bucket access.
- Confirmation that Google OAuth access is enabled for your organization.
Option 1: Upload with gsutil (CLI)
gsutil is a Python application that lets you access Google Cloud Storage from the command line. It is recommended for large uploads.
1. Install Google Cloud CLI
Install the Google Cloud CLI (which includes gsutil) by following Google's official installation instructions at https://cloud.google.com/sdk/docs/install.
2. Authenticate via Google OAuth
Run the following command to initialize the SDK and authenticate:
gcloud init
Or, if you already have gcloud configured:
gcloud auth login
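Before moving on to the upload, it can help to confirm that both command-line tools are on your PATH and to see which account is currently active. The following is a minimal sketch of one way to do that. The helper names have_tool, check_tools, and active_account are hypothetical and introduced only for illustration; they are not part of the Google Cloud CLI.

```shell
#!/bin/sh
# have_tool NAME: succeed if the named command is available on PATH.
# (Hypothetical helper, not part of the Google Cloud CLI.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# check_tools NAME...: report each missing tool on stderr.
# Returns 0 if every tool was found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! have_tool "$tool"; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# active_account: print the account gcloud is currently authenticated as.
# Prints nothing if no account is active.
active_account() {
    gcloud auth list --filter=status:ACTIVE --format="value(account)"
}
```

After sourcing these functions, running check_tools gcloud gsutil && active_account should print the authenticated account, or name whichever tool still needs to be installed.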
3. Step-by-Step Upload
- Verify Access: List the contents of your bucket to ensure you have access:
  gsutil ls gs://YOUR_ORG_BUCKET/
- Upload Data:
  - Single File:
    gsutil cp /path/to/local/file.fastq.gz gs://YOUR_ORG_BUCKET/my-dataset/
  - Folder (Recursive): Use the -m flag for parallel uploads (faster for many files):
    gsutil -m cp -r /path/to/local/folder gs://YOUR_ORG_BUCKET/my-dataset/
- Verify Upload:
  gsutil ls gs://YOUR_ORG_BUCKET/my-dataset/
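The verify, upload, and verify sequence above could be wrapped in a small script so that it runs the same way each time. The following is only a sketch under stated assumptions: gcs_dest, run, upload_folder, and the DRY_RUN variable are hypothetical names introduced here, and the only gsutil invocations used are the ones already shown in the steps above.

```shell
#!/bin/sh
# gcs_dest BUCKET DATASET: print a destination URI of the form
# gs://BUCKET/DATASET/. The bucket may be given with or without "gs://".
gcs_dest() {
    bucket=${1#gs://}      # drop a leading gs:// if present
    bucket=${bucket%/}     # drop one trailing slash
    dataset=${2#/}         # drop one leading slash
    dataset=${dataset%/}   # drop one trailing slash
    printf 'gs://%s/%s/\n' "$bucket" "$dataset"
}

# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# upload_folder LOCAL_DIR BUCKET DATASET:
# check bucket access, upload the folder recursively, then list the result.
# Stops at the first step that fails.
upload_folder() {
    bucket_name=${2#gs://}
    bucket_name=${bucket_name%/}
    dest=$(gcs_dest "$2" "$3")
    run gsutil ls "gs://$bucket_name" &&
    run gsutil -m cp -r "$1" "$dest" &&
    run gsutil ls "$dest"
}
```

Calling DRY_RUN=1 upload_folder /path/to/local/folder YOUR_ORG_BUCKET my-dataset prints the three gsutil commands without running them, which is a cheap way to check the destination path before starting a large transfer.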
Option 2: Upload with Cyberduck (GUI)
If you prefer a graphical user interface, you can use Cyberduck to upload your data. Please refer to our Cyberduck Guide for detailed instructions on how to set up and use Cyberduck with your cloud storage.
Using Uploaded Data in Via Foundry
Once your data is uploaded, you can connect it as a Data Source:
- Go to Data -> Create Dataset.
- Select Google Cloud Storage.
- In Choose Credentials, select "Account Default".
- In Data Source Path, enter the path to your data (e.g., gs://YOUR_ORG_BUCKET/my-dataset/).
- Click Connect.
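A common slip at this step is pasting a path that still contains a placeholder or is missing the gs:// prefix. As a rough sanity check before entering the path, something like the sketch below may help. The function name is_gcs_path is hypothetical, and the pattern only approximates Google's bucket naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). It does not handle the longer names permitted for dotted bucket names, and it does not check that the bucket actually exists.

```shell
#!/bin/sh
# is_gcs_path URI: succeed if URI looks like gs://bucket or gs://bucket/path.
# (Hypothetical helper; a format check only, not an access check.)
is_gcs_path() {
    case $1 in
        gs://*) ;;
        *) return 1 ;;
    esac
    rest=${1#gs://}        # strip the scheme
    bucket=${rest%%/*}     # keep everything before the first slash
    # LC_ALL=C keeps [a-z] from matching uppercase letters in some locales.
    printf '%s\n' "$bucket" |
        LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$'
}
```

With this check, gs://your-org-bucket/my-dataset/ passes, while the unreplaced placeholder gs://YOUR_ORG_BUCKET/my-dataset/ is rejected because bucket names cannot contain uppercase letters.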