NHLBI BioData Catalyst® (BDC) Documentation

Transferring Files Between Seven Bridges and Terra

Instructions on transferring files between NHLBI BioData Catalyst Powered by Seven Bridges and NHLBI BioData Catalyst Powered by Terra

Last updated 3 years ago

Introduction

This tutorial guides users through the process of transferring files between the two workspace environments of NHLBI BioData Catalyst: NHLBI BioData Catalyst Powered by Seven Bridges and NHLBI BioData Catalyst Powered by Terra.

Most researchers select one of the workspaces as their primary analysis environment, and their labmates and collaborators typically work with them in the same environment. However, there are cases where some collaborators work on Seven Bridges and others work on Terra. In these cases, researchers need to share data files between the two workspaces to collaborate. When researchers run analyses on Seven Bridges, the results (the derived data) are only available on Seven Bridges. Likewise, when researchers run analyses on Terra, the results are only available on Terra. This tutorial provides step-by-step guidance on how to share derived data between the workspace environments. These instructions can also be used to share private data that has been uploaded to Seven Bridges or Terra.

Both open access data and controlled access data can be shared across workspace environments. Importantly, a researcher who intends to share controlled access data must ensure that all recipients have the necessary dbGaP permissions for those files. In some cases, this may mean the researchers must be listed as collaborators on their respective dbGaP applications. These instructions are intended for sharing files under 1 terabyte (TB) in size. If you want to share data larger than 1 TB, contact the BioData Catalyst Help Desk to discuss your use case.

Transferring large amounts of data between cloud providers or regions is not recommended; for example, moving data from AWS to Google costs approximately $100/TB.
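As a rough illustration of that figure, the following shell sketch estimates a transfer cost from a data size in gigabytes. The function name estimate_transfer_cost is hypothetical, the $100/TB rate is the approximate AWS-to-Google figure quoted above, and 1 TB is assumed to equal 1000 GB; actual charges depend on the providers and regions involved.

```shell
# Hypothetical helper: estimate a cross-cloud transfer cost in USD.
# Assumes ~$100 per TB (the approximate figure quoted above) and 1 TB = 1000 GB.
estimate_transfer_cost() {
  size_gb="$1"   # data size in gigabytes
  awk -v gb="$size_gb" -v rate_per_tb=100 \
    'BEGIN { printf "%.2f\n", (gb / 1000) * rate_per_tb }'
}
```

For example, `estimate_transfer_cost 250` prints 25.00, that is, roughly $25 for 250 GB under these assumptions.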

Initial Considerations

Platform Accounts

The first consideration is platform accounts. Moving data between Seven Bridges and Terra is currently a manual process and requires that one of the researchers involved in sharing has an account on both platforms. It is recommended that the recipient of the shared data be the person who holds accounts on both Seven Bridges and Terra.

Consider an example: Sebastian, who works on Seven Bridges, and Teresa, who works on Terra. If Sebastian wants to share data with Teresa so that she can use it on Terra, Teresa first needs to set up an account on Seven Bridges, so that she has accounts on both platforms. Sebastian then shares the data with Teresa on Seven Bridges by adding her, with Copy permissions, as a member of the project that contains the data. For information on permissions, refer to the Seven Bridges documentation. Once Teresa is a member of the project, she can move the data from the Seven Bridges project to a workspace on the Terra platform by following the instructions in the section titled Moving Data From Seven Bridges to Terra.

If Teresa (Terra) wants to share data with Sebastian (Seven Bridges) so that he can use it on Seven Bridges, Sebastian first needs to create an account on Terra, so that he has accounts on both platforms. Teresa then shares the Terra workspace that contains the data with Sebastian. For information on sharing workspaces, refer to the Terra documentation. Once the workspace is shared, Sebastian can pull the data into a Seven Bridges project by following the instructions in the section titled Moving Files From Terra to Seven Bridges.

To create a Terra account, refer to the Terra documentation.

To create a Seven Bridges account, refer to the Seven Bridges documentation. If you are new to Seven Bridges, you may find the Getting Started Guide helpful.

Billing

The second consideration is billing. The researcher moving data between the two workspaces needs billing groups set up on both workspaces to cover any cloud costs. Contact the BioData Catalyst Help Desk if you have questions about how to get a billing group on Seven Bridges or Terra.

Moving Files From Terra to Seven Bridges

The following steps describe how to use the Seven Bridges platform to pull data securely from a Terra workspace into a Seven Bridges project. The transfer uses gsutil from a terminal. This method:

  • Works well for transfers of all sizes.

  • Is ideal for large files or thousands of files.

  • Can be used for transfers between local storage and a bucket, between a workspace VM or persistent disk and a Google bucket, and between Google buckets (external and workspace).

After launching a JupyterLab notebook on Seven Bridges, open the terminal and install gsutil, a Python program that lets end users add data to or copy data from a Google Cloud bucket. In the terminal, run the following commands:

pip install gsutil
gsutil config

Installing gsutil takes only a few seconds.

The config command prints a secure URL for you to open in the browser. You will authenticate with the same credentials that you used to log in to Terra. To copy the printed URL in the JupyterLab terminal, press Shift and right-click, which displays options to copy the URL. Copy the URL and open it in a new browser tab, which will direct you to Google authentication.

Google will provide an authentication code that you will copy and paste into the terminal.

Next, type in the Google Project ID. This is found on the right side of the Terra Workspace Dashboard.

Then run the command below to display the Google buckets that are attached to the project ID.

gsutil ls

The Google bucket name for the Terra project can be found in the lower right corner of the Terra Workspace.

Running gsutil ls on the Google bucket name will display the folders and files from the Terra workspace.
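Before copying, it can help to confirm that the data falls under the 1 TB guideline mentioned in the Introduction. The sketch below is one possible check, not part of the original instructions: the function name under_one_tb is hypothetical, and 1 TB is assumed to mean 10^12 bytes. The commented lines show how the byte count might be obtained with gsutil du -s, which prints a total size in bytes followed by the URL.

```shell
# Hypothetical helper: succeed if a byte count is below the 1 TB guideline.
# Assumes 1 TB = 1,000,000,000,000 bytes.
under_one_tb() {
  [ "$1" -lt 1000000000000 ]
}

# Example (requires gsutil and access to the bucket, so it is left commented out):
# total_bytes=$(gsutil du -s "gs://[Google-Bucket-Name]" | awk '{ print $1 }')
# if under_one_tb "$total_bytes"; then
#   echo "Under 1 TB: OK to follow these instructions"
# else
#   echo "Over 1 TB: contact the help desk first"
# fi
```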

To copy a folder to the Seven Bridges workspace environment, run the following command:

gsutil cp -R gs://[Google-Bucket-Name] /sbgenomics/output-files/

One important point about the gsutil cp command concerns the -R flag, which recursively copies a folder and all of its subfolders and files. Most users will likely want to use the -R flag. Omit it when copying individual files or when using a wildcard such as “*.vcf”.
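The two cases described above (a whole folder with -R, or individual files and wildcards without it) can be sketched as a small wrapper. This is a minimal illustration under stated assumptions: the function name pull_from_terra, its folder/files argument, the DRY_RUN variable, and the example bucket paths are all hypothetical; only gsutil cp and its -R flag come from the instructions above.

```shell
# Hypothetical wrapper around "gsutil cp" for pulling Terra data into Seven Bridges.
# Usage: pull_from_terra SRC_URL [folder|files]
# Set DRY_RUN=1 to print the command instead of running it.
pull_from_terra() {
  src="$1"
  mode="${2:-folder}"                 # "folder" adds -R; anything else omits it
  dest="/sbgenomics/output-files/"    # destination folder used in this tutorial
  if [ "$mode" = "folder" ]; then
    ${DRY_RUN:+echo} gsutil cp -R "$src" "$dest"
  else
    # The source stays quoted so that gsutil, not the local shell,
    # expands any wildcard in the gs:// URL.
    ${DRY_RUN:+echo} gsutil cp "$src" "$dest"
  fi
}
```

For example, `DRY_RUN=1 pull_from_terra 'gs://example-bucket/vcfs/*.vcf' files` prints the command that would copy only the matching VCF files, without the -R flag.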

Moving Data From Seven Bridges to Terra

This section describes pushing data from a Seven Bridges project to a Terra workspace.

Moving data from Seven Bridges to Terra uses the same setup as the previous section, with one modification to the gsutil copy command: the source and destination arguments are reversed.

gsutil cp -R /sbgenomics/output-files/vcfs_to_transfer gs://[Google-Bucket-Name]

You still use the -R flag, but the destination is now a Terra bucket. The Terra workspace’s Google bucket name/ID can be found on the Terra workspace Dashboard tab. You can verify that the folder has been copied by navigating to the Files section of the Data tab in your Terra workspace.

In this example, clicking on the folder shows that all three files have been copied.
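The copy can also be checked from the same terminal. The sketch below pushes a folder and then lists it in the bucket; the function name push_to_terra and the DRY_RUN variable are hypothetical, and the listing path assumes that gsutil cp -R places the folder in the bucket under its own name.

```shell
# Hypothetical wrapper: push a folder to a Terra bucket, then list it to verify.
# Usage: push_to_terra SRC_DIR BUCKET_URL
# Set DRY_RUN=1 to print the commands instead of running them.
push_to_terra() {
  src_dir="$1"    # e.g. /sbgenomics/output-files/vcfs_to_transfer
  bucket="$2"     # e.g. gs://[Google-Bucket-Name]
  ${DRY_RUN:+echo} gsutil cp -R "$src_dir" "$bucket" || return 1
  # List the copied folder recursively; assumes it keeps its own name in the bucket.
  ${DRY_RUN:+echo} gsutil ls -r "${bucket%/}/$(basename "$src_dir")"
}
```

Running it with DRY_RUN=1 first shows the two gsutil commands without transferring anything.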

For more detail on gsutil transfers, refer to the Terra documentation for Moving data to/from a Google bucket (workspace or external), specifically the section Upload and download data files in a terminal using gsutil.

In both directions, you use the terminal in JupyterLab on the Seven Bridges workspace environment. The reason is that, although Seven Bridges can run on the Google Cloud Platform, the Google bucket API is not exposed in the same manner as it is on Terra. You therefore start a JupyterLab notebook on Seven Bridges; when pulling data from Terra, use the project you would like to be the destination for the copied data. Refer to the Seven Bridges documentation for launching and accessing the terminal in a JupyterLab environment.

When bringing in data from Terra, /sbgenomics/output-files should be the destination folder, as this ensures the files or folders get populated back to the Seven Bridges project. Refer to the Save analysis outputs documentation for information about working with files in Data Cruncher environments. After the JupyterLab instance is shut down, your files will automatically be populated in your project-files tab on Seven Bridges.
