NHLBI BioData Catalyst® (BDC) Documentation

This is a repository for documentation related to the platforms and services that are part of the BDC ecosystem.


Welcome to NHLBI BioData Catalyst® (BDC)

Welcome to the BDC ecosystem and thank you for joining our community of practice. The ecosystem offers secure workspaces to support your data analysis in addition to a number of bioinformatics tools for analysis. The ecosystem currently hosts datasets from the Trans-Omics for Precision Medicine (TOPMed) program. There is a lot of information to understand and many resources (documentation, learning guides, videos, etc.) available, so we developed this overview to help you get started. If you have additional questions, please use the links at the very end of this document, under the "Questions" section, to contact us.

About BDC and Our Community

What is BDC?

NHLBI BioData Catalyst® (BDC) is a cloud-based ecosystem that offers researchers data, analytical tools, applications, and workflows in secure workspaces. BDC is a community where researchers can find, access, share, store, and analyze heart, lung, blood, and sleep data. BDC is an NHLBI data repository where researchers share scientific data from NHLBI-funded research, so they and others can reproduce findings and reuse data to advance science.

By increasing access to NHLBI data and innovative analytic capabilities, BDC accelerates reproducible biomedical research to drive scientific advances that can help prevent, diagnose, and treat heart, lung, blood, and sleep disorders.

What are we doing and why does it matter?

By increasing access to the NHLBI’s datasets and innovative data analysis capabilities, the BDC ecosystem accelerates efficient biomedical research that drives discovery and scientific advancement, leading to novel diagnostic tools, therapeutics, and prevention strategies for heart, lung, blood, and sleep disorders.

Who is developing BDC?

The ecosystem is funded by the National Heart, Lung, and Blood Institute (NHLBI). The researchers and other professionals who receive NHLBI funding to develop the ecosystem are together often referred to as “The BDC Consortium,” or “The Consortium” for short. You can refer to a list of partners and platforms powering the ecosystem on the Overview page of the BDC website, and a list of the principal investigators is available in our documentation.

Find out the meanings of our terms and acronyms.

Like many professional communities, BDC has adopted terms to help us communicate quickly and more efficiently, but that can be a challenge for newcomers. To help, we created a BDC glossary of terms and acronyms. If you come across an ecosystem term or acronym that is unfamiliar and isn’t in the glossary, contact us so we can give you the information and add it to the glossary.

The BDC Ecosystem and Services

Learn about the platforms and services available in the ecosystem.

The BDC ecosystem features the following platforms and services.

Explore Available Data

  • BioData Catalyst Powered by Gen3 - Hosts genomic and phenotypic data and enables faceted search for authorized users to create and export cohorts to workspaces in a scalable, reproducible, and secure manner.

  • BioData Catalyst Powered by PIC-SURE - Enables access to all clinical data, supports feasibility queries, and allows cohorts to be built in real time and results to be exported via the API for analysis.

Analyze Data in Cloud-based Shared Workspaces

  • BioData Catalyst Powered by Seven Bridges - Collaborative workspaces where researchers can find and analyze hosted datasets (e.g. TOPMed) as well as their own data by using hundreds of optimized analysis tools and workflows in CWL, as well as JupyterLab and RStudio for interactive analysis.

  • BioData Catalyst Powered by Terra - Secure collaborative place to organize data, run and monitor workflow (e.g. WDL) analysis pipelines, and perform interactive analysis using applications such as Jupyter Notebooks and the Hail GWAS tool.

Use Community Tools on Controlled-access Datasets

  • Dockstore - Catalog of Docker-based workflows (from individuals, labs, organizations) that export to Terra or Seven Bridges.

The NHLBI BioData Catalyst website provides further details about the platforms and services available in the ecosystem. We encourage you to create accounts on all the platforms as you get to know BioData Catalyst.

Ecosystem Access, Hosted Data, and System Services

How does data access work?

The BioData Catalyst ecosystem manages access to hosted controlled data using data access approvals from the NIH Database of Genotypes and Phenotypes (dbGaP). Therefore, users who want to access a hosted controlled study on the ecosystem must be approved for access to that study in dbGaP.

Users log into BioData Catalyst platforms with their eRA Commons credentials (see Understanding eRA Commons Accounts), and authentication is performed by iTrust. Every time a user logs in, the ecosystem checks their credentials to ensure they can access only the data for which they have dbGaP approval.

Details about how data access works on the NHLBI BioData Catalyst ecosystem are available on the website.

How do I log in?

All platforms within BioData Catalyst use eRA Commons credentials, with authentication performed by iTrust, but setup differs slightly between platforms:

  • BioData Catalyst Powered by Gen3 - Users do not set up usernames on Gen3. When logging in for the first time, select “Login from NIH”, then enter eRA Commons credentials at the prompt. This ‘User Identity’ is used to track the user on the system.

  • BioData Catalyst Powered by PIC-SURE - As with Gen3, user identities are used: researchers log into the system by selecting “Log in with eRA Commons.”

  • BioData Catalyst Powered by Seven Bridges - Users set up platform accounts. The first time on the system, users select “Create an account” and then enter their eRA Commons credentials. The user is then prompted to fill out a registration form with their name, email, and preferred username. Users are also asked to acknowledge that they have read the Privacy Act notice, after which they can proceed to the platform.

  • BioData Catalyst Powered by Terra - Users initially log in using Google credentials and are asked to agree to the Terms of Service and Privacy Act notice. User activity is tracked via the Google credentials, but users can link their eRA Commons credentials to the account to get access to hosted datasets.

How do I check which data I can access?

We recommend users first check their access to data before logging in. Do this by going to the Accessing BioData Catalyst page and clicking on the “Check My Access” button. Once you confirm your data access, go to the Platforms and Services page and click on the “Launch” hyperlink for the platform or service you wish to use. Platforms and services have login/sign-in links on their pages that bring you to the pages on which you enter your eRA Commons credentials. Documentation on checking your access to data is also available.

What data are available in the ecosystem?

The NHLBI BioData Catalyst currently hosts a subset of datasets from TOPMed, including phs numbers with genomic data and related phs numbers with phenotype data. You can find information about which TOPMed studies are currently hosted on the Data page of the website as well as in the Release Notes.

Harmonized data available.

There are limited amounts of harmonized data available to users with appropriate access at this time. The TOPMed Data Coordinating Center curation team has produced forty-four (44) harmonized phenotype variables from seventeen (17) NHLBI studies. Information about the 17 studies and the 44 variables can be found in the BioData Catalyst Powered by PIC-SURE User Guide.

Bring your own data and workflows into the system.

We allow researchers to bring their own data and workflows into the ecosystem to support their analysis needs. Researchers can bring their own datasets into BioData Catalyst Powered by Seven Bridges and BioData Catalyst Powered by Terra. Users can also bring their own workflows to the system. Users can either add workflows to Dockstore in CWL or WDL, or they can create CWL tools directly on BioData Catalyst Powered by Seven Bridges and develop custom workflows for use on BioData Catalyst Powered by Terra.

Learn about genome-wide association studies and genetic association testing on BioData Catalyst.

Walk through our self-paced genome-wide association study and genetic association testing tutorials.

Share your workflows.

We encourage users to publish their workflows so they can be used by other researchers working in the NHLBI BioData Catalyst ecosystem. Share your workflows via Dockstore.

Costs and cloud credits.

BioData Catalyst hosts a number of datasets available for analysis to users with appropriate data access approvals. Users are not charged for the storage of these hosted datasets; however, if hosted data is used in analyses, users incur costs for computation and for storage of derived results. Cloud credits are available on the system, and you can learn more here.

BioData Catalyst Publications

Let us know about your publications and see how you can cite us.

If you are writing a manuscript about research you conducted using NHLBI BioData Catalyst, please use the citation available here.

Immediately after learning your manuscript has been accepted, please email BDCatalystOutreach@nih.gov to let us know. Please include in your email the manuscript title, the name of the publication that accepted your manuscript, and information about pre-publication posting (if it will take place), along with your name and contact information.

Questions?

Learn more, ask questions, or request help.

Answers to frequently asked questions are available on the website, as are many resources that can be found under Learn & Support. You can also use a form to contact us, and if you aren’t sure which selections to make on the form, please see our help desk directory.