# NHLBI BioData Catalyst® (BDC) Documentation

Click here to access the [NHLBI BioData Catalyst<sup>®</sup> (BDC)](https://biodatacatalyst.nhlbi.nih.gov/) website.

## Welcome to NHLBI BioData Catalyst<sup>®</sup> (BDC)

Welcome to the BDC ecosystem and thank you for joining our community of practice. The ecosystem offers secure workspaces to support your data analysis in addition to a number of bioinformatics tools for analysis. There is a lot of information to understand and many resources (documentation, learning guides, videos, etc.) available, so we developed this overview to help you get started. If you have additional questions, use the links at the very end of this document, under the **"Questions"** section, to contact us.

## About BDC and Our Community

**What is BDC?**

NHLBI BioData Catalyst<sup>®</sup> (BDC) is a cloud-based ecosystem that offers researchers data, analytical tools, applications, and workflows in secure workspaces. BDC is a community where researchers can find, access, share, store, and analyze heart, lung, blood, and sleep data. BDC is an NHLBI data repository where researchers share scientific data from NHLBI-funded research, so they and others can reproduce findings and reuse data to advance science.

By increasing access to NHLBI data and innovative analytic capabilities, BDC accelerates reproducible biomedical research to drive scientific advances that can help prevent, diagnose, and treat heart, lung, blood, and sleep disorders.

**What are we doing and why does it matter?**

By increasing access to the NHLBI’s datasets and innovative data analysis capabilities, the BDC ecosystem accelerates efficient biomedical research that drives discovery and scientific advancement, leading to novel diagnostic tools, therapeutics, and prevention strategies for heart, lung, blood, and sleep disorders.

**Who is developing BDC?**

The ecosystem is funded by the National Heart, Lung, and Blood Institute (NHLBI). Researchers and other professionals who receive NHLBI funding to develop the ecosystem are together often referred to as “The BDC Consortium.” A [list of partners and platforms powering the ecosystem](https://biodatacatalyst.nhlbi.nih.gov/about/overview/) is on the Overview page of the BDC website, and a [list of the principal investigators](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/community/who-we-are) is available in our documentation.

**Find out the meanings of our terms and acronyms.**

Like many professional communities, BDC has adopted terms to help us communicate quickly and more efficiently, but that can be a challenge for newcomers. To help, we created a BDC [glossary](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/biodata-catalyst-glossary/bdc_glossary) of terms and acronyms. If you come across an ecosystem term or acronym that is unfamiliar and isn’t in the glossary, [contact us](https://biodatacatalyst.nhlbi.nih.gov/contact) so we can give you the information and add it to the glossary.

## The BDC Ecosystem

**Learn about the platforms available in the ecosystem.**

The BDC ecosystem features the following platforms.

**Explore Available Data**

* *BDC Powered by Gen3* (*BDC-Gen3*) - Hosts genomic and phenotypic data and enables faceted search for authorized users to create and export cohorts to workspaces in a scalable, reproducible, and secure manner.
* *BDC Powered by PIC-SURE* (*BDC-PIC-SURE*) - Enables access to all clinical data, supports feasibility queries, and allows cohorts to be built in real time and results to be exported via the API for analysis.

**Analyze Data in Cloud-based Shared Workspaces**

* *BDC Powered by Seven Bridges* (*BDC-Seven Bridges*) - Collaborative workspaces where researchers can find and analyze hosted datasets as well as their own data by using hundreds of optimized analysis tools and workflows in CWL, as well as JupyterLab and RStudio for interactive analysis.
* *BDC Powered by Terra* (*BDC-Terra*) - Secure collaborative place to organize data, run and monitor workflow analysis pipelines in WDL, and perform interactive analysis using applications such as Jupyter Notebooks and the Hail GWAS tool.

[Click here to view the differences between BDC’s standard workspaces (*BDC-Seven Bridges*) and those provided by *BDC-Terra*.](https://biodatacatalyst.nhlbi.nih.gov/use-bdc/analyze-data/bdc-workspaces/)

The BDC website provides details about the [platforms and services](https://biodatacatalyst.nhlbi.nih.gov/resources/services) available in the ecosystem. We encourage you to create accounts on all the platforms as you get to know BioData Catalyst.

## Ecosystem Access, Hosted Data, and System Services

**How do I log in?**

Users log into BioData Catalyst platforms with their eRA Commons credentials (see [Understanding eRA Commons Accounts](https://era.nih.gov/register-accounts/understanding-era-commons-accounts.htm)), and authentication is performed by iTrust. Every time a user logs in, the ecosystem checks their credentials to ensure they can only access the data for which they have dbGaP approval.

While all of the platforms within BioData Catalyst use eRA Commons credentials for authorization and iTrust for authentication, there are some slight differences between the platforms when getting set up:

* *BioData Catalyst Powered by Gen3* - Users do not set up usernames on Gen3. When logging in for the first time, select “Login from NIH”, then enter eRA Commons credentials at the prompt. This ‘User Identity’ is used to track the user on the system.
* *BioData Catalyst Powered by PIC-SURE* - Similar to Gen3, user identities are used. Researchers log into the system by selecting “Log in with eRA Commons.”
* *BioData Catalyst Powered by Seven Bridges* - Users set up platform accounts. The first time on the system, users select “Create an account” and then enter their eRA Commons credentials. The user is then prompted to fill out a registration form with their name, email, and preferred username. Users are also asked to acknowledge that they have read the Privacy Act notice, and then they can proceed to the platform.
* *BioData Catalyst Powered by Terra* - Users initially log in using Google credentials and are asked to agree to the Terms of Service and Privacy Act notice. User activity is tracked via the Google credentials, but users can link their eRA Commons credentials to the account to get access to hosted datasets.

Details about how data access works on the NHLBI BioData Catalyst ecosystem are [on the website](https://biodatacatalyst.nhlbi.nih.gov/resources/data/).

**How do I check which data I can access?**

We recommend users check their access to data before logging in. Do this by going to the [Accessing BioData Catalyst page](https://biodatacatalyst.nhlbi.nih.gov/resources/data/) and clicking on the **“Check My Access”** button. Once you confirm your data access, go to the [Platforms and Services](https://biodatacatalyst.nhlbi.nih.gov/resources/services) page and click on the “**Launch**” hyperlink for the platform or service you wish to use. Platforms and services have login/sign-in links on their pages that bring you to the pages on which you enter your eRA Commons credentials. [Documentation](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/data-access/check-my-access-to-data) on checking your access to data is also available.
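As a rough illustration of what it means to only access data for which you have approval, the sketch below intersects a list of hosted studies with a list of approved ones. It is not how the ecosystem or the “Check My Access” button is implemented. The function name `accessible_studies` and the accession strings are hypothetical placeholders invented for this example.

```python
def accessible_studies(hosted, approved):
    """Return the hosted study accessions that also appear in the approved list.

    Both arguments are plain lists of accession strings. This is an
    illustrative assumption, not the ecosystem's actual access check.
    """
    approved_set = set(approved)
    # Keep the order of the hosted list so results are predictable.
    return [accession for accession in hosted if accession in approved_set]


# Placeholder accessions, made up for this example (not real studies).
hosted = ["phs_demo_1", "phs_demo_2", "phs_demo_3"]
approved = ["phs_demo_2", "phs_demo_9"]

print(accessible_studies(hosted, approved))
```

Here only `phs_demo_2` is both hosted and approved, so it is the only accession returned. Use the “Check My Access” button for the authoritative answer about your own approvals.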

**What data are available in the ecosystem?**

The NHLBI BioData Catalyst currently hosts a subset of datasets from TOPMed including phs numbers with genomic data and related phs numbers with phenotype data. You can find information about which [TOPMed studies](https://drive.google.com/file/d/1936teBZlvBKbQf1hmdx5JImAxJFbVoIx/view?usp=sharing) are currently hosted on the [Data page](https://biodatacatalyst.nhlbi.nih.gov/resources/data) of the website as well as in the [Release Notes](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/biodata-catalyst-release-notes/2020-04-02-biodata-catalyst-ecosystem-release-notes).

**Harmonized data available.**

There are limited amounts of harmonized data available to users with appropriate access at this time. The TOPMed Data Coordinating Center curation team has produced forty-four (44) harmonized phenotype variables from seventeen (17) NHLBI studies. Information about the 17 studies and the 44 variables can be found in the [*BioData Catalyst Powered by PIC-SURE User Guide*](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/explore_data/pic-sure-for-biodata-catalyst-user-guide/harmonized-data).

**Bring your own data and workflows into the system.**

We allow researchers to bring their own data and workflows into the ecosystem to support their analysis needs. Researchers can bring their own datasets into [*BioData Catalyst Powered by Seven Bridges*](https://sb-biodatacatalyst.readme.io/docs/upload-to-the-platform) and [*BioData Catalyst Powered by Terra*](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/written-documentation/analyze-data-1/terra/bringing-data-into-a-workspace/using-your-own-data-with-terra). Users can also bring their own workflows to the system. Users can either add workflows to [Dockstore](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/analyze-data/dockstore/contribute-to-the-community) in CWL or WDL, or they can [create CWL tools](https://sb-biodatacatalyst.readme.io/docs/about-the-common-workflow-language) directly on *BioData Catalyst Powered by Seven Bridges* and [develop custom workflows](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/written-documentation/community-tools-and-integration-1/bring-your-own-tool-s-to-biodata-catalyst) for use on *BioData Catalyst Powered by Terra.*

**Learn about genome-wide association studies and genetic association testing on BioData Catalyst.**

Walk through our self-paced genome-wide association study and genetic association testing [tutorials](https://bdcatalyst.gitbook.io/biodata-catalyst-tutorials/).

**Share your workflows.**

We encourage users to publish their workflows so they can be used by other researchers working in the NHLBI BioData Catalyst ecosystem. Share your workflows via [Dockstore](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/analyze-data/dockstore/contribute-to-the-community).

**Costs and cloud credits.**

BioData Catalyst hosts a number of datasets available for analysis to users with appropriate data access approvals. Users are not charged for the storage of these hosted datasets; however, if hosted data are used in analyses, users incur costs for computation and for storage of derived results. Cloud credits are available on the system, and you can [learn more here](https://biodatacatalyst.nhlbi.nih.gov/resources/cloud-credits).
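The cost rule above can be sketched as simple arithmetic: hosted data storage contributes nothing, while compute time and stored derived results do. The function name `estimate_analysis_cost` and the rates in the example are hypothetical values chosen only to make the arithmetic concrete. They are not BioData Catalyst or cloud provider prices.

```python
def estimate_analysis_cost(compute_hours, derived_gb, months_stored,
                           rate_per_compute_hour, rate_per_gb_month):
    """Estimate what a user would pay for one analysis.

    Storage of the hosted input data is deliberately absent from the
    formula, since users are not charged for it. All rates are
    caller-supplied assumptions.
    """
    compute_cost = compute_hours * rate_per_compute_hour
    storage_cost = derived_gb * months_stored * rate_per_gb_month
    return round(compute_cost + storage_cost, 2)


# Hypothetical example: 10 compute hours at 0.50 per hour, plus
# 100 GB of derived results kept for 2 months at 0.02 per GB-month.
total = estimate_analysis_cost(
    compute_hours=10,
    derived_gb=100,
    months_stored=2,
    rate_per_compute_hour=0.50,
    rate_per_gb_month=0.02,
)
print(total)  # 5.00 compute + 4.00 storage = 9.0
```

Actual charges depend on the platform and the underlying cloud provider, so check each platform's billing documentation for real rates.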

## Questions?

**Learn more, ask questions, or request help.**

Answers to [frequently asked questions](https://biodatacatalyst.nhlbi.nih.gov/faqs/) are available on the website, as are many resources that can be found under [Learn & Support](https://biodatacatalyst.nhlbi.nih.gov/resources/learn). You can also use a form to [Contact Us](https://biodatacatalyst.nhlbi.nih.gov/contact), and if you aren’t sure which selections to make on the form, please see our [help desk directory](https://bdcatalyst.freshdesk.com/support/solutions/articles/60000666868-where-do-i-direct-my-question-regarding-biodata-catalyst-and-the-help-desk-).

