Discovering Data Using Gen3

How to log in to the NHLBI BioData Catalyst Gen3 platform and view the available genomic and phenotypic data.

Log in to the BioData Catalyst Gen3 Platform

To navigate and access data on the Gen3 platform, start by visiting the login page. You will need an eRA Commons account as well as access permissions through the Database of Genotypes and Phenotypes (dbGaP). Researchers should log in using their eRA Commons account. BioData Catalyst consortium developers can log in using their Google accounts. Make sure to use the login method that is linked to the projects you have access to.

Login page for the BioData Catalyst Gen3 portal.

Once logged in, your username will appear in the upper right-hand corner of the page. You will also see a display with aggregate statistics for the total number of subjects, studies, aliquots and files available within the BioData Catalyst platform.

NOTE: These numbers may differ from those displayed in the dbGaP records as they include TOPMed studies as well as the associated parent studies.

Post-login view of the BioData Catalyst Gen3 front page.

Types of Hosted Data

Phenotypic

DCC Harmonized clinical data:

A number of clinical variables have been harmonized by the Data Coordinating Center (DCC) in order to facilitate cross-study analysis. Faceted search over the DCC Harmonized Variables is available via the Exploration page, under the "Data" tab.

Unharmonized clinical data:

Unharmonized clinical files are also available on the Gen3 platform and contain all of the raw phenotypic information for the hosted studies. Unlike the DCC Harmonized Variables, these files are located and searchable under the "Files" tab in the Exploration page.

Genomic

The Gen3 platform hosts genomic data provided by the Trans-Omics for Precision Medicine (TOPMed) program and the 1000 Genomes Project, plus synthetic tutorial data from Terra. At present, these projects include CRAM and VCF files together with their respective index files. Each TOPMed project also contains at least one multi-sample VCF that comprises all subjects within a consent group. CRAM and single-sample VCF files hold data for one individual, whereas multi-sample VCFs are organized at the study consent-group level.

All files are available under the "Files" tab in the Exploration page. More detailed information on currently hosted data on the Gen3 platform can be found here.
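
The pairing of data files with their index files can be illustrated with a minimal Python sketch. It assumes the conventional index suffixes (`.crai` for CRAM, `.tbi` or `.csi` for bgzipped VCF) and that an index sits beside its data file under the same name plus that suffix. The function name and the file names in the usage note are hypothetical and are not taken from the Gen3 platform.

```python
# Index suffixes conventionally paired with each data format (an assumption
# about naming, not a statement of how Gen3 stores files).
INDEX_SUFFIXES = {
    ".cram": (".crai",),
    ".vcf.gz": (".tbi", ".csi"),
}


def pair_with_indexes(file_names):
    """Map each CRAM/VCF data file to the index files listed alongside it."""
    names = set(file_names)
    pairs = {}
    for name in sorted(names):
        for data_suffix, index_suffixes in INDEX_SUFFIXES.items():
            if name.endswith(data_suffix):
                # Keep only the candidate index names that are actually present.
                pairs[name] = [
                    name + suffix
                    for suffix in index_suffixes
                    if name + suffix in names
                ]
    return pairs
```

For example, calling `pair_with_indexes(["sample1.cram", "sample1.cram.crai", "orphan.vcf.gz"])` pairs the CRAM with its `.crai` file and returns an empty list for the VCF, which flags a data file whose index still needs to be located under the "Files" tab.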

Gen3 Pages

The BioData Catalyst Gen3 platform contains five pages, described below:

  • Dictionary: An interactive data dictionary display that details the contents of, and relationships between, the clinical and biospecimen data

  • Exploration: The faceted-search tool for creating custom cohorts

  • Query: The GraphQL query tool to retrieve specific data within the graph model

  • Workspace: The launch page for Gen3 workspaces, which include Jupyter Notebooks and RStudio

  • Profile: The information page for each user, which displays access permissions and is where credential files can be downloaded

The BioData Catalyst Gen3 Pages.
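
As a rough illustration of the kind of request the Query page works with, the Python sketch below assembles a GraphQL query string and wraps it in the `{"query": ...}` JSON body commonly used for GraphQL over HTTP. The node name `subject`, the fields `submitter_id` and `project_id`, and the helper name are assumptions made for illustration. Consult the Dictionary page for the node and field names that actually exist in the graph model.

```python
import json


def build_graphql_payload(node, fields, first=10, **filters):
    """Return a JSON request body wrapping a single-node GraphQL query.

    `node`, `fields`, and `filters` are hypothetical placeholders; real names
    come from the data dictionary.
    """
    args = [f"first: {int(first)}"]
    # json.dumps produces a double-quoted, escaped string literal,
    # which is also valid GraphQL string syntax.
    args += [f"{key}: {json.dumps(value)}" for key, value in sorted(filters.items())]
    query = f"{{ {node}({', '.join(args)}) {{ {' '.join(fields)} }} }}"
    return json.dumps({"query": query})
```

A call such as `build_graphql_payload("subject", ["submitter_id", "project_id"], first=5, project_id="example-project")` yields a body whose query text can also be pasted directly into the Query page editor. Nothing is sent over the network here; submitting the request programmatically would additionally require an access token derived from the credential file available on the Profile page.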