Discovering Data Using Gen3
How to log in to the NHLBI BioData Catalyst Gen3 platform and view available genomic and phenotypic data.
To navigate and access data on the Gen3 platform, start by visiting the login page. You will need an eRA Commons account as well as access permissions through the Database of Genotypes and Phenotypes (dbGaP). Researchers should log in by selecting NIH Login and entering their eRA Commons credentials. BioData Catalyst consortium developers can log in with their Google accounts. Make sure to use the login method tied to the account that has access to your projects.
Once logged in, your username will appear in the upper right-hand corner of the page. You will also see a display with aggregate statistics for the total number of subjects, studies, aliquots and files available within the BioData Catalyst platform.
NOTE: These numbers may differ from those displayed in the dbGaP records, because they include TOPMed studies as well as their associated parent studies.
A number of clinical variables have been harmonized by the Data Coordinating Center (DCC) in order to facilitate cross-study analysis. Faceted search over the DCC Harmonized Variables is available via the Exploration page, under the "Data" tab.
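The way facet selections narrow a cohort can be sketched in a few lines of Python. This is a minimal illustration, assuming the common faceted-search convention that values selected within one facet are combined with OR while separate facets are combined with AND. The records and field names are hypothetical, not actual DCC harmonized variables, and the code does not call any Gen3 API.

```python
# Hypothetical harmonized records; field names are illustrative only,
# not the actual DCC harmonized variable names.
RECORDS = [
    {"subject_id": "S1", "study": "StudyA", "sex": "female", "smoker": "yes"},
    {"subject_id": "S2", "study": "StudyA", "sex": "male", "smoker": "no"},
    {"subject_id": "S3", "study": "StudyB", "sex": "female", "smoker": "no"},
    {"subject_id": "S4", "study": "StudyC", "sex": "male", "smoker": "yes"},
]


def apply_facets(records, facets):
    """Keep records that match every facet (AND across facets), where a
    record matches a facet if its value is any selected value (OR within)."""
    return [
        r for r in records
        if all(r.get(name) in selected for name, selected in facets.items())
    ]


def facet_counts(records, name):
    """Count records per value of one facet, like a facet panel's tallies."""
    counts = {}
    for r in records:
        counts[r[name]] = counts.get(r[name], 0) + 1
    return counts


# StudyA OR StudyB, AND female.
cohort = apply_facets(RECORDS, {"study": {"StudyA", "StudyB"}, "sex": {"female"}})
print([r["subject_id"] for r in cohort])  # ['S1', 'S3']
print(facet_counts(cohort, "smoker"))  # {'yes': 1, 'no': 1}
```

With no facets selected, `all()` over an empty mapping is true, so every record is kept, which matches an unfiltered view.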
Unharmonized clinical files are also available on the Gen3 platform and contain all of the raw phenotypic information for the hosted studies. Unlike the DCC Harmonized Variables, these files are located and searchable under the "Files" tab in the Exploration page.
The Gen3 platform hosts genomic data from the Trans-Omics for Precision Medicine (TOPMed) program and the 1000 Genomes Project, as well as synthetic tutorial data from Terra. At present, these projects include CRAM and VCF files together with their respective index files. Each TOPMed project contains at least one multi-sample VCF that covers all subjects within the consent group. Individual CRAM and VCF files are provided per subject, whereas multi-sample VCFs are organized by study consent group.
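The file layout described above can be checked locally with a short Python sketch. It assumes the usual naming convention in which an index file shares its data file's name plus a suffix (`.crai` for CRAM, `.tbi` or `.csi` for bgzipped VCF); the file names used here are hypothetical. It also tells an individual-level VCF from a multi-sample VCF by counting the sample columns in the `#CHROM` header line, which the VCF specification places after the fixed columns and the `FORMAT` column.

```python
# Index suffixes conventionally paired with each data format (assumption:
# index files are named "<data file name><index suffix>").
INDEX_SUFFIXES = {".cram": (".crai",), ".vcf.gz": (".tbi", ".csi")}


def data_format(name):
    """Return the data format suffix of a file name, or None for other files."""
    for fmt in INDEX_SUFFIXES:
        if name.endswith(fmt):
            return fmt
    return None


def pair_with_indexes(filenames):
    """Map each CRAM/VCF file name to its index file name, or None if missing."""
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        fmt = data_format(name)
        if fmt is None:
            continue  # an index file or an unrelated file type
        pairs[name] = next(
            (name + s for s in INDEX_SUFFIXES[fmt] if name + s in names), None
        )
    return pairs


def vcf_sample_count(header_line):
    """Count sample columns in a VCF '#CHROM' header line.

    The first eight columns are fixed (#CHROM through INFO); when genotypes
    are present, FORMAT follows and every later column is one sample.
    """
    columns = header_line.rstrip("\n").split("\t")
    if columns[0] != "#CHROM":
        raise ValueError("not a VCF #CHROM header line")
    return max(len(columns) - 9, 0)


files = ["subjectA.cram", "subjectA.cram.crai", "subjectB.cram",
         "cohort.vcf.gz", "cohort.vcf.gz.tbi"]
print(pair_with_indexes(files))
header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                    "INFO", "FORMAT", "S1", "S2", "S3"])
print(vcf_sample_count(header))  # 3
```

A count of one suggests an individual-level VCF, while a larger count indicates a multi-sample VCF such as the consent-group files in TOPMed projects.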
All files are available under the "Files" tab in the Exploration page. More detailed information on currently hosted data on the Gen3 platform can be found here.
The BioData Catalyst Gen3 platform contains five pages described below:
Dictionary: An interactive data dictionary display that details the contents of, and relationships between, the clinical and biospecimen data
Exploration: The faceted search tool for filtering data and creating custom cohorts
Query: The GraphQL query tool to retrieve specific data within the graph model
Workspace: The launch page for Gen3 workspaces that includes Jupyter Notebooks and RStudio
Profile: The user information page, showing each user's access permissions and where to download credential files
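The Profile and Query pages also have programmatic counterparts. The Python sketch below, using only the standard library, reads a downloaded credentials file, builds a request that exchanges the API key for an access token, and builds a GraphQL request. Several details are assumptions drawn from general Gen3 documentation rather than from this page, and should be verified before use: the base URL, the `/user/credentials/api/access_token` and `/api/v0/submission/graphql` endpoint paths, and a credentials file containing `api_key` and `key_id` fields. Example query field names are hypothetical and depend on the platform's data dictionary.

```python
import json
import urllib.request

# Assumptions, not taken from this page: base URL and endpoint paths follow
# general Gen3 documentation; confirm them for BioData Catalyst before use.
BASE_URL = "https://gen3.biodatacatalyst.nhlbi.nih.gov"
TOKEN_PATH = "/user/credentials/api/access_token"
GRAPHQL_PATH = "/api/v0/submission/graphql"


def load_credentials(path):
    """Read the credentials file downloaded from the Profile page.

    Assumed to be JSON with "api_key" and "key_id" fields.
    """
    with open(path) as fh:
        creds = json.load(fh)
    missing = {"api_key", "key_id"} - set(creds)
    if missing:
        raise ValueError(f"credentials file lacks {sorted(missing)}")
    return creds


def token_request(creds, base_url=BASE_URL):
    """Build (but do not send) a POST exchanging an API key for an access token."""
    body = json.dumps({"api_key": creds["api_key"], "key_id": creds["key_id"]})
    return urllib.request.Request(
        base_url + TOKEN_PATH,
        data=body.encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def graphql_request(query, access_token, variables=None, base_url=BASE_URL):
    """Build (but do not send) an authenticated GraphQL POST request."""
    payload = {"query": query, "variables": variables or {}}
    return urllib.request.Request(
        base_url + GRAPHQL_PATH,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"bearer {access_token}",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and decode its JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

A session would then call `load_credentials("credentials.json")`, pass the result through `send(token_request(...))` and read the token from the response (assumed to be an `access_token` field), and finally call `send(graphql_request("{ project(first: 10) { project_id } }", token))`. Building requests separately from sending them keeps the construction logic testable without network access.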