To navigate and access data available on the Gen3 platform, start by visiting the login page. You will need an eRA Commons account as well as access permissions through the Database of Genotypes and Phenotypes (dbGaP). If you are a researcher, log in using your eRA Commons account. BioData Catalyst consortia developers can log in using their Google accounts. Make sure to use the login method that is associated with the projects you have access to.
Once logged in, your username will appear in the upper right-hand corner of the page. You will also see a display with aggregate statistics for the total number of subjects, studies, aliquots and files available within the BioData Catalyst platform.
NOTE: These numbers may differ from those displayed in the dbGaP records as they include TOPMed studies as well as the associated parent studies.
A number of clinical variables have been harmonized by the Data Coordinating Center (DCC) in order to facilitate cross-study analysis. Faceted search over the DCC Harmonized Variables is available via the Exploration page, under the "Data" tab.
Unharmonized clinical files are also available on the Gen3 platform and contain all of the raw phenotypic information for the hosted studies. Unlike the DCC Harmonized Variables, these files can be found and searched under the "Files" tab on the Exploration page.
The Gen3 platform hosts genomic data provided by the Trans-Omics for Precision Medicine (TOPMed) program and the 1000 Genomes Project, as well as synthetic tutorial data from Terra. At present, these projects include CRAM and VCF files together with their respective index files. For TOPMed projects specifically, each project contains at least one multi-sample VCF that covers all subjects within the consent group. Individual CRAM and VCF files each correspond to a single subject, whereas multi-sample VCFs are produced at the study consent-group level.
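Because each CRAM and VCF file is accompanied by an index file, it can be useful to confirm that a downloaded set of files is complete. The following is a minimal sketch of such a check. It assumes the common naming conventions in which a CRAM index has the suffix ".crai" and a compressed VCF index has the suffix ".tbi" or ".csi"; these conventions are not stated by the platform, and the file names shown are hypothetical.

```python
# Assumed index naming conventions (general tooling conventions, not
# confirmed for this platform): data-file suffix -> accepted index suffixes.
INDEX_SUFFIXES = {
    ".cram": (".crai",),
    ".vcf.gz": (".tbi", ".csi"),
}


def expected_index_names(filename):
    """Return the index file names that could accompany a data file."""
    for data_suffix, index_suffixes in INDEX_SUFFIXES.items():
        if filename.endswith(data_suffix):
            return [filename + suffix for suffix in index_suffixes]
    return []  # not a CRAM or compressed VCF file


def missing_indexes(filenames):
    """Return the data files for which no index file is present."""
    present = set(filenames)
    missing = []
    for name in filenames:
        candidates = expected_index_names(name)
        if candidates and not any(c in present for c in candidates):
            missing.append(name)
    return missing


# Hypothetical file names, for illustration only.
files = ["subject1.cram", "subject1.cram.crai", "cohort.vcf.gz", "notes.txt"]
unindexed = missing_indexes(files)
```

In this example, "unindexed" holds only the compressed VCF, since the CRAM file has its index alongside it and the text file needs none.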
The BioData Catalyst Gen3 platform contains five pages, described below:
Dictionary: An interactive data dictionary display that details the contents of, and relationships between, clinical and biospecimen data
Exploration: The faceted search tool for creating custom cohorts
Query: The GraphQL query tool to retrieve specific data within the graph model
Workspace: The launch page for Gen3 workspaces, which include Jupyter Notebooks and RStudio
Profile: The information page for each user, displaying access and the location for credential file downloads
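The queries entered on the Query page can also be prepared programmatically. The following is a minimal sketch of how a GraphQL request might be assembled in Python. The host name, the endpoint path, the query, and the token value are all assumptions made for illustration, so confirm the actual values for your deployment. How an access token is obtained from the credentials file downloaded on the Profile page is not shown here. The request is only constructed, not sent.

```python
import json
import urllib.request

# Assumptions for illustration only: replace with the values for your deployment.
GEN3_HOST = "https://gen3.example.org"       # hypothetical host name
GRAPHQL_PATH = "/api/v0/submission/graphql"  # assumed endpoint path


def build_graphql_request(query, access_token, variables=None):
    """Build (but do not send) an HTTP POST request carrying a GraphQL query."""
    payload = {"query": query, "variables": variables or {}}
    return urllib.request.Request(
        GEN3_HOST + GRAPHQL_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


# Hypothetical query: the node and field names depend on the data dictionary.
EXAMPLE_QUERY = "{ project(first: 5) { project_id } }"

request = build_graphql_request(EXAMPLE_QUERY, access_token="PLACEHOLDER_TOKEN")
```

To send the request, pass it to urllib.request.urlopen and parse the response body as JSON. The Dictionary page is the place to look up which nodes and fields a query can refer to.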