The 2020-04-02 release marks the first significant release for the NHLBI BioData Catalyst ecosystem. This release offers an integrated system of platforms and services that lets researchers search metadata of hosted datasets, find data files, and analyze those files in workspace environments that support a variety of analysis modalities.
The hosted data for this release includes TOPMed multi-sample VCF data for ~55,000 sequenced participants within 32 TOPMed studies included in Freeze 5b, as well as CRAM files for those participants. In addition, this release includes raw phenotype files for participants in TOPMed studies, providing clinical information such as BMI and lipid levels. In some cases, these data are in different dbGaP accessions than the genomic data. The hosted data is stored in both Amazon Web Services and Google Cloud, and users have the option to run computation on either cloud provider. To access the hosted TOPMed data on BioData Catalyst, users must have dbGaP approval. Please refer to the Data page on the BioData Catalyst website for more information.
For more in-depth information, please see the "List of significant new features" below.
The following features in this release primarily support TOPMed researchers who have approval for the controlled TOPMed studies in dbGaP, across a range of technical skills (command-line and GUI users alike):
System login and data access: Researchers can log into the BioData Catalyst platforms using their eRA Commons ID. Approvals for TOPMed studies in dbGaP are recognized by the platforms.
Search TOPMed phenotypic data: Create cohorts on PIC-SURE by searching and selecting phenotypic variables of interest from dbGaP and then export cohorts to Seven Bridges or Terra for use in analysis workspaces. Users can also explore the TOPMed phenotype variables harmonized by the TOPMed Data Coordinating Center.
Find and access TOPMed genomics files, raw phenotype data files, and reference data files: Use the Explorer feature on Gen3 and the Data Browser feature on Seven Bridges.
Bring your own data: Use one of several options to upload/import data files to the workspace environments.
Run analyses at scale: Analyze thousands of samples at once using batch processing capabilities in secure workspaces. Run computation on Google Cloud or Amazon Web Services. Work through a visual user interface, JupyterLab and Jupyter notebooks, RStudio, the API, or the command line.
Association studies: Execute single-variant and multiple-variant association studies using the GENESIS pipelines, Hail, and others. Use Annotation Explorer to create variant grouping files for multiple-variant association studies.
Collaborate with other users: Share workspaces, files, and tools with other BioData Catalyst users.
Documentation: Access documentation for each of the platforms.
Track cloud costs: Track cloud storage and compute costs on Seven Bridges and Terra.
Hosted TOPMed study accessions with genomic data from Freeze 5b
phs I.D. #
NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish
NHLBI TOPMed: Atherosclerosis Risk in Communities
NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados
NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation Study
NHLBI TOPMed: The Cleveland Family Study
NHLBI TOPMed: Cardiovascular Health Study
NHLBI TOPMed: Genetic Epidemiology of COPD
NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica
NHLBI TOPMed: Diabetes Heart Study
NHLBI TOPMed: Boston Early-Onset COPD Study
NHLBI TOPMed: Framingham Heart Study
NHLBI TOPMed: Genes-Environments and Admixture in Latino Asthmatics
NHLBI TOPMed: Genetic Study of Atherosclerosis Risk
NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy
NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity
NHLBI TOPMed: Epigenetic Determinants of Lipid Response to Dietary Fat and Fenofibrate
NHLBI TOPMed: Heart and Vascular Health Study
NHLBI TOPMed: Genetics of Left Ventricular Hypertrophy
NHLBI TOPMed: The Jackson Heart Study
NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism
NHLBI TOPMed: The Multi-Ethnic Study of Atherosclerosis
NHLBI TOPMed: Massachusetts General Hospital (MGH) Atrial Fibrillation Study
NHLBI TOPMed: Partners HealthCare Biobank
NHLBI TOPMed: San Antonio Family Heart Study
NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment
NHLBI TOPMed: African American Sarcoidosis Genetics Resource
NHLBI TOPMed: Genome-wide Association Study of Adiposity in Samoans
NHLBI TOPMed: Rare Variants for Hypertension in Taiwan Chinese
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Ablation Registry
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry
NHLBI TOPMed: The Women's Genome Health Study
NHLBI TOPMed: Women's Health Initiative
Hosted TOPMed study accessions with phenotype data
phs I.D. #
Atherosclerosis Risk in Communities
Cleveland Clinic Atrial Fibrillation Study
The Cleveland Family Study
Cardiovascular Health Study
Genetic Epidemiology of COPD
Framingham Heart Study
Genes-Environments and Admixture in Latino Asthmatics
Genetic Study of Atherosclerosis Risk
Genetic Epidemiology Network of Arteriopathy
Genetic Epidemiology Network of Salt Sensitivity
Heart and Vascular Health Study
The Jackson Heart Study
The Multi-Ethnic Study of Atherosclerosis
Massachusetts General Hospital (MGH) Atrial Fibrillation Study
Women's Health Initiative
Information on the status of data releases is forthcoming.
Gen3 release notes
PIC-SURE release notes