2020-04-02 BioData Catalyst Ecosystem Release Notes

Introduction

The 2020-04-02 release is the first significant release of the NHLBI BioData Catalyst ecosystem. It offers an integrated system of platforms and services that lets researchers search the metadata of hosted datasets, find data files, and analyze those files in workspace environments that support a variety of analysis modalities.

The hosted data for this release include TOPMed multi-sample VCF data for ~55,000 sequenced participants in the 32 TOPMed studies included in Freeze 5b, as well as CRAM files for those participants. In addition, this release includes raw phenotype files for participants in TOPMed studies, providing clinical information such as BMI and lipid levels. In some cases, these phenotype data are in different dbGaP accessions than the genomic data. The hosted data are stored in both Amazon Web Services and Google Cloud, and users can run computation on either cloud provider. To access the hosted TOPMed data on BioData Catalyst, users must have dbGaP approval. Please refer to the Data page on the BioData Catalyst website for more information.

For more in-depth information, please see the "List of significant new features" below.

List of significant new features

The following features in this release primarily support TOPMed researchers across a range of technical skills (both command-line and GUI users) who have approval for the controlled-access TOPMed studies in dbGaP:

  • System login and data access: Researchers can log into the BioData Catalyst platforms using their eRA Commons ID. Approvals for TOPMed studies in dbGaP are recognized by the platforms.

  • Search TOPMed phenotypic data: Create cohorts on PIC-SURE by searching and selecting phenotypic variables of interest from dbGaP and then export cohorts to Seven Bridges or Terra for use in analysis workspaces. Users can also explore the TOPMed phenotype variables harmonized by the TOPMed Data Coordinating Center.

  • Find and access TOPMed genomics files, raw phenotype data files, and reference data files: Use the Explorer feature on Gen3 and the Data Browser feature on Seven Bridges.

  • Bring your own data: Use one of several options to upload/import data files to the workspace environments.

  • Run analyses at scale: Analyze thousands of samples at once using batch processing capabilities in secure workspaces. Run computation on Google Cloud or Amazon Web Services. Work through the visual user interface, JupyterLab and Jupyter Notebooks, RStudio, the API, or the command line.

  • Association studies: Run single-variant and multiple-variant association studies using the GENESIS pipelines, Hail, and other tools. Use Annotation Explorer to create variant grouping files for multiple-variant association studies.

  • Collaborate with other users: Share workspaces, files, and tools with other BioData Catalyst users.

  • Documentation: Access documentation for each of the platforms.

  • Track cloud costs: Track cloud storage and compute costs on Seven Bridges and Terra.

Data Releases

Hosted TOPMed study accessions with genomic data from Freeze 5b

Study Name | Acronym | phs ID
NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish | Amish | phs000956
NHLBI TOPMed: Atherosclerosis Risk in Communities | ARIC | phs001211
NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados | BAGS | phs001143
NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation Study | CCAF | phs001189
NHLBI TOPMed: The Cleveland Family Study | CFS | phs000954
NHLBI TOPMed: Cardiovascular Health Study | CHS | phs001368
NHLBI TOPMed: Genetic Epidemiology of COPD | COPDGene | phs000951
NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica | CRA | phs000988
NHLBI TOPMed: Diabetes Heart Study | DHS | phs001412
NHLBI TOPMed: Boston Early-Onset COPD Study | EOCOPD | phs000946
NHLBI TOPMed: Framingham Heart Study | FHS | phs000974
NHLBI TOPMed: Genes-Environments and Admixture in Latino Asthmatics | GALAII | phs000920
NHLBI TOPMed: Genetic Study of Atherosclerosis Risk | GeneSTAR | phs001218
NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy | GENOA | phs001345
NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity | GenSalt | phs001217
NHLBI TOPMed: Epigenetic Determinants of Lipid Response to Dietary Fat and Fenofibrate | GOLDN | phs001359
NHLBI TOPMed: Heart and Vascular Health Study | HVH | phs000993
NHLBI TOPMed: Genetics of Left Ventricular Hypertrophy | HyperGEN | phs001293
NHLBI TOPMed: The Jackson Heart Study | JHS | phs000964
NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism | Mayo_VTE | phs001402
NHLBI TOPMed: The Multi-Ethnic Study of Atherosclerosis | MESA | phs001416
NHLBI TOPMed: Massachusetts General Hospital (MGH) Atrial Fibrillation Study | MGH_AF | phs001062
NHLBI TOPMed: Partners HealthCare Biobank | Partners | phs001024
NHLBI TOPMed: San Antonio Family Heart Study | SAFS | phs001215
NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment | SAGE | phs000921
NHLBI TOPMed: African American Sarcoidosis Genetics Resource | Sarcoidosis | phs001207
NHLBI TOPMed: Genome-wide Association Study of Adiposity in Samoans | SAS | phs000972
NHLBI TOPMed: Rare Variants for Hypertension in Taiwan Chinese | THRV | phs001387
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Ablation Registry | VAFAR | phs000997
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry | VU_AF | phs001032
NHLBI TOPMed: The Women's Genome Health Study | WGHS | phs001040
NHLBI TOPMed: Women's Health Initiative | WHI | phs001237

Hosted TOPMed study accessions with phenotype data

Study Name | Acronym | phs ID
Atherosclerosis Risk in Communities | ARIC | phs000280
Cleveland Clinic Atrial Fibrillation Study | CCAF | phs000820
The Cleveland Family Study | CFS | phs000284
Cardiovascular Health Study | CHS | phs000287
Genetic Epidemiology of COPD | COPDGene | phs000179
Framingham Heart Study | FHS | phs000007
Genes-Environments and Admixture in Latino Asthmatics | GALAII | phs001180
Genetic Study of Atherosclerosis Risk | GENESTAR | phs001074
Genetic Epidemiology Network of Arteriopathy | GENOA | phs001238
Genetic Epidemiology Network of Salt Sensitivity | GENSALT | phs000784
Heart and Vascular Health Study | HVH | phs001013
The Jackson Heart Study | JHS | phs000286
The Multi-Ethnic Study of Atherosclerosis | MESA | phs000209
Massachusetts General Hospital (MGH) Atrial Fibrillation Study | MGH_AF | phs001001
Women's Health Initiative | WHI | phs000200
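
Because a study's phenotype data can sit under a different dbGaP accession than its genomic data, it can help to pair the two listings by acronym before requesting access. The following is a minimal Python sketch of that bookkeeping, not part of any BioData Catalyst platform or API. The accession numbers are copied from the two listings above; the names GENOMIC, PHENOTYPE, and accession_pairs are hypothetical and exist only for this illustration. Acronyms are compared in upper case because the two listings capitalize some of them differently (for example GeneSTAR and GENESTAR).

```python
# Accessions copied from the "genomic data from Freeze 5b" listing.
GENOMIC = {
    "Amish": "phs000956", "ARIC": "phs001211", "BAGS": "phs001143",
    "CCAF": "phs001189", "CFS": "phs000954", "CHS": "phs001368",
    "COPDGene": "phs000951", "CRA": "phs000988", "DHS": "phs001412",
    "EOCOPD": "phs000946", "FHS": "phs000974", "GALAII": "phs000920",
    "GeneSTAR": "phs001218", "GENOA": "phs001345", "GenSalt": "phs001217",
    "GOLDN": "phs001359", "HVH": "phs000993", "HyperGEN": "phs001293",
    "JHS": "phs000964", "Mayo_VTE": "phs001402", "MESA": "phs001416",
    "MGH_AF": "phs001062", "Partners": "phs001024", "SAFS": "phs001215",
    "SAGE": "phs000921", "Sarcoidosis": "phs001207", "SAS": "phs000972",
    "THRV": "phs001387", "VAFAR": "phs000997", "VU_AF": "phs001032",
    "WGHS": "phs001040", "WHI": "phs001237",
}

# Accessions copied from the "phenotype data" listing.
PHENOTYPE = {
    "ARIC": "phs000280", "CCAF": "phs000820", "CFS": "phs000284",
    "CHS": "phs000287", "COPDGene": "phs000179", "FHS": "phs000007",
    "GALAII": "phs001180", "GENESTAR": "phs001074", "GENOA": "phs001238",
    "GENSALT": "phs000784", "HVH": "phs001013", "JHS": "phs000286",
    "MESA": "phs000209", "MGH_AF": "phs001001", "WHI": "phs000200",
}


def accession_pairs(genomic=GENOMIC, phenotype=PHENOTYPE):
    """Map each upper-cased acronym to (genomic phs, phenotype phs or None)."""
    pheno = {acronym.upper(): phs for acronym, phs in phenotype.items()}
    return {
        acronym.upper(): (phs, pheno.get(acronym.upper()))
        for acronym, phs in genomic.items()
    }


pairs = accession_pairs()

# Studies that appear only in the genomic listing of this release.
genomic_only = sorted(a for a, (_, pheno_phs) in pairs.items() if pheno_phs is None)

print(pairs["GENESTAR"])
print(genomic_only)
```

A study whose second value is None appears only in the genomic listing for this release; the sketch says nothing about whether phenotype data for that study exist elsewhere in dbGaP.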

Information on the status of data releases is forthcoming.

For detailed platform release notes please consult the following resources: