2025-01-15 NHLBI BioData Catalyst Ecosystem Release Notes

Introduction

The 2025-01-15 release marks the 20th release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., storage cost savings and PFB handoff of cohort data) along with documentation and tutorials (e.g., video guides to BDC Data Studio environments) to help new users get started on the system. This release also includes enhanced support for working with cohort data. Please find more detail on the new features and user support materials in the sections below.

The 2025-01-15 data releases include the addition of studies on pulmonary fibrosis, COPD, asthma, and congenital heart defects, along with new imaging from atherosclerosis and echocardiogram studies. Updates also include research on cardiovascular health, genetic epidemiology, COVID-19, blood pressure, veteran health, and lifestyle interventions. Please refer to the Data Releases section below for more information, as well as the Data page on the BDC website.

Significant new features

Save on storage costs in Terra with bucket lifecycle rules: This feature on BDC Powered by Terra (BDC-Terra) gives users better controls to delete unnecessary workspace bucket files and manage cloud storage costs.
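As a rough illustration of what a lifecycle rule expresses, the sketch below builds a rule in the JSON layout used by Google Cloud Storage lifecycle configuration files (a `Delete` action with `age` and `matchesPrefix` conditions). It assumes the workspace bucket is a Google Cloud Storage bucket; the helper name, the 30-day age, and the `submissions/` prefix are hypothetical, and BDC-Terra may expose these settings differently in its interface.

```python
import json


def build_delete_rule(age_days, prefixes):
    """Return a lifecycle configuration that deletes objects older than
    `age_days` days whose names start with one of `prefixes`."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {"age": age_days, "matchesPrefix": list(prefixes)},
            }
        ]
    }


# Hypothetical values: remove files under "submissions/" after 30 days.
config = build_delete_rule(30, ["submissions/"])
print(json.dumps(config, indent=2))
```

Scoping the rule to a prefix, rather than the whole bucket, limits deletion to files that can be regenerated.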

PFB Handoff of Cohort Data from PIC-SURE to Terra: After exploring data and adding filters to build a cohort of interest in BDC Powered by PIC-SURE (BDC-PIC-SURE), investigators can now move the participant-level data to BDC-Terra for analysis. This feature allows investigators to bring the data into a new or existing BDC-Terra workspace using the Portable Format for Bioinformatics (PFB). The PFB export includes two tables: the participant-level data and the associated data dictionary.

Links to Original Files from Selected Cohort Data: The selected participant-level data from BDC-PIC-SURE is now connected back to the original data file. The data is connected using DRS URIs, a GA4GH standard used to allow access to data in a single, standard way. This allows investigators to refer back to the original source of the BDC-PIC-SURE data. This feature is currently available in the data dictionary table with the PFB formatted BDC-PIC-SURE data. Note: This is currently available for some studies, but the DRS URIs of other studies are being added regularly.
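To make the DRS URI idea concrete, here is a minimal sketch of how a hostname-based DRS URI maps to the HTTPS endpoint defined by the GA4GH DRS specification (`/ga4gh/drs/v1/objects/{object_id}`). The function name and the example hostname are hypothetical. Compact identifier-based URIs (of the form `drs://prefix:accession`) need a separate resolver and are deliberately rejected here; treating any colon in the host part as a compact identifier is a simplifying assumption.

```python
from urllib.parse import quote


def drs_uri_to_url(uri):
    """Convert a hostname-based DRS URI into the HTTPS URL of its DRS object."""
    scheme = "drs://"
    if not uri.startswith(scheme):
        raise ValueError("not a DRS URI: " + uri)
    rest = uri[len(scheme):]
    host, sep, object_id = rest.partition("/")
    if ":" in host:
        # Simplifying assumption: a colon marks a compact identifier.
        raise ValueError("compact identifier-based DRS URIs are not handled here")
    if not sep or not host or not object_id:
        raise ValueError("expected drs://<hostname>/<object_id>")
    # Percent-encode the object ID so it is safe as a single path segment.
    return "https://" + host + "/ga4gh/drs/v1/objects/" + quote(object_id, safe="")


# Hypothetical hostname and object ID, for illustration only.
print(drs_uri_to_url("drs://drs.example.org/314159"))
```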

Connect Cohort Data to Genomic Information via Sample Identifiers: Investigators can automatically include sample identifiers when preparing selected cohort data for analysis in BDC-PIC-SURE. The sample identifiers allow researchers to connect the phenotypic information to the associated genomic data or other sample types.
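The linkage described above amounts to a lookup keyed on the sample identifier. The sketch below shows the idea with plain Python dictionaries; the column names (`patient_id`, `sample_id`, `bmi`), the file names, and the helper itself are all hypothetical and do not reflect the actual layout of a BDC-PIC-SURE export.

```python
def attach_genomic_files(cohort_rows, files_by_sample):
    """Add a `genomic_file` field to each cohort row by looking up its sample ID.

    Rows whose sample ID has no known file get `genomic_file` set to None,
    so unmatched participants stay visible instead of being dropped.
    """
    linked = []
    for row in cohort_rows:
        merged = dict(row)  # copy so the input rows are not modified
        merged["genomic_file"] = files_by_sample.get(row["sample_id"])
        linked.append(merged)
    return linked


# Hypothetical cohort export with sample identifiers included.
cohort = [
    {"patient_id": "P1", "sample_id": "S-001", "bmi": 27.4},
    {"patient_id": "P2", "sample_id": "S-002", "bmi": 31.0},
]
# Hypothetical lookup from sample identifier to a genomic file name.
files = {"S-001": "S-001.cram"}

for row in attach_genomic_files(cohort, files):
    print(row["patient_id"], row["sample_id"], row["genomic_file"])
```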

Explore Data with Social Determinants of Health (SDOH) Gravity Domains: Several variables from BDC data have been mapped to SDOH domains from the Gravity Project, a collaborative public-private initiative with the goal of developing consensus-driven data standards to support the collection, use, and exchange of data to address SDOH. These mappings can be used to explore the data in BDC-PIC-SURE.

New user support materials and documentation

Video guides to BDC Data Studio Environments: Three new onboarding videos were created to introduce and orient users to the three kinds of Data Studio environments available on BDC: JupyterLab, RStudio, and SAS Studio. These videos are available on the Velsera YouTube channel as platform-generated videos.

Data Releases

The table below highlights which studies were included in the 2025-01-15 data release.

The latest release features NHLBI TOPMed projects, including the San Antonio Family Heart Study (SAFHS), the Women's Health Initiative (WHI), and the Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project, with updates from the Cardiovascular Health Study (CHS). New additions include the Study of African Americans, Asthma, Genes and Environment (SAGE), the Pulmonary Fibrosis Whole Genome Sequencing project, and the Genetic Epidemiology of COPD (COPDGene). The release also highlights the Molecular Genetics of Heterotaxy and Related Congenital Heart Defects study and the Collaborative Cohort of Cohorts for COVID-19 Research (C4R), with data from SPIROMICS and the Jackson Heart Study (JHS). Several BioLINCC studies are featured, such as the Systolic Blood Pressure Intervention Trial (SPRINT), Heart Failure Network studies, and the Resuscitation Outcomes Consortium (ROC). This release introduces the Multi-Ethnic Study of Atherosclerosis (MESA) Echocardiogram Image Repository and includes data from the Veterans Administration (VA) Million Veteran Program (MVP) as well as the Healthy Lifestyle Program (HeLP).

The data is now available for access across the entire ecosystem.
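The values in the "phs I.D. #" column follow the dbGaP accession layout: a study number (`phs`), a version (`v`), a participant set (`p`), and a consent group (`c`). The sketch below splits an accession into those parts. The function and type names are hypothetical, and the participant-set and consent-group parts are treated as optional because some entries in these notes (for example `phs003824.v1.c2`) omit the participant set.

```python
import re
from typing import NamedTuple, Optional

# phs + six digits, then .v<N>, then optional .p<N> and optional .c<N>
_ACCESSION = re.compile(r"^(phs\d{6})\.v(\d+)(?:\.p(\d+))?(?:\.c(\d+))?$")


class StudyAccession(NamedTuple):
    study: str
    version: int
    participant_set: Optional[int]
    consent_group: Optional[int]


def parse_accession(text):
    """Split an accession such as 'phs001237.v3.p1.c2' into its parts."""
    match = _ACCESSION.match(text.strip())
    if match is None:
        raise ValueError("unrecognized accession: " + repr(text))
    study, version, participant_set, consent_group = match.groups()
    return StudyAccession(
        study,
        int(version),
        int(participant_set) if participant_set else None,
        int(consent_group) if consent_group else None,
    )


print(parse_accession("phs001237.v3.p1.c2"))
print(parse_accession("phs003824.v1.c2"))
```

Comparing the `study` and `version` fields is one way to tell whether two rows refer to different versions of the same study or to different consent groups within one version.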

Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version

Planned upcoming Data Releases

Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version

For detailed platform release notes please consult the following resources:

BDC Powered by Gen3 release notes

Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version
NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v3.p1.c2 | topmed-WHI_HMB-IRB-NPU | No | Yes
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study | phs001368.v4.p2.c1 | topmed-CHS_HMB-MDS | No | Yes
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study | phs001368.v4.p2.c2 | topmed-CHS_HMB-NPU-MDS | No | Yes
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study | phs001368.v4.p2.c3 | topmed-CHS_DS-CVD-MDS | Yes | No
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study | phs001368.v4.p2.c4 | topmed-CHS_DS-CVD-NPU-MDS | No | Yes
NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism (WGS of VTE) | phs001402.v3.p1.c1 | topmed-Mayo_VTE_GRU | No | Yes
NHLBI TOPMed: My Life Our Future (MLOF) Research Repository of Patients with Hemophilia A (Factor VIII Deficiency) or Hemophilia B (Factor IX Deficiency) | phs001515.v2.p2.c1 | topmed-MLOF_HMB-PUB | No | Yes
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c1 | topmed-IPF_DS-ILD-IRB-NPU | No | Yes
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c2 | topmed-IPF_DS-LD-IRB-NPU | No | Yes
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c3 | topmed-IPF_DS-PFIB-IRB-NPU | No | Yes
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c4 | topmed-IPF_DS-PUL-ILD-IRB-NPU | No | Yes
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c5 | topmed-IPF_HMB-IRB-NPU | No | Yes
TRanscriptomic ANalySis of left ventriCulaR gene Expression (TRANSCRibE) | phs001679.v1.p1.c1 | heartfailure-TRANSCRibE_GRU | Yes | No
TRanscriptomic ANalySis of left ventriCulaR gene Expression (TRANSCRibE) | phs001679.v1.p1.c2 | heartfailure-TRANSCRibE_DS-CI | Yes | No
NHLBI TOPMed: Pediatric Cardiac Genomics Consortium (PCGC)'s Congenital Heart Disease Biobank | phs001735.v2.p1.c1 | topmed-PCGC_CHD_HMB | No | No
NHLBI TOPMed: Pediatric Cardiac Genomics Consortium (PCGC)'s Congenital Heart Disease Biobank | phs001735.v2.p1.c2 | topmed-PCGC_CHD_DS-CHD | No | No
Molecular Genetics of Heterotaxy and Related Congenital Heart Defects | phs001814.v1.p1.c1 | heartfailure-MolGen_CHD_GRU | Yes | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c1 | COVID19-C4R_SPIROMICS_GRU | No | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c2 | COVID19-C4R_SPIROMICS_GRU-NPU | No | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c3 | COVID19-C4R_SPIROMICS_DS-COPD | No | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c4 | COVID19-C4R_SPIROMICS_DS-COPD-NPU | No | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c5 | COVID19-C4R_SPIROMICS_GRU-COL | No | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c6 | COVID19-C4R_SPIROMICS_GRU-COL-NPU | No | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c7 | COVID19-C4R_SPIROMICS_DS-COPD-COL | No | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c8 | COVID19-C4R_SPIROMICS_DS-COPD-COL-NPU | No | No
Adult Observational Cohort Study (RC_Adult) | phs003463.v3.p2.c1 | RECOVER-RC_Adult_GRU | No | Yes
Systolic Blood Pressure Intervention Trial (SPRINT-BioLINCC) | phs003483.v1.p1.c1 | BioLINCC-BL_SPRINT_GRU | Yes | No
Surgical Treatment for Ischemic Heart Failure (STICH-BioLINCC) | phs003493.v1.p1.c1 | BioLINCC-BL_STICH_GRU | Yes | No
Heart Failure Network Aldosterone Targeted Neurohormonal Combined with Natriuresis Therapy - (HFN ATHENA-BioLINCC) | phs003506.v1.p1.c1 | BioLINCC-BL_HFN_ATHENA_GRU | Yes | No
Heart Failure Network - Effectiveness of Ultrafiltration in Treating People with Acute Decompensated Heart Failure and Cardiorenal Syndrome (HFN CARRESS - BioLINCC) | phs003510.v1.p1.c1 | BioLINCC-BL_HFN_CARRESS_GRU | Yes | No
Sickle Cell Disease Natural History Data Resource (SCD NHDR) | phs003529.v1.p1.c1 | CureSC-SCD_NHDR_GRU-IRB | No | No
Heart Failure Network - Nitrate's Effect on Activity Tolerance in Heart Failure with Preserved Ejection Fraction (HFN NEAT-BioLINCC) | phs003548.v1.p1.c1 | BioLINCC-BL_HFN-NEAT_GRU | Yes | No
Heart Failure Network - Phosphodiesterase-5 Inhibition to Improve Clinical Status and Exercise Capacity in Diastolic Heart Failure (HFN RELAX-BioLINCC) | phs003565.v1.p1.c1 | BioLINCC-BL_HFN-RELAX_GRU | Yes | No
Heart Failure Network - Renal Optimization Strategies Evaluation in Acute Heart Failure and Reliable Evaluation of Dyspnea (HFN ROSE-BioLINCC) | phs003589.v1.p1.c1 | BioLINCC-BL_HFN-ROSE_GRU | Yes | No
Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training (HF-ACTION-BioLINCC) | phs003599.v1.p1.c1 | BioLINCC-BL_HF-ACTION_HMB | Yes | No
Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training (HF-ACTION-BioLINCC) | phs003599.v1.p1.c2 | BioLINCC-BL_HF-ACTION_HMB-NPU | Yes | No
Heart Failure Network: Inorganic Nitrite Delivery to Improve Exercise Capacity in HFpEF (HFN INDIE-BioLINCC) | phs003667.v1.p1.c1 | BioLINCC-BL_HFN-INDIE_GRU | Yes | No
CONNECTS Master Protocol for Clinical Trials targeting Macro- and Micro-Immuno-Thrombosis, Vascular Hyperinflammation, and Hypercoagulability and Renin-Angiotensin-Aldosterone System (RAAS) in Hospitalized Patients with COVID-19 (ACTIV-4 Host Tissue) | phs003708.v1.p1.c1 | COVID19-ACTIV4_HostTissue_GRU | Yes | No
Acute Respiratory Distress Network (ARDSNet) Study 04 Assessment of Low Tidal Volume and Elevated End-Expiratory Volume to Obviate Lung Injury (ALVEOLI-BioLINCC) | phs003714.v1.p1.c1 | BioLINCC-BL_ARDSNet_ALVEOLI_GRU | Yes | No
Resuscitation Outcomes Consortium (ROC) Cardiac Epidemiologic Registry (Cardiac Epistry) Version 3 (ROC-Cardiac Epistry 3-BioLINCC) | phs003726.v1.p1.c1 | BioLINCC-BL_ROC_Cardiac_Epistry_3_GRU | Yes | No
Beta-Blocker Evaluation in Survival Trial (BEST-BioLINCC) | phs003730.v1.p1.c1 | BioLINCC-BL_BEST_GRU | Yes | No
Acute Respiratory Distress Network (ARDSNet) Studies 06 and 08 Prospective, Randomized, Multicenter Trial of Aerosolized Albuterol Versus Placebo for the Treatment of Acute Lung Injury (ALTA) (ARDSNet-ALTA-BioLINCC) | phs003743.v1.p1.c1 | BioLINCC-BL_ARDSNet_ALTA_HMB-MDS | Yes | No
NHLBI TOPMed: Coronary Artery Risk Development in Young Adults (CARDIA) | phs001612.v3.p3.c1 | topmed-CARDIA_HMB-IRB | No | Yes
NHLBI TOPMed: Coronary Artery Risk Development in Young Adults (CARDIA) | phs001612.v3.p3.c2 | topmed-CARDIA_HMB-IRB-NPU | No | Yes
NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene) | phs000951.v6.p5.c2 | topmed-COPDGene_DS-CS-RD | No | Yes
NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene) | phs000951.v6.p5.c1 | topmed-COPDGene_HMB | No | Yes
NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica | phs000988.v6.p1.c1 | topmed-CRA_DS-ASTHMA-IRB-MDS-RD | No | Yes
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v4.p3.c1 | topmed-IPF_DS-ILD-IRB-NPU | No | Yes
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v4.p3.c2 | topmed-IPF_DS-LD-IRB-NPU | No | Yes
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v4.p3.c3 | topmed-IPF_DS-PFIB-IRB-NPU | No | Yes
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v4.p3.c4 | topmed-IPF_DS-PUL-ILD-IRB-NPU | No | Yes
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v4.p3.c5 | topmed-IPF_HMB-IRB-NPU | No | Yes
NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment (SAGE) | phs000921.v5.p2.c2 | topmed-SAGE_DS-LD-IRB-COL | No | Yes
NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v4.p2.c1 | topmed-WHI_HMB-IRB | No | Yes
NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v4.p2.c2 | topmed-WHI_HMB-IRB-NPU | No | Yes
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c1 | COVID19-C4R_MESA_HMB | No | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c2 | COVID19-C4R_MESA_HMB-NPU | No | No
Acute Respiratory Distress Network (ARDSNet) Studies 01 and 03 Lower Versus Higher Tidal Volume, Ketoconazole Treatment and Lisofylline Treatment (ARMA/KARMA/LARMA) (ARDSNet-ARMA/KARMA/LARMA-BioLINCC) | phs003734.v1.p1.c1 | BioLINCC-BL_ARDSNet_ARMA_KARMA_LARMA_GRU | Yes | No
ARDSNet 07-08: Randomized, Blinded, Placebo-Controlled, Multi-Center Trial of Omega-3 Fatty Acid, Gamma-Linolenic Acid, and Antioxidants in Acute Lung Injury or ARDS (OMEGA) (ARDSNet-Omega-BioLINCC) | phs003744.v1.p1.c1 | BioLINCC-BL_ARDSNet_Omega_HMB-MDS | Yes | No
Acute Respiratory Distress Network (ARDSNet) Studies 10 and 12 Statins for Acutely Injured Lungs from Sepsis (SAILS) (ARDSNet-SAILS-BioLINCC) | phs003736.v1.p1.c1 | BioLINCC-BL_ARDSNet_SAILS_HMB-MDS | Yes | No
Prevention and Early Treatment of Acute Lung Injury (PETAL) - Low Tidal Volume Universal Support Feasibility of Recruitment for Interventional Trial (LOTUS FRUIT) (PETAL-LOTUS FRUIT-BioLINCC) | phs003791.v1.p1.c1 | BioLINCC-BL_PETAL_LOTUS_FRUIT_GRU | Yes | No
Resuscitation Outcomes Consortium (ROC) Amiodarone, Lidocaine or Neither for Out-Of-Hospital Cardiac Arrest Due to Ventricular Fibrillation or Ventricular Tachycardia (ALPS) | phs003784.v1.p1.c1 | BioLINCC-BL_ROC_ALPS_GRU | Yes | No
Resuscitation Outcomes Consortium (ROC) Cardiac Epidemiologic Registry (Cardiac Epistry) Versions 1 and 2 (ROC-Cardiac Epistry 1 and 2-BioLINCC) | phs003803.v1.p1.c1 | BioLINCC-BL_ROC_Cardiac_Epistry_1_2_GRU | Yes | No
Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist (TOPCAT-BioLINCC) | phs003665.v1.p1.c1 | BioLINCC-BL_TOPCAT_HMB-MDS | Yes | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Jackson Heart Study (JHS) | phs002907.v1.p1.c4 | COVID19-C4R_JHS_DS-FDO-IRB | Yes | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Jackson Heart Study (JHS) | phs002907.v1.p1.c2 | COVID19-C4R_JHS_DS-FDO-NPU-IRB | Yes | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Jackson Heart Study (JHS) | phs002907.v1.p1.c3 | COVID19-C4R_JHS_HMB-IRB | Yes | No
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Jackson Heart Study (JHS) | phs002907.v1.p1.c1 | COVID19-C4R_JHS_HMB-NPU-IRB | Yes | No
Veterans Administration (VA) Million Veteran Program (MVP) Summary Results from Omics Studies | phs001672.v11.p1.c1 | dbGaP-MVP_HMB-MDS | Yes | No
Multi-Ethnic Study of Atherosclerosis (Echocardiogram Image Repository) | phs003702.v1.p1.c1 | imaging-img_MESA_ECHO_HMB | Yes | No
Multi-Ethnic Study of Atherosclerosis (Echocardiogram Image Repository) | phs003702.v1.p1.c2 | imaging-img_MESA_ECHO_HMB-NPU | Yes | No
Incentives and Case Management to Improve Cardiac Care: Healthy Lifestyle Program (HeLP) | phs003737.v1.p1.c1 | Individual_Study-UTMB_HeLP_GRU | Yes | No
BioLINCC The Women's Health Initiative (WHI) | phs003824.v1.c2 | imaging-img_WHI_HMB-NPU | Yes | No
The Jackson Heart Study (JHS) | phs003747.v1.p1.c1 | imaging-img_JHS_HMB-IRB-NPU | Yes | No
The Jackson Heart Study (JHS) | phs003747.v1.p1.c2 | imaging-img_JHS_DS-FDO-IRB-NPU | Yes | No
The Jackson Heart Study (JHS) | phs003747.v1.p1.c3 | imaging-img_JHS_HMB-IRB | Yes | No
The Jackson Heart Study (JHS) | phs003747.v1.p1.c4 | imaging-img_JHS_DS-FDO-IRB | Yes | No
Resuscitation Outcomes Consortium (ROC) Hypertonic Saline (HS) Trial Shock Study and Traumatic Brain Injury Study (TBI) (ROC-HS/TBI-BioLINCC) | phs003777.v1.p1.c1 | BioLINCC-BL_ROC_HS_TBI-GRU | Yes | No
NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study | phs000974.v6.p5.c1 | topmed-FHS_HMB-IRB-MDS | No | Yes
NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study | phs000974.v6.p5.c2 | topmed-FHS_HMB-IRB-NPU-MDS | No | Yes
NHLBI TOPMed: MESA and MESA Family AA-CAC | phs001416.v4.p1.c1 | topmed-MESA_HMB | No | Yes
NHLBI TOPMed: MESA and MESA Family AA-CAC | phs001416.v4.p1.c2 | topmed-MESA_HMB-NPU | No | Yes
NHLBI TOPMed - NHGRI CCDG: Genes-Environments and Admixture in Latino Asthmatics (GALA II) | phs000920.v6.p4.c2 | topmed-GALAII_DS-LD-IRB-COL | No | Yes
NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC) | phs001211.v5.p4.c1 | topmed-ARIC_HMB-IRB-NPU-MDS | No | Yes
NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC) | phs001211.v5.p4.c2 | topmed-ARIC_DS-CVD-IRB-NPU-MDS | No | Yes
NHLBI TOPMed: San Antonio Family Heart Study (SAFHS) | phs001215.v4.p2.c1 | topmed-SAFHS_DS-DHD-IRB-PUB-MDS-RD | No | Yes
NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v3.p1.c1 | topmed-WHI_HMB-IRB | No | Yes
Resuscitation Outcomes Consortium (ROC) Trauma Epidemiologic Registry (Trauma Epistry) (ROC-Trauma Epistry-BioLINCC) | phs003809.v1.p1.c1 | BioLINCC-BL_ROC-Trauma_Epistry_GRU | Yes | No
BioLINCC The Women's Health Initiative (WHI) | phs003824.v1.c1 | imaging-img_WHI_HMB | Yes | No

Learn more about bucket lifecycle rules here.
Learn more about handing off participant data from BDC-PIC-SURE to BDC-Terra here.
BDC-Terra Support
Learn more about including sample identifiers here.
Velsera YouTube channel
BDC Powered by Terra release notes
BDC Powered by Seven Bridges release notes
BDC Powered by PIC-SURE release notes

2023-10-04 NHLBI BioData Catalyst Ecosystem Release Notes

Introduction

The 2023-10-04 release marks the 15th release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., the ability to view cohort variables prior to access, and the ability to export selected data into an analysis workspace). Please find more detail on the new features in the section below.

The 2023-10-04 data releases include the addition of TOPMed studies spanning early-onset COPD, heart studies from various geographies, diabetes heart studies, and more. CRAMs and unharmonized clinical files were updated for six TOPMed studies already in BDC. BioLINCC Multi-Ethnic Study of Atherosclerosis studies were also added. Please refer to the Data Releases section below for more information, as well as the Data page on the BDC website.

2023-04-04 BioData Catalyst Ecosystem Release Notes

Introduction

The 2023-04-04 release marks the thirteenth release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features, e.g., a new gallery for Public Projects and new project-based download restrictions on BDC Powered by Seven Bridges (BDC-Seven Bridges). It also includes documentation and tutorials to help new users get started on the system, e.g., how to start using the BDC Powered by PIC-SURE (BDC-PIC-SURE) API. Please find more details on the new features and user support materials in the sections below.

Please refer to the Data Releases section below for information on upcoming data releases. A list of currently available data can be viewed on the Data page of the BDC website.

Significant new features

BDC Powered by PIC-SURE (BDC-PIC-SURE): Open Access Variable Distributions Tool: Researchers can now view the variable distributions for their selected cohort with BDC-PIC-SURE Open Access to further their data discovery and exploration prior to access. Once variable filters have been applied, the Variable Distributions Tool displays bar charts for categorical variables and histograms for continuous variables. Note that the visualizations are obfuscated to protect participant-level data.

BDC Powered by Seven Bridges (BDC-Seven Bridges): Data Export from the BDC-PIC-SURE UI Public Project: This public project provides a CWL tool that exports selected data from BDC-PIC-SURE into a BDC-Seven Bridges project, using a query built in the BDC-PIC-SURE UI and the BDC-PIC-SURE API. This project is a continuation of our original BDC-PIC-SURE API Public Project. Combined, these public projects give both experienced and novice users the ability to build cohorts on BDC-PIC-SURE and bring the resulting data frames over to BDC-Seven Bridges for analysis.

Known issues and workarounds

BDC Powered by Terra (BDC-Terra) workspace data security: When users import data from NIH data repositories such as BDC, they are only allowed to import into existing BDC-Terra workspaces that have an authorization domain and/or protected data setting. Import of these datasets into unprotected workspaces will not succeed. This ensures that the data access is appropriately logged by BDC-Terra.

Data Releases

The table below highlights which studies were included in the 2023-10-04 data release. This release includes a significant representation from the NHLBI TOPMed program with studies spanning areas such as early-onset COPD, heart studies from various geographies, diabetes heart studies, and more. Notably, CRAMs and unharmonized clinical files have been updated for 6 TOPMed studies that were already a part of BDC. Additionally, new studies pertaining to the BioLINCC Multi-Ethnic Study of Atherosclerosis have been introduced. The data is now available for access across the entire ecosystem.

Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version
NHLBI TOPMed: Boston Early-Onset COPD Study (EOCOPD) | phs000946.v5.p1.c1 | topmed-EOCOPD_DS-CS-RD | No | No
NHLBI TOPMed: The Cleveland Family Study (CFS) | phs000954.v4.p2.c1 | topmed-CFS_DS-HLBS-IRB-NPU | No |

Planned Upcoming Data Releases

Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study (CHS) | phs001368.v3.p2.c3 | topmed-CHS_DS-NPU-MDS | Yes | Yes
NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica | phs000988.v5.p1.c1 | topmed-CRA_DS-ASTHMA-IRB-MDS-RD | |

For detailed platform release notes please consult the following resources:

BDC-Gen3 release notes
BDC-Terra release notes
BDC-Seven Bridges release notes
BDC-PIC-SURE release notes


Significant new features

New gallery for Public Projects on BDC-Seven Bridges: BDC-Seven Bridges has released a new user interface to make browsing and selecting public projects easier. Previously, Public Projects appeared as a list under a dropdown menu. Now the Public Resources > Projects dropdown displays a gallery of project cards with summaries and easily clickable “Copy Project” buttons.

Project-based download restrictions on BDC-Seven Bridges: Many consortia have found value in using the BDC-Seven Bridges project member permissions to collaborate and distribute data prior to public release. However, the ability to add new files to a project also allows a user to download files to their local environment. BDC-Seven Bridges released a new feature that gives the project owner project-based download restrictions. When creating a project, a user can turn on Download Restrictions and choose either to allow analysis (CWL tools/workflows or Data Studio) but no download to a local environment, or to allow neither analysis nor download to the local environment. To request access to the new feature, email support@sevenbridges.com.

New CWL tools and workflows on BDC-Seven Bridges:

  • Minimac 4 4.1.2: a tool for imputing genotypes.

  • GATK 4.4.0.0

    • GATK IndexFeatureFile for indexing of provided feature files.

    • GATK MergeVcfs for combining multiple variant files.

    • GATK VariantEval BETA for evaluating variant calls.

    • GATK FilterMutectCalls filters somatic SNVs and indels called by Mutect2.

  • HTSeq-count 2.0.2: HTSeq-count is a Python tool for counting how many reads map to each feature.

  • GraphicsMagick 1.3.38

    • GraphicsMagick compare compares two images using statistics and/or visual differencing. The tool compares two images and reports difference statistics according to specified metrics, and/or outputs an image with a visual representation of the differences.

    • GraphicsMagick composite composites (combines) images to create a new image.

  • MHC-I Binding Prediction tool (MHC I 3.1.2 toolkit) - which is used for prediction of peptides that bind to MHC I molecules.

  • MHC-II Binding Prediction tool (MHC II 3.1.6 toolkit) - which is used for prediction of peptides that bind to MHC II molecules.

  • MHCflurry Predict tool (MHCflurry 2.0.4 toolkit) - which is used for peptide/MHC I binding affinity prediction.

  • MHCflurry Scan tool (MHCflurry 2.0.4 toolkit) - which is designed to scan protein sequences and predict MHC-I ligands.

  • AXEL-F: Antigen eXpression based Epitope Likelihood-Function tool (AXEL-F 1.0.0 toolkit) - which is used for MHC-I epitope prediction.

  • NetChop tool (NetChop 3.0 toolkit) - which is a predictor of proteasomal processing based upon a neural network.

  • NetCTL tool (NetCTL 3.0 toolkit) - which is a T cell epitopes predictor.

  • NetCTLpan tool (NetCTLpan 3.0 toolkit) - which is a T cell epitopes predictor.

  • Class I Immunogenicity tool (Class I Immunogenicity 3.0 toolkit) - which predicts the immunogenicity of a peptide MHC (pMHC) complex.

  • TCRMatch tool (TCRMatch 1.0.2 toolkit) - which predicts T-Cell receptor specificity based on sequence similarity to characterized receptors.

  • BCell tool (BCell 3.1 toolkit) - which predicts linear B cell epitopes based on the antigen characteristics.

  • ElliPro tool (ElliPro 1.0 toolkit) - which predicts antibody epitopes based upon solvent-accessibility and flexibility.

  • Population Coverage tool (Population Coverage 3.0 toolkit) - which calculates the fraction of individuals predicted to respond to a given set of epitopes.

  • Epitope Cluster Analysis tool (Epitope Cluster Analysis 1.0 toolkit) - which groups epitopes into clusters based on sequence identity.

  • Picard 3.0.0 toolkit:

    • Picard CollectMultipleMetrics collects BAM statistics by running multiple Picard modules at once.

    • Picard ValidateSamFile validates an alignments file against the SAM specification.

  • MetaCyto workflow (1.16.0 in CWL 1.2): based on R package MetaCyto that performs meta-analysis of both flow cytometry and mass cytometry (CyTOF) data. It is able to jointly analyze cytometry data from different studies with diverse sets of markers.

New and improved R adapter for BDC-PIC-SURE API: The R adapter for the BDC-PIC-SURE API has been completely revamped to improve performance, address known bugs, and make the API easier to use for R coders. All example code, in both Jupyter and RStudio, has been updated to show these code improvements in practice. Note: The old version of the R API will be available for use until August 31st, 2023. It is recommended that you update your code with the new changes.

BDC Powered by Gen3 (BDC-Gen3) metadata is being updated to bring in data from the dbGaP FHIR database: BDC-Gen3’s Discovery Page (and the underlying BDC-Gen3 Source of Truth Metadata API) allows unauthenticated users to discover what datasets are available in BDC. Fast Healthcare Interoperability Resources (FHIR) is a Health Level Seven International (HL7) specification for healthcare interoperability. The database of Genotypes and Phenotypes (dbGaP) has recently exposed a FHIR server. BDC-Gen3 has worked to consume the new metadata from the dbGaP FHIR server (as part of the officially defined data ingestion process). BDC-Gen3’s Python-based Software Development Kit (SDK) and Command Line Interface (CLI) now have:

  • A FHIR client

  • Direct interaction with dbGaP’s FHIR API

  • Extract, Transform, Load (ETL) logic to parse the content from dbGaP’s FHIR and load into BDC-Gen3’s Metadata API

BDC-Gen3’s Data Ingestion Pipeline will be updated to use the above tool to load FHIR metadata with every new data release. In April 2023, the loaded metadata will be available to all clients and users through BDC-Gen3’s Metadata API and will be viewable in BDC-Gen3’s Discovery Page.
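The transform step of such an ETL can be pictured with a small sketch. It takes a FHIR Bundle that has already been fetched and parsed into Python dictionaries, keeps the `ResearchStudy` resources, and flattens each into a metadata record. The `resourceType`, `id`, `title`, and `description` fields are standard FHIR elements, but the function names and the output keys (`study_id`, `title`, `description`) are hypothetical; this is not the BDC-Gen3 SDK's actual code or schema.

```python
def research_study_to_metadata(resource):
    """Flatten one FHIR ResearchStudy resource into a simple metadata record."""
    if resource.get("resourceType") != "ResearchStudy":
        raise ValueError("expected a ResearchStudy resource")
    return {
        "study_id": resource.get("id"),
        "title": resource.get("title", ""),
        "description": resource.get("description", ""),
    }


def transform_bundle(bundle):
    """Extract a metadata record for every ResearchStudy in a FHIR Bundle."""
    records = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "ResearchStudy":
            records.append(research_study_to_metadata(resource))
    return records


# Hypothetical, hand-written Bundle standing in for a server response.
example_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "ResearchStudy", "id": "phs000000",
                      "title": "Example Study", "description": "Placeholder."}},
        {"resource": {"resourceType": "Organization", "id": "org-1"}},
    ],
}

for record in transform_bundle(example_bundle):
    print(record)
```

Keeping the transform separate from the fetch and load steps makes it testable without network access.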

New user support materials and documentation

Learn about and start using the BDC-PIC-SURE API on the new “API” page: The “API” page on the BDC-PIC-SURE website provides everything you need to get started with the BDC-PIC-SURE API. This includes the personalized access token, links to publicly available R and Python code on both BDC Powered by Seven Bridges and Powered by Terra, and links to additional documentation.

Data Releases

In Q1 2023, progress was made in establishing procedures, clarifying data submission, and reworking screening protocols for multiple datasets for use with upcoming dataset ingestion. This included collaborative efforts with NHLBI to support pre-ingestion quality assurance, as well as data support for screening and assisting data submitters in preparing their data for future ingestion into BDC. Key datasets that underwent these processes include nuMoM2b (phs002808.v1.p1.c1), BABY HUG (phs002415.v1.p1.c1), MSH (phs002348.v1.p1.c1), NSRR-CFS (phs002715.v1.p1.c1), and CRA (phs000988.v4.p1.c1).

Planned Upcoming Data Releases

Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version
Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b) | phs002808.v1.p1.c1 | topmed-NuMom2B_GRU-IRB | Yes | Yes
Hydroxyurea to Prevent Organ Damage in Children with Sickle Cell Anemia (BABY HUG) | phs002415.v1.p1.c1 | BioLINCC-BabyHug_DS-SCD-IRB-RD | |

For detailed platform release notes please consult the following resources:

Gen3 release notes
Terra release notes
Seven Bridges release notes
PIC-SURE release notes
Dockstore release notes


No

NHLBI TOPMed: The Jackson Heart Study (JHS)

phs000964.v5.p1.c1

topmed-JHS_HMB-IRB-NPU

No

No

NHLBI TOPMed: The Jackson Heart Study (JHS)

phs000964.v5.p1.c2

topmed-JHS_DS-FDO-IRB-NPU

No

No

NHLBI TOPMed: The Jackson Heart Study (JHS)

phs000964.v5.p1.c3

topmed-JHS_HMB-IRB

No

No

NHLBI TOPMed: The Jackson Heart Study (JHS)

phs000964.v5.p1.c4

topmed-JHS_DS-FDO-IRB

No

Yes

NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study (FHS)

phs000974.v5.p3.c1

topmed-FHS_HMB-IRB-MDS

No

No

NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study (FHS)

phs000974.v5.p3.c2

topmed-FHS_HMB-IRB-NPU-MDS

No

No

NHLBI TOPMed: Heart and Vascular Health Study (HVH)

phs000993.v5.p2.c1

topmed-HVH_HMB-IRB-MDS

No

No

NHLBI TOPMed: Heart and Vascular Health Study (HVH)

phs000993.v5.p2.c2

topmed-HVH_DS-CVD-IRB-MDS

No

No

NHLBI TOPMed - NHGRI CCDG: The Vanderbilt AF Ablation Registry

phs000997.v5.p2.c1

topmed-VAFAR_HMB-IRB

No

No

NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry (VU_AF)

phs001032.v6.p2.c1

topmed-VU_AF_GRU-IRB

No

No

NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados

phs001143.v4.p1.c1

topmed-BAGS_GRU-IRB

No

No

NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation (CCAF) Study

phs001189.v4.p1.c1

topmed-CCAF_AF_GRU-IRB

No

No

NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study (CHS)

phs001368.v3.p2.c1

topmed-CHS_HMB-MDS

No

No

NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study (CHS)

phs001368.v3.p2.c2

topmed-CHS_HMB-NPU-MDS

No

No

NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study (CHS)

phs001368.v3.p2.c4

topmed-CHS_DS-CVD-NPU-MDS

No

No

NHLBI TOPMed: Diabetes Heart Study (DHS) African American Coronary Artery Calcification (AACAC)

phs001412.v3.p1.c1

topmed-AACAC_HMB-IRB-COL-NPU

No

No

NHLBI TOPMed: Diabetes Heart Study (DHS) African American Coronary Artery Calcification (AACAC)

phs001412.v3.p1.c2

topmed-AACAC_DS-DHD-IRB-COL-NPU

No

No

NHLBI TOPMed: MESA and MESA Family AA-CAC (MESA)

phs001416.v3.p1.c1

topmed-MESA_HMB

No

No

NHLBI TOPMed: MESA and MESA Family AA-CAC (MESA)

phs001416.v3.p1.c2

topmed-MESA_HMB-NPU

No

No

Clinical-trial of COVID-19 Convalescent Plasma in Outpatients (C3PO)

phs002752.v1.p1.c1

COVID19-C3PO_GRU

No

No

COVID-19 Post-hospital Thrombosis Prevention Study (ACTIV-4C)

phs003063.v1.p1.c1

COVID19-ACTIV4C_GRU

No

No

Multi-Ethnic Study of Atherosclerosis (BioLINCC)

phs003288.v1.p1.c1

BioLINCC-MESA_HMB

Yes

Yes

Multi-Ethnic Study of Atherosclerosis (BioLINCC)

phs003288.v1.p1.c2

BioLINCC-MESA_HMB-NPU

Yes

Yes

RECOVER Synthetic Data Set

tutorial-RECOVER_synthetic_data_set_1

tutorial-RECOVER_synthetic_data_set_1

Yes

Yes

No

Yes

NHLBI TOPMed - NHGRI CCDG: Genes-Environments and Admixture in Latino Asthmatics (GALA II)

phs000920.v5.p3.c2

topmed-GALAII_DS-LD-IRB-COL

No

Yes

NHLBI TOPMed: HyperGEN - Genetics of Left Ventricular (LV) Hypertrophy

phs001293.v3.p1.c2

topmed-HyperGEN_DS-CVD-IRB-RD

No

Yes

Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene)

phs002910.v1.p1.c1

C4R-COPDGene_HMB

Yes

Yes

Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene)

phs002910.v1.p1.c2

C4R-COPDGene_DS-CS

Yes

Yes

Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Atherosclerosis Risk in Communities Study (ARIC)

phs002988.v1.p1.c1

C4R-ARIC_HMB-IRB

Yes

Yes

Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Atherosclerosis Risk in Communities Study (ARIC)

phs002988.v1.p1.c2

C4R-ARIC_DS-CVD-IRB

Yes

Yes

Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)

phs002913.v1.p1.c1

C4R-SARP_GRU-PUB-NPU

Yes

Yes

Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)

phs002913.v1.p1.c2

C4R-SARP_GRU-PUB

Yes

Yes

Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)

phs002913.v1.p1.c3

C4R-SARP_DS-AAI-PUB-NPU

Yes

Yes

Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)

phs002913.v1.p1.c4

C4R-SARP_DS-AAI-PUB

Yes

Yes

Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Framingham Heart Study (FHS)

phs002911.v1.p1.c1

C4R-FHS_HMB-IRB-MDS

Yes

Yes

Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Framingham Heart Study (FHS)

phs002911.v1.p1.c2

C4R-FHS_HMB-IRB-NPU-MDS

Yes

Yes

ApoA-1 and Atherosclerosis in Psoriasis (DIR)

phs003231.v1.p1.c1

DIR-AAP_GRU

Yes

Yes

Method to Assess Lung Water Accumulation During Exercise (DIR)

phs003346.v1.p1.c1

DIR-MALWADE_GRU-IRB

Yes

Yes

  • GraphicsMagick conjure interprets and executes scripts written in the Magick Scripting Language (MSL). MSL primarily benefits users who want to accomplish custom image processing tasks without programming.

  • GraphicsMagick convert is used to convert an input image file using one image format to an output file with the same or different image format while applying an arbitrary number of image transformations.

  • GraphicsMagick montage creates a composite image by combining several separate images.

  • Picard SortSam sorts alignment files (BAM or SAM).

  • Picard RevertSam reverts a BAM/SAM file to a previous state.

  • Picard MarkDuplicates marks duplicate reads in alignment files.

  • Picard GenotypeConcordance calculates genotype concordance between two VCF files.

  • Picard GatherBamFiles merges BAM files after a scattered analysis.

  • Picard FixMateInformation verifies and fixes mate-pair information.

  • Picard FastqToSam converts FASTQ files to an unaligned SAM or BAM file.

  • Picard CrosscheckFingerprints checks a set of data files for sample identity.

  • Picard CreateSequenceDictionary creates a DICT index file for a sequence.

  • Picard CollectWgsMetricsWithNonZeroCoverage evaluates the coverage and performance of WGS experiments.

  • Picard CollectVariantCallingMetrics can be used to collect variant call statistics after variant calling.

  • Picard CollectSequencingArtifactMetrics collects metrics to quantify single-base sequencing artifacts.

  • Picard CollectHsMetrics collects hybrid-selection metrics for alignments in SAM or BAM format.

  • Picard CollectAlignmentSummaryMetrics produces a summary of alignment metrics from a SAM or BAM file.

  • Picard CheckFingerprint checks sample identity of provided data against known genotypes.

  • Picard BedToIntervalList converts a BED file to a Picard INTERVAL_LIST format.

  • Picard AddOrReplaceReadGroups assigns all reads to the specified read group.
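For readers who also run these tools outside the platform, the sketch below shows how two of the Picard tools listed above might be chained using Picard's classic `KEY=VALUE` command-line syntax. It only builds the argument lists and does not execute anything; the jar location and all file names are hypothetical, and on the platform itself the tools are run as apps rather than from a shell.

```python
from typing import Dict, List


def picard_command(tool: str, options: Dict[str, str],
                   jar: str = "picard.jar") -> List[str]:
    """Build an argument list for one Picard tool (classic KEY=VALUE syntax)."""
    args = ["java", "-jar", jar, tool]
    args += [f"{key}={value}" for key, value in options.items()]
    return args


# Hypothetical file names: sort an alignment, then mark duplicates in it.
sort_cmd = picard_command("SortSam", {
    "I": "sample.bam",
    "O": "sample.sorted.bam",
    "SORT_ORDER": "coordinate",
})
dedup_cmd = picard_command("MarkDuplicates", {
    "I": "sample.sorted.bam",
    "O": "sample.dedup.bam",
    "M": "sample.dup_metrics.txt",
})

if __name__ == "__main__":
    for cmd in (sort_cmd, dedup_cmd):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where Java and the Picard jar are installed.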

  • No

    No

    Multicenter Study of Hydroxyurea (MSH)

    phs002348.v1.p1.c1

    BioLINCC-MSH_GRU

    No

    No

    The Cleveland Family Study (NSRR-CFS)

    phs002715.v1.p1.c1

    NSRR-NSRR-CFS_DS-HLBS-IRB-NPU

    No

    No

    The Genetic Epidemiology of Asthma in Costa Rica (CRA)

    phs000988.v4.p1.c1

    topmed-CRA_DS-ASTHMA-IRB-MDS-RD

    No

    Yes

    Long-Term Outcomes after the Multisystem Inflammatory Syndrome In Children (MUSIC)

    phs002770

    -

    Yes

    Yes

    Accelerating COVID-19 Therapeutic Interventions and Vaccines 4 ACUTE (ACTIV4a) v1.0, v1.1

    phs002694.v1.p1.c1

    COVID19-ACTIV4A_GRU

    No

    Yes

    Molecular Atlas of Lung Development (LungMAP)

    phs001961.v2.p1.c1

    -

    Yes

    Yes

    Freeze 9 version Updates: Batch 1

    -

    -

    No

    Yes

    2021-04-02 BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2021-04-02 release marks the fifth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., CWL tools for QC pipelines) along with documentation and tutorials to help new users get started on the system. This release also includes enhanced support for searching across documentation. Please find more detail on the new features and user support materials in the sections below.

    The 2021-04-02 data release includes updates of CRAMs and unharmonized clinical files for 6 TOPMed studies previously hosted on BioData Catalyst. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format.

    Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.

    Significant new features

    Documentation Search: BioData Catalyst users can now use Documentation Search to search across various types of documentation over the entire ecosystem. Favorite results can be saved in a folder and revisited later.

    CWL tools for QC pipelines: Users can now find the following CWL tools for quality control of GWAS data in the Seven Bridges Public Apps Gallery:

    • Heterozygosity by sample - This UW-GAC tool calculates heterozygosity by sample.

    • Pedigree Check - This UW-GAC workflow checks expected relationships specified in a pedigree file against empirical kinship values from KING or PC-Relate.
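To make the heterozygosity QC statistic concrete, here is a minimal sketch of the underlying idea: for each sample, the fraction of called diploid genotypes that are heterozygous. This is an illustration only and is not the UW-GAC implementation; the input layout (a mapping from sample ID to VCF-style genotype strings) and the function names are assumptions.

```python
import math
from typing import Dict, List, Optional


def is_het(genotype: str) -> Optional[bool]:
    """Return True/False for a called diploid genotype, None if missing."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None  # missing or non-diploid call: excluded from the rate
    return alleles[0] != alleles[1]


def heterozygosity_by_sample(calls: Dict[str, List[str]]) -> Dict[str, float]:
    """Fraction of called genotypes that are heterozygous, per sample."""
    rates = {}
    for sample, genotypes in calls.items():
        flags = [flag for flag in map(is_het, genotypes) if flag is not None]
        rates[sample] = sum(flags) / len(flags) if flags else math.nan
    return rates


if __name__ == "__main__":
    example = {
        "S1": ["0/1", "0/0", "1|1", "./."],
        "S2": ["0/1", "1/0"],
    }
    print(heterozygosity_by_sample(example))
```

Samples whose rate falls far from the rest of the cohort are the ones a QC review would typically look at more closely.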

    Import files from Kids First Data Resource Center: Users can now access datasets from the Kids First Data Resource portal directly from BioData Catalyst Powered by Seven Bridges using DRS links. Users must have dbGaP approvals for the Kids First datasets in order to access the dataset on BioData Catalyst. In addition, users can import DRS links from open access datasets available via DRS servers.
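As a rough illustration of what a DRS link refers to: under the GA4GH Data Repository Service specification, a hostname-based `drs://` URI corresponds to an HTTPS endpoint on the same host. The sketch below performs only that string translation and makes no network call. The host and object ID are made up, and compact-identifier-style DRS URIs (which need a resolver lookup) are not handled here.

```python
from urllib.parse import urlparse


def drs_to_https(drs_uri: str) -> str:
    """Translate a hostname-based DRS URI into its DRS v1 object URL."""
    parsed = urlparse(drs_uri)
    object_id = parsed.path.lstrip("/")
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{object_id}"


if __name__ == "__main__":
    # Hypothetical host and object ID, for illustration only.
    print(drs_to_https("drs://drs.example.org/314159"))
```

Fetching the resulting URL from a controlled-access server would still require the appropriate authorization, which is why dbGaP approval is needed for the Kids First datasets.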

    PIC-SURE Data Access Dashboard: Users on PIC-SURE can now see a list of studies with data available in PIC-SURE. The Data Access Dashboard will show the study name, identifier, and the number of variables/samples present. Additionally the user can see if they have access to the study or click to a link where they can learn more about the study and request access to studies they are not yet authorized to use.

    Query annotations for all SNVs and dbSNP INDELs in the Annotation Explorer: Users on Seven Bridges can now use the Annotation Explorer to interactively aggregate and filter all SNVs (over 8 billion variants) and publicly available INDELs from dbSNP using ~700 annotations. Variant grouping files can be created from the results and exported to a workspace for use in rare variant association testing. This database is available to all authenticated users of BioData Catalyst. See here for more information about how to use the Annotation Explorer.

    New user support materials and documentation

    Best Practices for Secure and FAIR workflows were published on Dockstore to help users developing containers and descriptor files for their bioinformatics pipelines.

    The Gitbook guide to self-service onboarding to Terra has been revamped.

    Created and published all materials from the Fellows onboarding session including a recording of the session, all materials used, instructions, etc. Additional webinars will be posted in the coming weeks.

    Launched a 3-part video tutorial series on workflows, which helps users, particularly Fellows that are new to the platform, gain more insight into how to best utilize workflows for their data analysis.

    • Part 1 - How to run a pre-configured workflow

    • Part 2 - How to configure and run a workflow from scratch

    • Part 3 - How to run downstream analysis (on the data that resulted from your workflow)

    Published a blog post walking researchers through how they can use free cloud credits from Google Cloud in Terra. Published a related blog post on additional funding sources for covering researchers’ cloud costs, highlighting Google EDU, which provides up to $10,000 in coupons for supported research projects, and the NIH STRIDES initiative. Further, Terra added a new support documentation article covering how the call caching feature in Cromwell can help users save time and money.

    Started a new blog post series highlighting papers that may be of interest to the BioData Catalyst community. This first post covers a review paper about workflow systems from C. Titus Brown’s lab at UC Davis.

    Published a blog post officially announcing that RStudio is available in Terra, which includes a new video tutorial for getting up and running.

    Uploaded a new video tutorial demonstrating the use of Terra for viral genomics by guiding the user through the COVID-19 workspace.

    Published a blog post introducing a new feature for task-level checkpointing in workflows. This makes it possible to save intermediate outputs for a task and resume work from that point if the task gets interrupted. Full documentation of this checkpoint feature can be found here.
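The general idea behind checkpointing can be sketched in a few lines: a task periodically saves its progress to a file, and on restart it reads that file and continues from where it stopped. The example below is a generic illustration under that assumption and does not use Terra's or Cromwell's actual checkpoint mechanism; the function name, the JSON state layout, and the `fail_at` parameter (used only to simulate an interruption) are all hypothetical.

```python
import json
import os
import tempfile
from typing import Optional


def sum_squares_with_checkpoint(n: int, checkpoint_path: str,
                                fail_at: Optional[int] = None) -> int:
    """Sum i*i for i in range(n), saving progress after every step."""
    state = {"next": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # Resume from the last saved intermediate result.
        with open(checkpoint_path) as handle:
            state = json.load(handle)

    for i in range(state["next"], n):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated interruption")
        state["total"] += i * i
        state["next"] = i + 1
        # Write to a temporary file first so the checkpoint is never half-written.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump(state, handle)
        os.replace(tmp_path, checkpoint_path)
    return state["total"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        try:
            sum_squares_with_checkpoint(5, path, fail_at=3)
        except RuntimeError:
            print("interrupted; rerunning from checkpoint")
        print(sum_squares_with_checkpoint(5, path))
```

The second call does not redo the first three steps; it only performs the work that was left when the interruption occurred.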

    Uploaded a video on Broad’s BioIT 2020 Talk proposing a cross-domain, common data model built specifically to facilitate search and reuse.

    Data Releases

    The table below highlights which studies were included in the 2021-04-02 data release. CRAMs and unharmonized clinical files were updated for 6 TOPMed studies previously hosted on BioData Catalyst. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format. The data is now available for access across the entire ecosystem.

    For detailed platform release notes please consult the following resources:

    • Gen3 release notes

    Atherosclerosis Risk in Communities (ARIC) Cohort

    phs000280

    ARIC

    False

    7

    Genes-Environments and Admixture in Latino Asthmatics (GALA II) Study

    phs001180

    GALAII

    False

    2

    Cardiovascular Health Study (CHS) Cohort

    phs000287

    CHS

    False

    7

    Women's Health Initiative Clinical Trial and Observational Study

    phs000200

    WHI

    False

    12

    PIC-SURE release notes
  • Dockstore release notes

  • Study Name

    phs I.D. #

    Acronym

    New to BioData Catalyst

    New study version

    Framingham Cohort

    phs000007

    FHS

    False

    30

    Genetic Epidemiology Network of Salt Sensitivity (GenSalt)

    phs000784

    GenSalt

    False

    Data page
    Documentation Search
    Seven Bridges Public Apps Gallery
    Heterozygosity by sample
    Pedigree Check
    See here
    Best Practices for Secure and FAIR workflows
    Gitbook guide to self-service onboarding to Terra
    Webinar 1
    Part 1
    Part 2
    Part 3
    blog post
    blog post
    new support documentation article
    This first post
    review paper
    blog post
    new video tutorial
    new video tutorial
    blog post
    here
    video on Broad’s BioIT 2020 Talk
    Terra release notes
    Seven Bridges release notes

    3

    2020-04-02 BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2020-04-02 release marks the first significant release for the NHLBI BioData Catalyst ecosystem. This release offers an integrated system of platforms and services for researchers to search metadata of hosted datasets, find data files, and analyze data files in workspace environments that support a variety of analysis modalities.

    The hosted data for this release includes TOPMed multi-sample VCF data for ~55,000 sequenced participants within 32 TOPMed studies included in Freeze 5b as well as CRAM files for those participants. In addition, this release includes raw phenotype files for participants in TOPMed studies, providing clinical information such as BMI and lipid levels. In some cases, these data are in different dbGaP accessions than the genomic data. The hosted data is stored in both Amazon Web Services and Google Cloud, and users have the option to run computation on either cloud provider. To access the hosted TOPMed data on BioData Catalyst, users must have dbGaP approval. Please refer to the Data page on the BioData Catalyst website for more information.

    For more in-depth information, please see the "List of significant new features" below.

    List of significant new features

    The following features in this release support primarily TOPMed researchers ranging in technical skills (both command-line and GUI) and with approval for the controlled TOPMed studies in dbGaP:

    • System login and data access: Researchers can log into the BioData Catalyst platforms using their eRA Commons ID. Approvals for TOPMed studies in dbGaP are recognized by the platforms.

    • Search TOPMed phenotypic data: Create cohorts on PIC-SURE by searching and selecting phenotypic variables of interest from dbGaP and then export cohorts to Seven Bridges or Terra for use in analysis workspaces. Users can also explore the TOPMed phenotype variables harmonized by the TOPMed Data Coordinating Center.

    • Find and access TOPMed genomics files, raw phenotype data files, and reference data files:

    Data Releases

    Information on the status of data releases is forthcoming.

    For detailed platform release notes please consult the following resources:

    • Gen3 release notes

    Release Notes

    Use the Explorer feature on Gen3 and the Data Browser feature on Seven Bridges.
  • Bring your own data: Use one of several options to upload/import data files to the workspace environments.

  • Run analyses at scale: Analyze thousands of samples at once using batch processing capabilities in secure workspaces. Run computation on Google Cloud or Amazon Web Services. Use the visual user interface, JupyterLab and Jupyter Notebooks, RStudio, the API, or the command line.

  • Association studies: Execute single variant and multiple variant association studies utilizing the GENESIS pipelines, Hail, and others. Utilize Annotation Explorer to create variant grouping files for multiple variant association studies.

  • Collaborate with other users: Share workspaces, files, and tools with other BioData Catalyst users.

  • Documentation: Access documentation for each of the platforms.

  • Track cloud costs: Track cloud storage and compute costs on Seven Bridges and Terra.

  • phs001143

    NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation Study

    CCAF

    phs001189

    NHLBI TOPMed: The Cleveland Family Study

    CFS

    phs000954

    NHLBI TOPMed: Cardiovascular Health Study

    CHS

    phs001368

    NHLBI TOPMed: Genetic Epidemiology of COPD

    COPDGene

    phs000951

    NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica

    CRA

    phs000988

    NHLBI TOPMed: Diabetes Heart Study

    DHS

    phs001412

    NHLBI TOPMed: Boston Early-Onset COPD Study

    EOCOPD

    phs000946

    NHLBI TOPMed: Framingham Heart Study

    FHS

    phs000974

    NHLBI TOPMed: Genes-Environments and Admixture in Latino Asthmatics

    GALAII

    phs000920

    NHLBI TOPMed: Genetic Study of Atherosclerosis Risk

    GeneSTAR

    phs001218

    NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy

    GENOA

    phs001345

    NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity

    GenSalt

    phs001217

    NHLBI TOPMed: Epigenetic Determinants of Lipid Response to Dietary Fat and Fenofibrate

    GOLDN

    phs001359

    NHLBI TOPMed: Heart and Vascular Health Study

    HVH

    phs000993

    NHLBI TOPMed: Genetics of Left Ventricular Hypertrophy

    HyperGEN

    phs001293

    NHLBI TOPMed: The Jackson Heart Study

    JHS

    phs000964

    NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism

    Mayo_VTE

    phs001402

    NHLBI TOPMed: The Multi-Ethnic Study of Atherosclerosis

    MESA

    phs001416

    NHLBI TOPMed: Massachusetts General Hospital (MGH) Atrial Fibrillation Study

    MGH_AF

    phs001062

    NHLBI TOPMed: Partners HealthCare Biobank

    Partners

    phs001024

    NHLBI TOPMed: San Antonio Family Heart Study

    SAFS

    phs001215

    NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment

    SAGE

    phs000921

    NHLBI TOPMed: African American Sarcoidosis Genetics Resource

    Sarcoidosis

    phs001207

    NHLBI TOPMed: Genome-wide Association Study of Adiposity in Samoans

    SAS

    phs000972

    NHLBI TOPMed: Rare Variants for Hypertension in Taiwan Chinese

    THRV

    phs001387

    NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Ablation Registry

    VAFAR

    phs000997

    NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry

    VU_AF

    phs001032

    NHLBI TOPMed: The Women's Genome Health Study

    WGHS

    phs001040

    NHLBI TOPMed: Women's Health Initiative

    WHI

    phs001237

    phs000284

    Cardiovascular Health Study

    CHS

    phs000287

    Genetic Epidemiology of COPD

    COPDGene

    phs000179

    Framingham Heart Study

    FHS

    phs000007

    Genes-Environments and Admixture in Latino Asthmatics

    GALAII

    phs001180

    Genetic Study of Atherosclerosis Risk

    GENESTAR

    phs001074

    Genetic Epidemiology Network of Arteriopathy

    GENOA

    phs001238

    Genetic Epidemiology Network of Salt Sensitivity

    GENSALT

    phs000784

    Heart and Vascular Health Study

    HVH

    phs001013

    The Jackson Heart Study

    JHS

    phs000286

    The Multi-Ethnic Study of Atherosclerosis

    MESA

    phs000209

    Massachusetts General Hospital (MGH) Atrial Fibrillation Study

    MGH_AF

    phs001001

    Women's Health Initiative

    WHI

    phs000200

    PIC-SURE release notes
  • Dockstore release notes

  • Hosted TOPMed study accessions with genomic data from Freeze 5b

    Study Name

    Acronym

    phs I.D. #

    NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish

    Amish

    phs000956

    NHLBI TOPMed: Atherosclerosis Risk in Communities

    ARIC

    phs001211

    NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados

    Hosted TOPMed study accessions with phenotype data

    Study Name

    Acronym

    phs I.D. #

    Atherosclerosis Risk in Communities

    ARIC

    phs000280

    Cleveland Clinic Atrial Fibrillation Study

    CCAF

    phs000820

    The Cleveland Family Study

    Data page
    Terra release notes
    Seven Bridges release notes

    BAGS

    CFS

    2025-04-15 BDC Release Notes

    Introduction

    The 2025-04-15 release marks the 21st release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., a cloud cost estimator, PFBs with references to raw files, and updates to cost monitoring for workflows) along with documentation and tutorials (e.g., publicly available coding examples for PIC-SURE on Terra) to help new users get started on the system. Please find more detail on the new features and user support materials in the sections below.

    The 2025-04-15 data releases include the addition of Genes-Environments, Genomic Activities, Atherosclerosis Risk in Communities, Pulmonary Fibrosis, Molecular Atlas of Lung Development, Researching COVID to Enhance Recovery (RECOVER), Studies of Left Ventricular Dysfunction, Acute Respiratory Distress, COVID-19, Resuscitation Outcomes, Sedation Titration for Respiratory Failure, Study of Congestive Heart Failure, and Prevention and Early Treatment of Acute Lung Injury. Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.

    2025-07-15 BDC Release Notes

    2025-07-15 BDC Ecosystem Release Notes

    Introduction

    The 2025-07-15 release marks the 22nd update to the BDC ecosystem. This release introduces several new features, including CellTypist, an upgraded GATK Variant, new training videos, support for PFB handoffs, and enhancements that enable workflows to run on the GCP Batch API. Additional details on these features and related user support materials are provided below.

    The 2025-07-15 data releases include the addition of 63 new datasets. See the Data Releases section below for more information.


    on the BDC website.

    Significant new features

    Velsera added cloud cost estimators for two additional workflows, samtools mpileup (tool used on BAM files in variant calling workflows), and samtools index (tool used to index BAM, CRAM or BGZIP-compressed SAM files). These calculators enable users to better estimate their cloud costs before running the analysis and incurring the charges. They bring the total number of tools and workflows with cost calculators to 10.

    The Gen3 team extended the data ingestion process to create and persist PFBs containing references to each dataset's raw files. The Gen3 Discovery Page was also extended to show these PFBs and allow them to be handed off to an analysis platform. This feature helps users who want quick, easy access to a dataset's original information, because data can be made available without waiting for harmonized ingestion. It also improves data provenance by keeping the original raw data and making it available.

    The Gen3 team added the ability for a user to obtain full, harmonized, study-level data as PFBs from the Discovery Page without having to generate them. This eliminates the time-consuming dynamic generation of cohorts when the user simply wants the whole dataset. The readily available whole-study PFB can be handed off from the Discovery Page to the analysis platform.

    Terra has made several improvements to cost monitoring for workflows. Terra can now show cost reporting in real time for workflows submitted after 3/3/25, and users can now test setting workflow cost thresholds to avoid accidental cost overruns.

    New user support materials and documentation

    Publicly available coding examples for PIC-SURE on Terra: The publicly available coding examples for exploring data and building participant-level cohorts with the PIC-SURE application programming interface have moved to new workspaces on Terra. These workspaces contain updated code and documentation. Researchers can access the examples at the following links to Terra: Python notebooks, R notebooks, and RStudio.

    Data Releases

    The table below highlights which studies were included in the 2025-04-15 data release.

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    NHLBI TOPMed - NHGRI CCDG: Genes-Environments and Admixture in Latino Asthmatics (GALA II)

    phs000920.v6.p4.c2

    topmed-GALAII_DS-LD-IRB-COL

    No

    Yes

    NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study

    phs000974.v6.p5.c1

    topmed-FHS_HMB-IRB-MDS

    No

    Yes

    Planned upcoming Data Releases

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    Coronary Artery Risk Development in Young Adults (CARDIA) BioLINCC

    phs003739.v1.p1.c1

    BioLINCC-BL_CARDIA_HMB-MDS

    Yes

    No

    Coronary Artery Risk Development in Young Adults (CARDIA) BioLINCC

    phs003739.v1.p1.c2

    BioLINCC-BL_CARDIA_HMB-NPU-MDS

    Yes

    No

    For detailed platform release notes please consult the following resources:

    • Gen3 release notes

    • Terra release notes

    • Seven Bridges release notes

    Data page
    Significant new features

    Velsera added CellTypist workflows and an upgraded GATK Variants tool to BDC Powered by Seven Bridges (BDC-Seven Bridges). CellTypist is an open-source tool for automated cell type annotation using single-cell RNA sequencing (scRNA-seq) data. The upgraded GATK Variants tool extracts specific fields from VCF or GVCF files and converts them into a tab-delimited table, facilitating downstream analysis, visualization, and integration with spreadsheets and statistical tools. Both tools are available in the Public Apps Gallery to all BDC-Seven Bridges users. Additionally, Velsera has released a training video on their YouTube channel about using the new external connections panel. This feature makes it easier for users to connect to external data repositories like Synapse and CAVATICA, and to link to DRS servers to access data for analysis on BDC-Seven Bridges. The video is titled “BDC - Seven Bridges: Seamlessly Import External Datasets from Cavatica, Synapse & CGC”.
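To illustrate what "extracting fields from a VCF into a tab-delimited table" means in practice, the sketch below does this for the fixed VCF columns using only the Python standard library. It is a conceptual illustration and not the GATK tool or its options; the function name and the choice of default fields are assumptions, and INFO or per-sample fields are deliberately left out.

```python
from typing import Iterable, Sequence


def vcf_to_table(vcf_lines: Iterable[str],
                 fields: Sequence[str] = ("CHROM", "POS", "REF", "ALT")) -> str:
    """Return a tab-delimited table of the requested fixed VCF columns."""
    columns = None
    rows = ["\t".join(fields)]
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#"):
            columns = line[1:].split("\t")  # the "#CHROM POS ID ..." header
            continue
        if columns is None:
            raise ValueError("record found before the #CHROM header line")
        record = dict(zip(columns, line.split("\t")))
        rows.append("\t".join(record.get(name, "NA") for name in fields))
    return "\n".join(rows)


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t12345\trs1\tA\tG\t50\tPASS\tDP=10",
        "chr2\t67890\t.\tC\tT\t99\tPASS\tDP=20",
    ]
    print(vcf_to_table(example, fields=("CHROM", "POS", "ID", "QUAL")))
```

The resulting text can be opened directly in a spreadsheet or read into a statistics package, which is the downstream use the feature description has in mind.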

    PIC-SURE enabled the handoff of PFBs to BDC-Seven Bridges. After creating a cohort of participants by selecting and filtering variables, investigators can now click the “Export to Seven Bridges” button to bring the participant-level data and associated data dictionary to a BDC-Seven Bridges project for analysis (see image below).

    Terra has released upgrades so that all workflows now run on the GCP Batch API, addressing Google’s deprecation of the Life Sciences API. In addition, Terra has released two new features for cost management in the cloud: 1) users can now set a cost threshold for a workflow to stop runaway costs, and 2) billing project owners can now enable GCP Quota Adjuster, which automatically raises quotas based on usage. Quota Adjuster is highly recommended for GCP Batch, as it supports most Compute Engine quotas, including Managed Instance Groups ("MIG").

    Data Releases

    The table below highlights which studies were included in the data releases in the months of May, June, and July 2025.

    Study Name
    phs I.D. #
    New to BDC

    BioLINCC-BL_ARIC_HMB-NPU-MDS

    phs003738.v1.p1.c1

    New

    BioLINCC-BL_BEST_COPD_GRU

    phs004022.v1.p1.c1

    New

    BioLINCC-BL_ROC_PRIMED_GRU

    phs003825.v2.p2.c1

    New

    heartfailure-PGRN_Afib_HMB

    phs000439.v1.p1.c1

    Planned upcoming Data Releases For Q3

    Study Name
    phs I.D. #
    New to BDC

    heartfailure-BroadEOMI_DS-CVD

    phs000279.v2.p1.c1

    New

    heartfailure-LungExome_PAH_GRU

    phs000290.v1.p1.c1

    New

    heartfailure-LHS-COPD_GRU

    phs000291.v2.p1.c1

    New

    heartfailure-LHS_DS-HLB

    phs000335.v3.p2.c1

    For detailed platform release notes please consult the following resources:

    • Gen3 release notes

    • Terra release notes

    • Seven Bridges release notes

    • PIC-SURE release notes

    2024-10-21 NHLBI BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2024-10-21 release marks the 19th release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., supporting seqr genomics analysis, and exporting selected cohort data in PFB format). Please find more detail on the new features in the sections below.

    The 2024-10-21 data releases include the addition of studies on asthma and sickle cell disease, plus new imaging from cardiovascular and atherosclerosis studies. Updates are highlighted for COPD, atrial fibrillation, and childhood asthma studies, and new additions include liver disease, myocardial genomics, and exRNA studies. The release also introduces the RECOVER-Pediatric project and the REDS-IV-P Epidemiology of COVID-19 study. Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.

    Significant new features

    BDC Powered by Terra (BDC-Terra) now supports seqr genomics analysis: seqr provides rich gene and variant-level annotations and powerful filtration tools to perform variant searches within a family or across projects. To get started, check out the tutorials, including a video describing how to load your data in seqr.

    Export selected cohort data in Portable Format for Bioinformatics (PFB): BDC Powered by PIC-SURE (BDC-PIC-SURE) now allows researchers to export selected participant-level data in the PFB file format. When using the Select and Package Data tool in Authorized PIC-SURE, choose “Package Data as PFB” to export in this format.

    Data Releases

    The table below highlights which studies were included in the 2024-10-21 data release.

    The latest release features NHLBI TOPMed projects such as the Severe Asthma Research Program (SARP) and Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU). Additionally, it includes new imaging XML schemas from the Cardiovascular Health Study (CHS) and the Multi-Ethnic Study of Atherosclerosis. Updates are also highlighted in the Boston Early-Onset COPD Study, Cleveland Clinic Atrial Fibrillation Study, and the Childhood Asthma Management Program (CAMP). New additions include the Human Liver Cohort and studies on myocardial genomics and exRNA profiles. The release also introduces the RECOVER-Pediatric project and the REDS-IV-P Epidemiology of COVID-19 study.

    The data is now available for access across the entire ecosystem.

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version
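
    The phs I.D. values in these tables follow the dbGaP accession layout: a study number, then a version (v), a participant set (p), and usually a consent group (c). The sketch below splits such a string into its parts. The pattern is inferred from the accessions listed in these notes, and the names parse_accession and Accession are illustrative only, not part of any BDC tool.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from accessions in these release notes, e.g.
# "phs001211.v5.p4.c1". The consent-group suffix is optional because
# some listed entries (e.g. "phs000204.v1.p1") omit it.
ACCESSION_RE = re.compile(r"^(phs\d{6})\.v(\d+)\.p(\d+)(?:\.c(\d+))?$")


class Accession(NamedTuple):
    study: str                    # e.g. "phs001211"
    version: int                  # number after ".v"
    participant_set: int          # number after ".p"
    consent_group: Optional[int]  # number after ".c", or None if absent


def parse_accession(text: str) -> Accession:
    """Split a dbGaP-style accession string into its components."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised accession: {text!r}")
    study, version, pset, consent = match.groups()
    return Accession(
        study,
        int(version),
        int(pset),
        int(consent) if consent is not None else None,
    )
```

    Parsing the accession this way makes it straightforward to group consent groups of the same study or to compare versions between releases.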

    Planned Upcoming Data Releases

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    For detailed platform release notes please consult the following resources:

    BDC Powered by Gen3 release notes

    2025-11-15 BDC Ecosystem Release Notes

    This is the 23rd update to the BDC ecosystem.

    Introduction

    The 2025-11-15 release introduces several new features, including expanded data access, new interoperability tools, enhanced compute capabilities, and new bioinformatics workflows.

    The 2025-11-15 data releases include the addition of 109 new datasets. See the Data Releases section for more information.

    Significant New Features

    The following new features were released this quarter to improve the researcher experience.

    BDC Powered by Seven Bridges (BDC-Seven Bridges)

    • SlicerJupyter, a new Data Studio environment for Python-based interaction with 3D Slicer. The environment comes with several example notebooks that explain how to work with volumetric imaging data (for example, MRI and CT).

    • Expanded Data Access: We enabled full Sequence Read Archive (SRA) access via RAS Passport. This integration unlocks petabytes of controlled SRA data for researchers with valid Data Access Requests (DARs), improving data interoperability and access.

    • New Interoperability Tools: We published the PFB Unwrapper app, a new CWL tool that accepts .avro PFB files from Gen3, PIC-SURE, or AnVIL and creates a DRS manifest for streamlined import into BDC-Seven Bridges projects. This is a significant step in developing the PFB Importer v2 functionality to seamlessly execute the BDC Handoff Standard.
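
    To illustrate the idea behind turning PFB records into a DRS manifest, here is a minimal sketch. It is not the PFB Unwrapper's actual logic. It assumes the Avro file has already been decoded into plain dictionaries by an Avro reader (not shown), that file references appear as string values starting with "drs://", and that a one-column TSV is an acceptable manifest layout. The field names in the usage data and both function names are hypothetical.

```python
import csv
import io


def collect_drs_uris(records):
    """Return unique DRS URIs found in the records, in first-seen order.

    `records` is any iterable of dicts; every string value that starts
    with "drs://" is treated as a DRS URI (an assumption of this sketch).
    """
    seen = set()
    uris = []
    for record in records:
        for value in record.values():
            if isinstance(value, str) and value.startswith("drs://"):
                if value not in seen:
                    seen.add(value)
                    uris.append(value)
    return uris


def manifest_tsv(uris):
    """Render the URIs as a one-column TSV with a hypothetical header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["drs_uri"])
    for uri in uris:
        writer.writerow([uri])
    return buffer.getvalue()


# Usage with made-up records standing in for decoded PFB rows.
example_records = [
    {"id": "a", "object_id": "drs://example.org/abc"},
    {"id": "b", "object_id": "drs://example.org/abc"},
    {"id": "c", "object_id": "drs://example.org/def", "size": 10},
]
example_manifest = manifest_tsv(collect_drs_uris(example_records))
```

    Deduplicating while preserving order keeps the manifest stable when several participant-level rows reference the same file.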

    BDC Powered by Terra (BDC-Terra)

    • New workspaces now use a more optimized format for data tables to increase performance and scalability.

    • Locking a workspace now makes the underlying bucket read-only while the workspace is locked.

    • More information about new features can be found in the .

    Data Releases

    The table below highlights which studies were included in the data releases in the months of August, September, and October 2025.

    Study Name
    phs I.D. #

    Upcoming Data Releases

    The table below highlights studies that are planned for release in November and December.

    Study Name
    phs I.D. #

    For detailed platform release notes please consult the following resources:

    • Gen3 release notes

    2023-07-11 NHLBI BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2023-07-11 release marks the fourteenth release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features, e.g., Faceted Search in BDC Powered by Seven Bridges (BDC-Seven Bridges), along with documentation to help new users get started on the ecosystem, e.g., updated WDL documentation in BDC Powered by Terra (BDC-Terra). This release also includes enhanced support for discovering what datasets are available via BDC Powered by Gen3. Please find more detail on the new features and user support materials in the sections below.

    The 2023-07-11 data releases include the addition of various research projects related to COVID-19, lung development, platelet transfusion refractoriness, sickle cell anemia, asthma, pregnancy outcomes, and family health studies. Please refer to the Data Releases section below for information on upcoming data releases. A list of currently available data can be viewed on the Data page of the BDC website.

    Significant new features

    Faceted Search in BDC-Seven Bridges: Version 1 of Faceted Search has been deployed for all users on BDC-Seven Bridges. This feature enables users to query or filter any BDC ingested data in a faceted way to find files and form groups of files by searching characteristics such as authorization status, study accession number, type of data, etc. With the release of v1 Faceted Search, users can now more easily find data that is relevant to their research. Faceted Search is currently available for 10 datasets and will be expanded to all hosted datasets in the following quarter. The Faceted Search feature can be found under the Data drop-down menu.
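
    The faceted filtering described above can be pictured with a short sketch: each file carries a few attributes, and a query keeps only the files that match every selected facet. This is a generic illustration, not the BDC-Seven Bridges implementation. The field names (study, data_type, authorized) and the sample records are hypothetical.

```python
def facet_filter(files, **facets):
    """Keep the records that match every requested facet.

    A facet value may be a single value or a list/set/tuple of
    accepted values for that field.
    """
    selected = []
    for record in files:
        for field, wanted in facets.items():
            if isinstance(wanted, (set, list, tuple)):
                accepted = wanted
            else:
                accepted = {wanted}
            if record.get(field) not in accepted:
                break  # one facet failed, skip this record
        else:
            selected.append(record)  # all facets matched
    return selected


def facet_counts(files, field):
    """Count how many records fall under each value of one facet."""
    counts = {}
    for record in files:
        value = record.get(field)
        counts[value] = counts.get(value, 0) + 1
    return counts


# Made-up file records for illustration only.
sample_files = [
    {"name": "s1.cram", "study": "phs001211", "data_type": "CRAM", "authorized": True},
    {"name": "s1.vcf.gz", "study": "phs001211", "data_type": "VCF", "authorized": True},
    {"name": "s2.vcf.gz", "study": "phs001416", "data_type": "VCF", "authorized": False},
]
authorized_vcfs = facet_filter(sample_files, data_type="VCF", authorized=True)
```

    Counting records per facet value, as facet_counts does, is what lets a faceted interface show how many files remain under each option before the user selects it.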

    BDC-Gen3 metadata updated to bring in data from the dbGaP FHIR database: BDC-Gen3’s Discovery Page (and the underlying BDC-Gen3 Source of Truth Metadata API) allows unauthenticated users to discover what datasets are available in BDC. Fast Healthcare Interoperability Resources (FHIR) is a Health Level Seven International (HL7) specification for healthcare interoperability. Last quarter, BDC-Gen3 worked to consume the new metadata from the dbGaP FHIR Server (as part of the officially defined data ingestion process). This quarter, BDC-Gen3’s Data Ingestion Pipeline has been updated to load FHIR metadata with every new data release. The loaded metadata is available to all clients and users through BDC-Gen3’s Metadata API and is viewable in BDC-Gen3’s Discovery Page.

    New and Improved Genomic Filtering on BDC Powered by PIC-SURE (BDC-PIC-SURE): The Genomic Filtering modal on BDC-PIC-SURE has been updated to more accurately represent the relatedness between the various filtering fields. This includes the revamped “Variant consequence calculated” field, which includes different levels of severity and their associated consequences. Additionally, the “Selected Genomic Filters” section now more explicitly summarizes the filter criteria being applied.

    Edit Queries Built in BDC-PIC-SURE Using the API: Researchers who created a cohort in BDC-PIC-SURE’s user interface can now edit that query’s parameters using Python or R code via the BDC-PIC-SURE API. This gives researchers more flexibility to refine or change their cohort after export and eliminates the need to return to the user interface.

    New user support materials and documentation

    Updated WDL documentation in BDC-Terra: Based on user feedback, Terra documentation has been expanded and updated to include: A new with a section dedicated to resources created by the WDL community, a new wdl-docs website to host the documentation from the new wdl-docs GitHub repository, updates to all existing WDL syntax documentation to match the WDL 1.0 spec, 17 new articles, 11 cookbook-style documents to teach users about specific use cases and provide example workflows, and 6 best practices documents to help users understand some of the grayer areas of coding in WDL. The documents are now available on the new wdl-docs GitHub repository.

    New Code in “0_Export_from_UI” BDC-PIC-SURE API Examples: The example code has been updated with new examples showing how to use the BDC-PIC-SURE API to edit the query parameters of a cohort built in the BDC-PIC-SURE user interface. These examples are available in Python and R, for both Jupyter and RStudio.

    Data Releases

    The table below highlights which studies were included in the 2023-07-11 data release. The Q2 data release included various research projects related to COVID-19, lung development, platelet transfusion refractoriness, sickle cell anemia, asthma, pregnancy outcomes, and family health studies. These include two studies from the COVID-19 Therapeutic Interventions and Vaccines initiative (ACTIV4a and ACTIV4c). There is a study on lung development (LungMAP) and another tackling platelet transfusion refractoriness in patients with severe thrombocytopenia using Eculizumab (DIR-Eculizumab). Other studies revolve around the use of hydroxyurea in children with sickle cell anemia (BABYHUG), the genetic epidemiology of asthma in Costa Rica (CRA), nulliparous pregnancy outcomes (nuMoM2b), multicenter study of hydroxyurea (MSH), and the Cleveland Family Study (CFS). The data is now available for access across the entire ecosystem.

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    Planned Upcoming Data Releases

    For detailed platform release notes please consult the following resources:

    BDC-Gen3 release notes

    2024-01-08 NHLBI BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2024-01-08 release marks the 16th release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., enabling Azure and searching data without logging in) along with documentation and tutorials (e.g., data dictionary field documentation) to help new users get started on the system. Please find more detail on the new features and user support materials in the sections below.

    The 2024-01-08 data releases include the addition of research on multisystem inflammatory syndrome in children linked to COVID-19, bone marrow transplant and pulmonary hypertension in sickle cell disease, atherosclerosis, and psoriasis. Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.

    Significant new features

    Azure available on BDC Powered by Seven Bridges (BDC-SB): Velsera expanded their existing multi-cloud offerings by enabling Microsoft Azure (southcentralus) on BDC-SB. Users can select that computing and storage environment when creating a project. This allows users to avoid any egress charges when computing on data stored in Azure. This is of particular interest to users who want to connect their own Azure cloud buckets to BDC-SB.

    SAS upgrade in BDC-SB: SAS on BDC-SB has been upgraded from SAS Viya 3.5 to SAS Studio 9.4. SAS 9.4 has improved functionality over SAS 3.5 including more complete data management solutions and additional programming languages.

    Open PIC-SURE without login: Open PIC-SURE is now publicly available on BDC Powered by PIC-SURE (BDC-PIC-SURE), meaning no eRA Commons credentials are required to access the site. Researchers can use this site to search terms of interest, apply filters at the variable-value level, retrieve obfuscated aggregate counts, and view single-variable distributions of their selected cohort. This new functionality allows researchers to discover and interact with data available on BDC without needing to log in, lowering the barrier to data exploration.

    Data Hierarchies in BDC-PIC-SURE: Researchers can now view the data hierarchy associated with variables in BDC-PIC-SURE by clicking the “Data Tree” icon in the “Actions” column of the search results. This helps researchers better understand how variables are related and gives additional context for these variables. Note that this feature is currently in beta and is available only for some studies. Feedback and input on this feature are welcome!

    New user support materials and documentation

    BDC-PIC-SURE Data Dictionary fields documentation: New documentation outlines the data dictionary fields returned from the PIC-SURE API. It gives a detailed account of what each field represents, including relationships between fields. This documentation can be found in the BDC-PIC-SURE GitBook.

    Data Releases

    The table below highlights which studies were included in the 2024-01-08 data release. The release features research on long-term outcomes of multisystem inflammatory syndrome in children linked to COVID-19 (COVID19-MUSIC_GRU), bone marrow transplant for severe sickle cell disease (BioLINCC-BMT_CTN_HMB), and ApoA-1, atherosclerosis, and psoriasis (DIR-ApoA-1_Atherosclerosis_in_Psoriasis_GRU). Additionally, updated metadata is provided for the ongoing study on sildenafil therapy in treating pulmonary hypertension in sickle cell disease (walk-PHaSST). This data includes clinical files and is now available for access across the entire ecosystem.

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    Planned Upcoming Data Releases

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    For detailed platform release notes please consult the following resources:

    BDC-Gen3 release notes

    2022-07-11 BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2022-07-11 release marks the tenth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., importing files from AnVIL via DRS and creating multi-sample VCFs). This release also includes enhanced support for CWL tools on GitHub. Please find more detail on the new features in the sections below.

    The 2022-07-11 data release includes the addition of the COVID-19 dataset C3PO and TOPMed Freeze 9 batches 3 and 4. Please refer to the Data Releases section below for more information as well as the Data page on the BioData Catalyst website.

    Significant new features

    Import files from AnVIL to BioData Catalyst Powered by Seven Bridges via DRS

    Seven Bridges released an interoperability feature enabling the import of files from AnVIL via DRS. A TOPMed researcher working in BioData Catalyst who identifies a causal variant through association testing might next want to investigate how that variant affects gene expression. The AnVIL ecosystem hosts the Genotype-Tissue Expression (GTEx) datasets, which can be used to understand which tissues are affected by novel variants. Seven Bridges’ latest release allows a TOPMed researcher to go to AnVIL and push data they have permission to access to BioData Catalyst Powered by Seven Bridges. The researcher can then run the variant association test on TOPMed data and identify how that variant changes tissue expression with GTEx data in one workspace.

    Create multi-sample VCFs with the Variant Store

    Researchers who have access to many TOPMed studies may want to combine VCF files into a multi-sample VCF. Additionally, researchers might want to subset samples based on genomic regions. Using standard bioinformatics tools, this process involves many manual steps and can be time intensive and cost prohibitive. The Variant Store on BioData Catalyst Powered by Seven Bridges uses a series of API calls to combine VCFs from studies of interest and subset the multi-sample VCF based on the selected genomic region. The latest release allows researchers to track the costs of generating multi-sample VCFs via the Variant Store as a dedicated line item in their billing group, separate from analysis and storage costs.

    Explore, tag, and annotate phenotypes in the Study Variable Explorer

    The Study Variable Explorer on BioData Catalyst Powered by Seven Bridges allows researchers to explore phenotypic variables from the TOPMed data dictionaries in an open access manner. Previously, researchers were limited to searching data dictionary information on dbGaP, and comparing different study variables was cumbersome. Study Variable Explorer enables researchers to select phenotypic variables from across TOPMed studies and view detailed information and distributions of the variable data. By searching keywords, such as obesity, a researcher can compare like variables within and across hosted datasets, including the number of subjects and descriptions of the variables. Additionally, users can create custom searchable tags and notes for each variable to track their variable selection and pre-harmonization process.

    New CWL Tools and Workflows on BioData Catalyst Powered by Seven Bridges

    • An updated version of the SRA Download and Set Metadata workflow (SRA Toolkit 3.0.0) that downloads metadata associated with SRA accession via SRA Run Info CGI, (on-demand instance) FASTQ files and sets corresponding metadata.

    • fastENLOC (v1.0, CWL1.2), a tool that enables integrative genetic association analysis of molecular QTL data and GWAS data. It performs integration of the results from molecular quantitative trait loci (QTL) mapping into genome-wide genetic association analysis of complex traits, with the primary objective of quantitatively assessing the enrichment of the molecular QTLs in complex trait-associated genetic variants and the colocalizations of the two types of association signals.

    Dockstore GitHub app support expanded to CWL tools

    Researchers can now register their tools to sync automatically with GitHub. Using GitHub Apps, Dockstore can react to changes on GitHub as they are made, keeping Dockstore in sync with GitHub.

    Data Releases

    The table below highlights which studies were included in the Q2 2022 data releases. The data is now available for access across the entire ecosystem.

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    Planned Upcoming Data Releases

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    For detailed platform release notes please consult the following resources:

    Gen3 release notes

    PIC-SURE release notes

    NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study

    phs000974.v6.p5.c2

    topmed-FHS_HMB-IRB-NPU-MDS

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC)

    phs001211.v5.p4.c1

    topmed-ARIC_HMB-IRB-NPU-MDS

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC)

    phs001211.v5.p4.c2

    topmed-ARIC_DS-CVD-IRB-NPU-MDS

    No

    Yes

    NHLBI TOPMed: MESA and MESA Family AA-CAC

    phs001416.v4.p1.c1

    topmed-MESA_HMB

    No

    Yes

    NHLBI TOPMed: MESA and MESA Family AA-CAC

    phs001416.v4.p1.c2

    topmed-MESA_HMB-NPU

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v4.p2.c6

    topmed-IPF_DS-LD-IRB-COL-NPU

    No

    Yes

    LungMAP: Molecular Atlas of Lung Development - Human Lung Tissue

    phs001961.v3.p1.c1

    LungMAP-MALD_GRU

    No

    Yes

    Multi-Ethnic Study of Atherosclerosis (BioLINCC): BL_MESA

    phs003288.v1.p1.c1

    BioLINCC-BL_MESA_HMB

    Yes

    No

    Multi-Ethnic Study of Atherosclerosis (BioLINCC): BL_MESA

    phs003288.v1.p1.c2

    BioLINCC-BL_MESA_HMB-NPU

    Yes

    No

    Researching COVID to Enhance Recovery (RECOVER): RECOVER_Adult_6-24

    phs003463.v2.p2.c1

    RECOVER-RC_Adult_GRU

    No

    No

    Studies of Left Ventricular Dysfunction (SOLVD)

    phs003668.v1.p1.c1

    BioLINCC-BL_SOLVD_GRU

    Yes

    No

    Researching COVID to Enhance Recovery (RECOVER): Autopsy

    phs003768.v1.p1.c1

    RECOVER-RC_Autopsy_GRU

    Yes

    No

    Acute Respiratory Distress Network (ARDSNet) Study 02 Late Steroid Rescue Study (LaSRS)

    phs003769.v1.p1.c1

    BioLINCC-BL_ARDSNet_LASRS_HMB-MDS

    Yes

    No

    Resuscitation Outcomes Consortium (ROC) Hypertonic Saline Trial Shock Study (HS) and Traumatic Brain Injury Study (TBI)

    phs003777.v1.p1.c1

    BioLINCC-BL_ROC_HS_TBI_GRU

    Yes

    No

    Randomized Evaluation of Sedation Titration for Respiratory Failure (RESTORE)

    phs003783.v1.p1.c1

    BioLINCC-BL_RESTORE_GRU

    Yes

    No

    Resuscitation Outcomes Consortium (ROC) Trauma Epidemiologic Registry (Trauma Epistry) (ROC-Trauma Epistry)

    phs003809.v1.p1.c1

    BioLINCC-BL_ROC-Trauma_Epistry_GRU

    Yes

    No

    BioLINCC The Women's Health Initiative (WHI)

    phs003824.v1.p1.c1

    imaging-img_WHI_HMB

    Yes

    No

    BioLINCC The Women's Health Initiative (WHI)

    phs003824.v1.p1.c2

    imaging-img_WHI_HMB-NPU

    Yes

    No

    Evaluation Study of Congestive Heart Failure and Pulmonary Artery Catheterization Effectiveness (ESCAPE)

    phs003782.v1.p1.c1

    BioLINCC-BL_ESCAPE_GRU

    Yes

    No

    Prevention and Early Treatment of Acute Lung Injury Network – Reevaluation of Systemic Early Neuromuscular Blockade (PETAL ROSE)

    phs003878.v1.p1.c1

    BioLINCC-BL_PETAL_ROSE_HMB-MDS

    Yes

    No

    Jackson Heart Study (JHS) BioLINCC

    phs003740.v1.p1.c1

    BioLINCC-BL_JHS_HMB-MDS

    Yes

    No

    Jackson Heart Study (JHS) BioLINCC

    phs003740.v1.p1.c2

    BioLINCC-BL_JHS_HMB-NPU-MDS

    Yes

    No

    Jackson Heart Study (JHS) BioLINCC

    phs003740.v1.p1.c3

    BioLINCC-BL_JHS_DS-CVD-MDS

    Yes

    No

    Jackson Heart Study (JHS) BioLINCC

    phs003740.v1.p1.c4

    BioLINCC-BL_JHS_DS-CVD-NPU-MDS

    Yes

    No

    COVID-19 ACTIV-4 ACUTE: A Multicenter, Adaptive, Randomized Controlled Platform Trial of the Safety and Efficacy of Antithrombotic Strategies in Hospitalized Adults with COVID-19 (ACTIV4A)

    phs002694.v4.p1.c1

    COVID19-ACTIV4A_GRU

    No

    No

    MIGen_ExS: BioImage Study

    phs001058.v1.p1.c1

    heartfailure-MiGEN_EXS_BIOIMAGE_DS-CVD

    Yes

    No

    Rapid Early Action for Coronary Treatment (REACT)

    phs003885.v1.p1.c1

    BioLINCC-BL_REACT_GRU

    Yes

    No

    Digitalis Investigation Group (DIG)

    phs003872.v1.p1.c1

    BioLINCC-BL_DIG_GRU

    Yes

    No

    Prevention and Early Treatment of Acute Lung Injury Network – Vitamin D to Improve Outcomes by Leveraging Early Treatment (PETAL VIOLET)

    phs003879.v1.p1.c1

    BioLINCC-BL_PETAL_VIOLET_HMB-MDS

    Yes

    No

    Resuscitation Outcomes Consortium (ROC) Prehospital Resuscitation on Helicopter Study (PROHS)(ROC-PROHS-BioLINCC)

    phs003826.v1.p1.c1

    BioLINCC-BL_ROC_PROHS_GRU

    Yes

    No

    Sleep Heart Health Study (SHHS)

    phs003637.v1.p1.c1

    BioLINCC-BL_SHHS_NSRR_HMB-MDS

    No

    No

    Best Endovascular vs. Best Surgical Therapy in Patients With Critical Limb Ischemia

    phs003844.v1.p1.c1

    BioLINCC-BL_BEST_CLI_GRU

    Yes

    No

    LungMAP: Molecular Atlas of Lung Development

    phs001961.v3.p1.c1

    LungMAP-MALD_GRU

    No

    No

    ARDSnet and the iSPAAR Consortium: Genetic Studies

    phs000631.v1.p1.c1

    heartfailure-ARDSnet_gen_HMB

    Yes

    No

    NHLBI TOPMed - NHGRI CCDG: Pakistan Risk of Myocardial Infarction Study (PROMIS)

    phs001569.v1.p1.c1

    topmed-PROMIS_GRU

    Yes

    No

    NHLBI TOPMed - NHGRI CCDG: Groningen Genetics of Atrial Fibrillation (GGAF) Study

    phs001725.v3.p1.c1

    topmed-GGAF_GRU

    No

    Yes

    Researching COVID to Enhance Recovery (RECOVER): Pediatrics

    phs003461.v2.p2.c1

    RECOVER-RC_Pediatrics_GRU

    No

    Yes

    Trial of Late Surfactant for Prevention of Bronchopulmonary Dysplasia: A Study in Ventilated Preterm Infants Receiving Inhaled Nitric Oxide

    phs003899.v1.p1.c1

    BioLINCC-TOLSURF_GRU

    Yes

    No

    Prevention and Early Treatment of Acute Lung Injury (PETAL) Acetaminophen in Sepsis: Targeted Therapy to Enhance Recovery

    phs003900.v1.p1.c1

    BioLINCC-PETAL_ASTER_HMB-MDS

    Yes

    No

    Jackson Heart Study (JHS)

    phs003747.v1.p1.c1

    imaging-img_JHS_HMB-MDS

    Yes

    No

    Jackson Heart Study (JHS)

    phs003747.v1.p1.c2

    imaging-img_JHS_HMB-NPU-MDS

    Yes

    No

    Jackson Heart Study (JHS)

    phs003747.v1.p1.c3

    imaging-img_JHS_DS-CVD-MDS

    Yes

    No

    Jackson Heart Study (JHS)

    phs003747.v1.p1.c4

    imaging-img_JHS_DS-CVD-NPU-MDS

    Yes

    No

    Heart Failure Network: Inorganic Nitrite Delivery to Improve Exercise Capacity in HFpEF (INDIE)

    phs003804.v1.p1.c1

    imaging-img_HFN_INDIE_GRU-IRB

    Yes

    No

    Public Access Defibrillation Community Trial (PAD)

    phs003858.v1.p1.c1

    BioLINCC-BL_PAD_GRU

    Yes

    No

    Resuscitation Outcomes Consortium Trial Of Continuous Compressions Versus Standard CPR In Patients With Out-Of-Hospital Cardiac Arrest

    phs003901.v1.p1.c1

    BioLINCC-BL_ROC_CCC_GRU

    Yes

    No

    Consortium Pragmatic Trial of Airway Management in Out-of-Hospital Cardiac Arrest

    phs003902.v1.p1.c1

    BioLINCC-BL_ROC_PART_GRU

    Yes

    No

    Lung Tissue Research Consortium

    phs003913.v1.p1.c1

    BioLINCC-BL_LTRC_DS-LD-MDS

    Yes

    No

    SNPs and Extent of Atherosclerosis (SEA) Study

    phs000349.v1.p1.c1

    heartfailure-SEA_GRU

    Yes

    No

    Researching COVID to Enhance Recovery (RECOVER): Adult

    phs003463.v4.p3.c1

    RECOVER-RC_Adult_GRU

    No

    Yes

    Genetic Analysis of Limb Malformation Disorders: Miller Syndrome Sequencing Study (LMD-MS)

    phs000244.v1.p1.c1

    heartfailure-LMD-MS_GRU

    Yes

    No

    Genetic Analysis of Limb Malformation Disorders: Freeman Sheldon Syndrome Exome Sequencing Study (LMD-FSS)

    phs000204.v1.p1

    heartfailure-LMD-FSS

    Yes

    No

    Proteomic biomarkers of progressive fibrosing interstitial lung disease: a multicentre cohort analysis (PF-ILD Proteomics)

    phs003954.v1.p1

    BioLINCC-BL_PF_ILD

    Yes

    No

    BLUE CORAL: Biology and Longitudinal Epidemiology of PETAL COVID-19 Observational Study Biology and Longitudinal Epidemiology of PETAL COVID-19 Observational Study (BLUE CORAL)

    phs003419.v2.p1

    COVID19-BLUE_CORAL

    No

    Yes

    PIC-SURE release notes

    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU)

    phs001466.v2.p1.c2

    topmed-pharmHU_DS-SCD-RD

    Yes

    No

    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU)

    phs001466.v2.p1.c3

    topmed-pharmHU_DS-SCD

    Yes

    No

    Cardiovascular Health Study (CHS) - Imaging

    phs003639.v1.p1.c1

    imaging-img_CHS_HMB-MDS

    Yes

    No

    Cardiovascular Health Study (CHS) - Imaging

    phs003639.v1.p1.c2

    imaging-img_CHS_HMB-NPU-MDS

    Yes

    No

    Cardiovascular Health Study (CHS) - Imaging

    phs003639.v1.p1.c3

    imaging-img_CHS_DS-CVD-MDS

    Yes

    No

    Cardiovascular Health Study (CHS) - Imaging

    phs003639.v1.p1.c4

    imaging-img_CHS_DS-CVD-NPU-MDS

    Yes

    No

    Multi-Ethnic Study of Atherosclerosis (Electrocardiogram Tracing Repository)

    phs003703.v1.p1.c1

    imaging-img_MESA_ECG_HMB

    Yes

    No

    Multi-Ethnic Study of Atherosclerosis (Electrocardiogram Tracing Repository)

    phs003703.v1.p1.c2

    imaging-img_MESA_ECG_HMB-NPU

    Yes

    No

    Sleep Heart Health Study (SHHS-BioLINCC)

    phs003637.v1.p1.c1

    BioLINCC-BL_SHHS_HMB-MDS

    No

    No

    NHLBI TOPMed: Boston Early-Onset COPD Study

    phs000946.v6.p2.c1

    topmed-EOCOPD_DS-CS-RD

    No

    Yes

    NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation (CCAF) Study

    phs001189.v5.p1.c1

    topmed-CCAF_AF_GRU-IRB

    No

    Yes

    NHLBI TOPMed: NHGRI CCDG: AF Biobank LMU in the context of the MED Biobank LMU

    phs001543.v3.p1.c1

    topmed-AFLMU_HMB-IRB-PUB-COL-NPU-MDS

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: The GENetics in Atrial Fibrillation (GENAF) Study

    phs001547.v3.p1.c1

    topmed-GENAF_HMB-NPU

    No

    Yes

    NHLBI TOPMed: Early-Onset Atrial Fibrillation in the Estonian Biobank

    phs001606.v3.p1.c1

    topmed-EGCUT_GRU

    No

    Yes

    NHLBI TOPMed: NHGRI CCDG: The BioMe Biobank at Mount Sinai

    phs001644.v3.p2.c1

    topmed-BioMe_HMB-NPU

    No

    Yes

    NHLBI TOPMed: Childhood Asthma Management Program (CAMP)

    phs001726.v3.p1.c1

    topmed-CAMP_DS-AST-COPD

    No

    Yes

    Human Liver Cohort (HLC)

    phs000253.v1.p1.c1

    heartfailure-HLC_GRU

    Yes

    No

    NHLBI Exome Sequencing in SCID

    phs000479.v1.p1.c1

    heartfailure-Exome_SCID_GRU

    Yes

    No

    Familial Exome Sequencing in Rare Pediatric Phenotypes

    phs000553.v1.p1.c1

    heartfailure-FamExome_RarePeds_GRU-MDS

    Yes

    No

    PCGC: Congenital Heart Disease Genetic Network Study

    phs000571.v6.p2.c2

    PCGC-CHD-GENES_DS-CHD

    Yes

    No

    NHLBI GO-ESP: Family Studies (Mendelian Lipid Disorders)

    phs000587.v1.p1.c1

    heartfailure-Fam_MLD_DS-CLA

    Yes

    No

    NextGen Consortium: iPS Derived Hepatocytes Study (PhLiPS Study)

    phs001341.v1.p1.c1

    heartfailure-PhLiPS_GRU

    Yes

    No

    Myocardial Applied Genomics Network (MAGNet) Study

    phs001539.v4.p1.c1

    heartfailure-MAGNet_HMB-MDS

    Yes

    No

    Cardiovascular ATVB: Atherosclerosis Thrombosis and Vascular Biology

    phs001592.v1.p1.c1

    heartfailure-CardioATVB_DS-CVD

    Yes

    No

    Profiles of exRNA in CSF and Plasma from Subarachnoid Hemorrhage Patients

    phs001759.v1.p1.c1

    heartfailure-exRNA_CSF_HMB

    Yes

    No

    miRNA Profiling of Maternal and Non-Maternal Healthy Adult Blood Plasma Using Small RNA-Sequencing

    phs001892.v1.p1.c1

    heartfailure-miRNA_Maternal_Plasma_GRU

    Yes

    No

    NHLBI TOPMed: NHGRI CCDG: UCSF Atrial Fibrillation Study

    phs001933.v2.p1.c1

    topmed-UCSF_Afib_HMB-MDS

    Yes

    No

    NIH RECOVER-Pediatric: Understanding the Long-Term Impact of COVID on Children and Families

    phs003461.v1.p1.c1

    RECOVER-RC_Pediatrics_GRU

    Yes

    No

    REDS-IV-P Epidemiology, Surveillance and Preparedness of the Novel SARS-CoV-2 Epidemic (RESPONSE)

    phs003578.v1.p1.c1

    REDS-RESPONSE_GRU

    Yes

    No

    Sudden Cardiac Death in Heart Failure Trial (SCD-HeFT-BioLINCC)

    phs003654.v1.p1.c1

    BioLINCC-BL_SCD-HeFT_GRU

    Yes

    No

    NHLBI TOPMed: Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE)

    phs001472.v3.p2.c1

    topmed-ECLIPSE_DS-CS-MDS-RD

    No

    Yes

    NHLBI TOPMed: Characterizing the Response to a Leukotriene Receptor Antagonist and an Inhaled Corticosteroid (CLIC)

    phs001729.v3.p1.c2

    topmed-CARE_CLIC_DS-ASTHMA-IRB-COL

    No

    No

    Molecular Genetics of Heterotaxy and Related Congenital Heart Defects

    phs001814.v1.p1.c1

    heartfailure-MolGen_CHD_GRU

    Yes

    No

    NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism (WGS of VTE)

    phs001402.v3.p1.c1

    topmed-Mayo_VTE_GRU

    No

    Yes

    NHLBI TOPMed: My Life Our Future (MLOF) Research Repository of Patients with Hemophilia A (Factor VIII Deficiency) or Hemophilia B (Factor IX Deficiency)

    phs001515.v2.p2.c1

    topmed-MLOF_HMB-PUB

    No

    Yes

    NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study

    phs001368.v4.p2.c1

    topmed-CHS_HMB-NPU-MDS

    No

    Yes

    NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study

    phs001368.v4.p2.c2

    topmed-CHS_HMB-MDS

    No

    Yes

    NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study

    phs001368.v4.p2.c3

    topmed-CHS_DS-CVD-NPU-MDS

    No

    Yes

    NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study

    phs001368.v4.p2.c4

    topmed-CHS_DS-CVD-MDS

    No

    Yes

    NHLBI TOPMed: San Antonio Family Heart Study (SAFHS)

    phs001215.v4.p2.c1

    topmed-SAFHS_DS-DHD-IRB-PUB-MDS-RD

    No

    Yes

    NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica

    phs000988.v6.p1.c1

    topmed-CRA_DS-ASTHMA-IRB-MDS-RD

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v3.p2.c1

    topmed-IPF_DS-ILD-IRB-NPU

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v3.p2.c2

    topmed-IPF_DS-LD-IRB-NPU

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v3.p2.c3

    topmed-IPF_DS-PFIB-IRB-NPU

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v3.p2.c4

    topmed-IPF_DS-PUL-ILD-IRB-NPU

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v3.p2.c5

    topmed-IPF_HMB-IRB-NPU

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v3.p2.c6

    topmed-IPF_DS-LD-IRB-COL-NPU

    No

    Yes

    NIH RECOVER: A Multi-Site Observational Study of Post-Acute Sequelae of SARS-CoV-2 Infection in Adults

    phs003463.v2.p2.c1

    RECOVER-RC_Adult_GRU

    No

    Yes

    NHLBI TOPMed: Women's Health Initiative (WHI)

    phs001237.v3.p1.c1

    topmed-WHI_HMB-IRB

    No

    Yes

    NHLBI TOPMed: Women's Health Initiative (WHI)

    phs001237.v3.p1.c2

    topmed-WHI_HMB-IRB-NPU

    No

    Yes

    NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study

    phs000974.v5.p4.c1

    topmed-FHS_HMB-IRB-MDS

    No

    Yes

    NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study

    phs000974.v5.p4.c2

    topmed-FHS_HMB-IRB-NPU-MDS

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC)

    phs001211.v4.p3.c1

    topmed-ARIC_HMB-IRB

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC)

    phs001211.v4.p3.c2

    topmed-ARIC_DS-CVD-IRB

    No

    Yes

    NHLBI TOPMed: MESA and MESA Family AA-CAC

    phs001416.v3.p1.c1

    topmed-MESA_HMB

    No

    Yes

    NHLBI TOPMed: MESA and MESA Family AA-CAC

    phs001416.v3.p1.c2

    topmed-MESA_HMB-NPU

    No

    Yes

    NHLBI TOPMed: Pediatric Cardiac Genomics Consortium (PCGC)'s Congenital Heart Disease Biobank

    phs001735.v2.p1.c1

    topmed-PCGC_HMB

    No

    Yes

    NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment (SAGE)

    phs000921.v5.p2.c2

    topmed-SAGE_DS-LD-IRB-COL

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: Genes-Environments and Admixture in Latino Asthmatics (GALA II)

    phs000920.v6.p4.c2

    topmed-GALAII_DS-LD-IRB-COL

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v3.p2.c1

    topmed-IPF_HMB-IRB-NPU

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v3.p2.c2

    topmed-IPF_DS-LD-IRB-NPU

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v3.p2.c3

    topmed-IPF_DS-ILD-IRB-NPU

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v3.p2.c4

    topmed-IPF_DS-PFIB-IRB-NPU

    No

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607.v3.p2.c5

    topmed-IPF_DS-PUL-ILD-IRB-NPU

    No

    Yes

    The Collaborative Cohort of Cohorts for COVID-19 Research (C4R)

    phs003045.v1.p1.c1

    COVID19-C4R_CARDIA_HMB

    Yes

    No

    The Collaborative Cohort of Cohorts for COVID-19 Research (C4R)

    phs003045.v1.p1.c2

    COVID19-C4R_CARDIA_HMB-NPU

    Yes

    No

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS)

    phs002909.v1.p1.c1

    COVID19-C4R_SPIROMICS_GRU

    Yes

    No

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS)

    phs002909.v1.p1.c2

    COVID19-C4R_SPIROMICS_GRU_NPU

    Yes

    No

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS)

    phs002909.v1.p1.c3

    COVID19-C4R_SPIROMICS_COPD

    Yes

    No

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS)

    phs002909.v1.p1.c4

    COVID19-C4R_SPIROMICS_COPD_NPU

    Yes

    No

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS)

    phs002909.v1.p1.c5

    COVID19-C4R_SPIROMICS_GRU_COL

    Yes

    No

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS)

    phs002909.v1.p1.c6

    COVID19-C4R_SPIROMICS_GRU-NPU-COL

    Yes

    No

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS)

    phs002909.v1.p1.c7

    COVID19-C4R_SPIROMICS_COPD-COL

    Yes

    No

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS)

    phs002909.v1.p1.c8

    COVID19-C4R_SPIROMICS_COPD-NPU-COL

    Yes

    No

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA)

    phs003017.v1.p1.c1

    COVID19-C4R_MESA_HMB

    Yes

    No

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA)

    phs003017.v1.p1.c2

    COVID19-C4R_MESA_HMB-NPU

    Yes

    No

    NHLBI TOPMed: Severe Asthma Research Program (SARP)

    phs001446.v3.p2.c1

    topmed-SARP_GRU

    Yes

    No

    NHLBI TOPMed: Severe Asthma Research Program (SARP)

    phs001446.v3.p2.c2

    topmed-SARP_DS-AAI-PUB

    Yes

    No

    TRanscriptomic ANalySis of left ventriCulaR gene Expression (TRANSCRibE)

    phs001679.v1.p1.c1

    heartfailure-TRANSCRibE_GRU

    Yes

    No

    TRanscriptomic ANalySis of left ventriCulaR gene Expression (TRANSCRibE)

    phs001679.v1.p1.c2

    heartfailure-TRANSCRibE_DS-CI

    Yes

    No

    video
    BDC Powered by Terra release notes
    BDC Powered by Seven Bridges release notes
    BDC Powered by PIC-SURE release notes

    New

    imaging-img_ARIC_HMB-IRB-NPU-MDS

    phs003946.v1.p3.c1

    New

    Individual_Study-PATH_HHT_DS-HHT-IRB-PUB-COL

    phs003948.v1.p1.c1

    New

    BioLINCC-BL_LRC_PS_GRU

    phs003995.v1.p1.c1

    New

    BioLINCC-BL_SHHS_NSRR_HMB-MDS

    phs003637.v1.p1.c1

    Update (Other)

    COVID19-ACTIV4A_GRU

    phs002694.v4.p1.c1

    Update (Other)

    RECOVER-RC_Autopsy_GRU

    phs003768.v2.p2.c1

    Update (Version)

    parent-FHS_HMB-IRB-MDS_

    phs000007.v34.p15.c1

    Update (Version)

    parent-FHS_HMB-IRB-NPU-MDS_

    phs000007.v34.p15.c2

    Update (Version)

    heartfailure-STAMPEED_MIGen_GRU

    phs000294.v1.p1.c1

    New

    parent-PCGC_HMB_

    phs001194.v4.p3.c1

    Update (Version)

    parent-PCGC_DS-CHD_

    phs001194.v4.p3.c2

    Update (Version)

    topmed-PUSH_SCD_DS-SCD-IRB-PUB-COL

    phs001682.v3.p1.c1

    Update (Version)

    BioLINCC-BL_ROC_CPR_GRU

    phs003818.v1.p1.c1

    New

    BioLINCC-BL_PRHHP_GRU

    phs003930.v1.p1.c1

    New

    imaging-img_HCSC-SOL_HMB-NPU

    phs003963.v1.p1.c1

    New

    imaging-img_HCSC-SOL_HMB

    phs003963.v1.p1.c2

    New

    BioLINCC-BL_IPPB_GRU

    phs004010.v1.p1.c1

    New

    Individual_Study-INVESTED_GRU

    phs004011.v1.p1.c1

    New

    BioLINCC-BL_LHS_GRU

    phs004013.v1.p1.c1

    New

    DIR-Stressors_and_Health_Study_HMB-PUB-COL

    phs004019.v1.p1.c1

    New

    BioLINCC-BL_PIOPED_GRU

    phs004020.v1.p1.c1

    New

    BioLINCC-BL_ALLHAT_GRU

    phs004021.v1.p1.c1

    New

    BioLINCC-BL_CONCERT_HF_GRU

    phs004055.v1.p1.c1

    New

    parent-ARIC_HMB-IRB_

    phs000280.v8.p2.c1

    New

    parent-ARIC_DS-CVD-IRB_

    phs000280.v8.p2.c2

    New

    parent-CARDIA_HMB-IRB_

    phs000285.v3.p2.c1

    New

    parent-CARDIA_HMB-IRB-NPU_

    phs000285.v3.p2.c2

    New

    parent-JHS_HMB-IRB-NPU_

    phs000286.v7.p2.c1

    New

    parent-JHS_DS-FDO-IRB-NPU_

    phs000286.v7.p2.c2

    New

    parent-JHS_HMB-IRB_

    phs000286.v7.p2.c3

    New

    parent-JHS_DS-FDO-IRB_

    phs000286.v7.p2.c4

    New

    heartfailure-Fam_PAH_GRU

    phs000354.v1.p1.c1

    New

    heartfailure-Fam_FAF_HMB

    phs000362.v1.p1.c1

    New

    heartfailure-GENOA_GRU

    phs000379.v1.p1.c1

    New

    heartfailure-PGRN_ACE_HMB

    phs000438.v1.p1.c1

    New

    heartfailure-DrugRes_HTN_GRU

    phs000442.v1.p1.c1

    New

    heartfailure-CAP_GRU

    phs000481.v3.p2.c1

    New

    heartfailure-BEN_HMB

    phs000507.v2.p2.c1

    New

    heartfailure-Fam_IB_GRU

    phs000518.v1.p1.c1

    New

    heartfailure-Fam_DC_DS-FDC

    phs000581.v1.p1.c1

    New

    heartfailure-KCNE1_TDP_HMB

    phs000617.v1.p1.c1

    New

    heartfailure-Fam_CHD_HMB

    phs000758.v1.p1.c1

    New

    heartfailure-MiGen_EXS_Ottawa_GRU

    phs000806.v1.p1.c1

    New

    heartfailure-PGRN_DILQTS_GRU

    phs000808.v1.p1.c1

    New

    heartfailure-MiGen_EXS_ItalAmer_GRU

    phs000814.v1.p1.c1

    New

    heartfailure-Exome_Thrombo-Leuk_GRU

    phs000873.v1.p1.c1

    New

    heartfailure-Twins_Asthma_GRU

    phs000886.v1.p1.c1

    New

    heartfailure-MiGen_EXS_REGICOR_DS-CVD

    phs000902.v1.p1.c1

    New

    heartfailure-MiGEN_EXS_PROMIS_GRU

    phs000917.v1.p1.c1

    New

    heartfailure-BroadEOMI_exome_GRU

    phs000936.v1.p1.c1

    New

    heartfailure-BroadEOMI_exome_DS-MI

    phs000936.v1.p1.c2

    New

    heartfailure-BroadEOMI_exome_DS-CVD

    phs000936.v1.p1.c3

    New

    heartfailure-PGRN_Cardio-Stat_HMB

    phs000963.v1.p1.c1

    New

    heartfailure-MiGEN_EX_UL_DS-CVD

    phs000990.v1.p1.c1

    New

    heartfailure-Endothelial_PAH_GRU

    phs000998.v2.p1.c1

    New

    heartfailure-MiGEN_EXS_MDC_HMB-MDS

    phs001101.v1.p1.c1

    New

    heartfailure-Globin_iPS_GRU

    phs001212.v1.p1.c1

    New

    heartfailure-exRNA_healthy_HMB

    phs001258.v2.p1.c1

    New

    topmed-CHIRAH_DS-ASTHMA-IRB-COL

    phs001605.v3.p1.c2

    Update (Version)

    topmed-GCPD-A_DS-ASTHMA-GSO

    phs001661.v4.p1.c1

    Update (Version)

    topmed-sumstats_GRU

    phs001974.v8.p1.c1

    New

    heartfailure-REGARDS_GWAS_HMB-IRB

    phs002719.v1.p1.c1

    New

    COVID19-C4R_CARDIA_HMB-IRB

    phs003045.v2.p2.c1

    New

    COVID19-C4R_CARDIA_HMB-IRB-NPU

    phs003045.v2.p2.c2

    New

    Individual_Study-PETAL_ROSE_ARDS_RNASeq_HMB

    phs003929.v1.p1.c1

    New

    imaging-img_dMRI_VGC_GRU

    phs004002.v1.p1.c1

    New

    BioLINCC-BL_NETT_GRU

    phs004077.v1.p1.c1

    New

    BioLINCC-BL_PETAL_CLOVERS_HMB-MDS

    phs004080.v1.p1.c1

    New

    BioLINCC-BL_STEP_IPF_GRU

    phs004085.v1.p1.c1

    New

    New

    heartfailure-Hypox_Ethiopia_GRU

    phs000647.v1.p1.c1

    New

    BioLINCC-BL_HPP_GRU

    phs003907.v1.p1.c1

    New

    Individual_Study-PRIME_AIR_HMB-MDS

    phs003926.v1.p1.c1

    New

    BioLINCC-BL_LOTT_GRU

    phs003933.v1.p1.c1

    New

    COVID19-ACTIV6_GRU

    phs003941.v1.p1.c1

    New

    BioLINCC-BL_WRAP_IPF_GRU

    phs003968.v1.p1.c1

    New

    imaging-img_dMRI_VGC_GRU

    phs004002.v1.p1.c1

    New

    imaging-img_COPDGene_HMB

    phs004023.v1.p1.c1

    New

    imaging-img_COPDGene_DS-CS

    phs004023.v1.p1.c2

    New

    BioLINCC-BL_HIFI_GRU

    phs004032.v1.p1.c1

    New

    Individual_Study-VDKA_DS-ASTHMA

    phs004051.v1.p1.c1

    New

    Individual_Study-STAR_DS-ASTHMA

    phs004052.v1.p1.c1

    New

    BioLINCC-BL_EPIC_GRU

    phs004067.v1.p1.c1

    New

    BioLINCC-BL_ACE_IPF_GRU

    phs004070.v1.p1.c1

    New

    BioLINCC-BL_Panther_IPF_GRU

    phs004071.v1.p1.c1

    New

    BioLINCC-BL_PROP_GRU

    phs004117.v1.p1.c1

    New

    BioLINCC-BL_FIRE_CORAL_HMB-MDS

    phs004130.v1.p1.c1

    New

    BioLINCC-BL_ARDSNet_FACTT_HMB-MDS

    phs004165.v1.p1.c1

    New

    BioLINCC-BL_ARDSNet_EDEN_HMB-MDS

    phs004168.v1.p1.c1

    New

    BioLINCC-BL_HFN_LIFE_GRU

    phs004171.v1.p1.c1

    New

    BioLINCC-BL_BHS_HMB-MDS

    phs004173.v1.p1.c1

    New

    BioLINCC-BL_WHI_LILAC_GRU

    phs004174.v1.p1.c1

    New

    imaging-img_ACCORD_GRU

    phs003562.v1.p1.c1

    Update (Other)

    imaging-img_SPRINT_GRU

    phs003566.v1.p1.c1

    Update (Other)

    imaging-img_MESA_ECG_HMB

    phs003703.v1.p1.c1

    Update (Other)

    imaging-img_MESA_ECG_HMB-NPU

    phs003703.v1.p1.c2

    Update (Other)

    dbGaP-PCGC_DS-CHD

    phs000571.v7.p3.c2

    parent-HCHS-SOL_HMB-NPU_

    phs000810.v2.p2.c1

    Update (Version)

    parent-HCHS-SOL_HMB_

    phs000810.v2.p2.c2

    Update (Version)

    topmed-CARDIA_HMB-IRB

    phs001612.v3.p3.c1

    Update (Version)

    topmed-CARDIA_HMB-IRB-NPU

    phs001612.v3.p3.c2

    Update (Version)

    topmed-LTRC_HMB-MDS

    phs001662.v3.p1.c2

    Update (Version)

    BioLINCC-BL_SHHS_NSRR_HMB-MDS

    phs003637.v2.p1.c1

    Update (Version)

    Yes

    Molecular Atlas of Lung Development (LungMAP)

    phs001961.v2.p1.c1

    LungMAP-MALD_GRU

    Yes

    Yes

    Complement Inhibition Using Eculizumab to Overcome Platelet Transfusion Refractoriness in Patients with Severe Thrombocytopenia (DIR-Eculizumab)

    phs003212.v1.p1.c1

    DIR-Eculizumab_GRU

    Yes

    Yes

    Hydroxyurea to Prevent Organ Damage in Children with Sickle Cell Anemia (BABYHUG)

    phs002415.v1.p1.c1

    BioLINCC-BabyHug_DS-SCD-IRB-RD

    No

    No

    The Genetic Epidemiology of Asthma in Costa Rica (CRA)

    phs000988.v4.p1.c1

    topmed-CRA_DS-ASTHMA-IRB-MDS-RD

    No

    Yes

    Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b)

    phs002808.v1.p1.c1

    topmed-NuMom2B_GRU-IRB

    Yes

    Yes

    Multicenter Study of Hydroxyurea (MSH)

    phs002348.v1.p1.c1

    BioLINCC-MSH_GRU

    No

    No

    The Cleveland Family Study (NSRR-CFS)

    phs002715.v1.p1.c1

    NSRR-NSRR-CFS_DS-HLBS-IRB-NPU

    No

    No

    NHLBI TOPMed: The Jackson Heart Study (JHS)

    phs000964.v5.p1.c1

    topmed-JHS_HMB-IRB-NPU

    No

    Yes

    NHLBI TOPMed: The Jackson Heart Study (JHS)

    phs000964.v5.p1.c2

    topmed-JHS_DS-FDO-IRB-NPU

    No

    Yes

    NHLBI TOPMed: The Jackson Heart Study (JHS)

    phs000964.v5.p1.c3

    topmed-JHS_HMB-IRB

    No

    Yes

    NHLBI TOPMed: The Jackson Heart Study (JHS)

    phs000964.v5.p1.c4

    topmed-JHS_DS-FDO-IRB

    No

    Yes

    NHLBI TOPMed: Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study (FHS)

    phs000974.v5.p3.c1

    topmed-FHS_HMB-IRB-MDS

    No

    Yes

    NHLBI TOPMed: Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study (FHS)

    phs000974.v5.p3.c2

    topmed-FHS_HMB-IRB-NPU-MDS

    No

    Yes

    NHLBI TOPMed: Heart and Vascular Health Study (HVH)

    phs000993.v5.p2.c1

    topmed-HVH_HMB-IRB-MDS

    No

    Yes

    NHLBI TOPMed: Heart and Vascular Health Study (HVH)

    phs000993.v5.p2.c2

    topmed-HVH_DS-CVD-IRB-MDS

    No

    Yes

    NHLBI TOPMed: The Vanderbilt AF Ablation Registry (VAFAR)

    phs000997.v5.p2.c1

    topmed-VAFAR_HMB-IRB

    No

    Yes

    NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry (VU)

    phs001032.v6.p2.c1

    topmed-VU_AF_GRU-IRB

    No

    Yes

    NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados (BAGS)

    phs001143.v4.p1.c1

    topmed-BAGS_GRU-IRB

    No

    Yes

    NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation Study (CCAF)

    phs001189.v4.p1.c1

    topmed-CCAF_AF_GRU-IRB

    No

    Yes

    NHLBI TOPMed: Cardiovascular Health Study (CHS)

    phs001368.v3.p2.c1

    topmed-CHS_HMB-MDS

    No

    Yes

    NHLBI TOPMed: Cardiovascular Health Study (CHS)

    phs001368.v3.p2.c2

    topmed-CHS_HMB-NPU-MDS

    No

    Yes

    NHLBI TOPMed: Cardiovascular Health Study (CHS)

    phs001368.v3.p2.c3

    topmed-CHS_DS-CVD-MDS

    Yes

    Yes

    NHLBI TOPMed: Cardiovascular Health Study (CHS)

    phs001368.v3.p2.c4

    topmed-CHS_DS-CVD-NPU-MDS

    No

    Yes

    NHLBI TOPMed: Diabetes Heart Study (DHS) African American Coronary Artery Calcification (AACAC)

    phs001412.v3.p1.c1

    topmed-AACAC_HMB-IRB-COL-NPU

    No

    Yes

    NHLBI TOPMed: Diabetes Heart Study (DHS) African American Coronary Artery Calcification (AACAC)

    phs001412.v3.p1.c2

    topmed-AACAC_DS-DHD-IRB-COL-NPU

    No

    Yes

    NHLBI TOPMed: MESA and MESA Family AA-CAC (MESA)

    phs001416.v2.p1.c1

    topmed-MESA_HMB

    No

    Yes

    NHLBI TOPMed: MESA and MESA Family AA-CAC (MESA)

    phs001416.v2.p1.c2

    topmed-MESA_HMB-NPU

    No

    Yes

    Clinical-trial of COVID-19 Convalescent Plasma in Outpatients (C3PO)

    phs002752.v1.p1.c1

    COVID19-C3PO_GRU

    No

    Yes

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene)

    phs002910.v1.p1.c1

    COVID19-C4R_COPDGene_HMB

    Yes

    Yes

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene)

    phs002910.v1.p1.c2

    COVID19-C4R_COPDGene_DS-CS

    Yes

    Yes

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Atherosclerosis Risk in Communities Study (ARIC)

    phs002988.v1.p1.c1

    COVID19-C4R_ARIC_HMB-IRB

    Yes

    Yes

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Framingham Heart Study (FHS)

    phs002911.v1.p1.c1

    COVID19-C4R_FHS_HMB-IRB-MDS

    Yes

    Yes

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Framingham Heart Study (FHS)

    phs002911.v1.p1.c2

    COVID19-C4R_FHS_HMB-IRB-NPU-MDS

    Yes

    Yes

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)

    phs002913.v1.p1.c1

    COVID19-C4R_GRU-PUB-NPU

    Yes

    Yes

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)

    phs002913.v1.p1.c2

    COVID19-C4R_GRU-PUB

    Yes

    Yes

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)

    phs002913.v1.p1.c3

    COVID19-C4R_DS-AAI-PUB-NPU

    Yes

    Yes

    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)

    phs002913.v1.p1.c4

    COVID19-C4R_DS-AAI-PUB

    Yes

    Yes

    Multi-Ethnic Study of Atherosclerosis (BioLINCC)

    phs003288.v1.p1.c1

    BioLINCC-MESA_HMB

    Yes

    Yes

    Multi-Ethnic Study of Atherosclerosis (BioLINCC)

    phs003288.v1.p1.c2

    BioLINCC-MESA_HMB-NPU

    Yes

    Yes

    Accelerating COVID-19 Therapeutic Interventions and Vaccines 4 ACUTE (ACTIV4a) v1.0, v1.1

    phs002694.v3.p1.c1

    COVID19-ACTIV4A_GRU

    No

    Yes

    COVID-19 Post-hospital Thrombosis Prevention Study (ACTIV4c)

    phs003063.v1.p1.c1

    COVID19-ACTIV4C_GRU

    Study Name

    phs I.D. #

    Acronym

    New to BioData Catalyst

    New study version

    NHLBI TOPMed: Boston Early-Onset COPD Study in the TOPMed Program (EOCOPD)

    phs000946.v5.p1.c1

    topmed-EOCOPD_DS-CS-RD

    No

    Yes

    NHLBI TOPMed: The Cleveland Family Study (CFS)

    phs000954.v4.p2.c1

    topmed-CFS_DS-HLBS-IRB-NPU

    No

    Data page
    wdl-docs GitHub repository
    BDC-Terra release notes
    BDC-Seven Bridges release notes
    BDC-PIC-SURE release notes
    BDC-Dockstore release notes

    No

    Yes

    Yes

    ApoA-1 and Atherosclerosis in Psoriasis (DIR)

    phs003231.v1.p1.c1

    DIR-ApoA-1_Atherosclerosis_in_Psoriasis_GRU

    Yes

    Yes

    Treatment of Pulmonary Hypertension and Sickle Cell Disease With Sildenafil Therapy (walk-PHaSST)

    phs002383.v1.p1.c1

    BioLINCC-Walk_PHaSST_DS-SCD-IRB-PUB-COL-NPU-MDS-RD

    No

    No

    Yes

    The Mediators of Atherosclerosis in South Asians Living in America (MASALA)

    phs002980.v1.p1.c1

    COVID19-C4R_MASALA_HMB-IRB-COL

    Yes

    Yes

    Prevent Pulmonary Fibrosis (PrePF)

    phs002975.v1.p1.c1

    COVID19-C4R_PrePF_HMB

    Yes

    Yes

    A Multi-site Observational Study of Post-Acute Sequelae of SARS-CoV-2 Infection in Adults (RECOVER)

    phs003463.v1.p1.c1

    RECOVER-Adult

    Yes

    Yes

    Hispanic Community Health Study (HCHS)

    phs003457.v1.p1.c1

    NSRR-HCHS

    Yes

    Yes

    Hispanic Community Health Study (HCHS)

    phs003457.v1.p1.c2

    NSRR-HCHS

    Yes

    Yes

    NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica (CRA)

    phs000988.v5.p1.c1

    topmed-CRA_DS-ASTHMA-IRB-MDS-RD

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: Genes-Environments and Admixture in Latino Asthmatics (GALA II)

    phs000920.v5.p2.c2

    topmed-GALAII_DS-LD-IRB-COL

    No

    Yes

    NHLBI TOPMed: HyperGEN - Genetics of Left Ventricular (LV) Hypertrophy

    phs001293.v3.p1.c1

    topmed-HyperGEN_GRU-IRB

    No

    Yes

    NHLBI TOPMed: HyperGEN - Genetics of Left Ventricular (LV) Hypertrophy

    phs001293.v3.p1.c2

    HyperGEN_DS-CVD-IRB-RD

    No

    Yes

    NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism (WGS of VTE)

    phs001402.v3.p1.c1

    Mayo_VTE_GRU

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: Massachusetts General Hospital (MGH) Atrial Fibrillation Study

    phs001062.v5.p2.c2

    MGH_AF_DS-AF-IRB-RD

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: Massachusetts General Hospital (MGH) Atrial Fibrillation Study

    phs001062.v5.p2.c1

    MGH_AF_HMB-IRB

    No

    Yes

    NHLBI TOPMed: African American Sarcoidosis Genetics Resource

    phs001207.v3.p1.c1

    Sarcoidosis_DS-SAR-IRB

    No

    Yes

    NHLBI TOPMed: Women's Health Initiative (WHI)

    phs001237.v3.p1.c1

    WHI_HMB-IRB

    No

    Yes

    NHLBI TOPMed: Women's Health Initiative (WHI)

    phs001237.v3.p1.c2

    WHI_HMB-IRB-NPU

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC)

    phs001211.v4.p2.c2

    ARIC_DS-CVD-IRB

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC)

    phs001211.v4.p2.c1

    ARIC_HMB-IRB

    No

    Yes

    NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish

    phs000956.v5.p1.c2

    Amish_HMB-IRB-MDS

    No

    Yes

    NHLBI TOPMed: Australian Familial Atrial Fibrillation Study

    phs001435.v2.p1.c1

    AustralianFamilialAF_HMB-NPU-MDS

    No

    Yes

    NHLBI TOPMed - NHGRI CCDG: Early-onset Atrial Fibrillation in the CATHeterization GENetics (CATHGEN) Cohort

    phs001600.v3.p2.c1

    CATHGEN_DS-CVD-IRB

    No

    Yes

    NHLBI TOPMed: Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE)

    phs001472.v2.p1.c1

    ECLIPSE_DS-COPD-MDS-RD

    No

    Yes

    NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy (GENOA)

    phs001345.v3.p1.c1

    GENOA_DS-ASC-RF-NPU

    No

    Yes

    NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity (GenSalt)

    phs001217.v3.p1.c1

    GenSalt_DS-HCR-IRB

    No

    Yes

    NHLBI TOPMed: GOLDN Epigenetic Determinants of Lipid Response to Dietary Fat and Fenofibrate

    phs001359.v3.p1.c1

    GOLDN_DS-CVD-IRB

    No

    Yes

    NHLBI TOPMed: Defining the time-dependent genetic and transcriptomic responses to cardiac injury among patients with arrhythmias

    phs001434.v2.p1.c1

    miRhythm_GRU

    No

    Yes

    NHLBI TOPMed: Partners HealthCare Biobank

    phs001024.v5.p1.c1

    PARTNERS_HMB

    No

    Yes

    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU)

    phs001466.v2.p1.c3

    pharmHU_DS-SCD

    No

    Yes

    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU)

    phs001466.v2.p1.c2

    pharmHU_DS-SCD-RD

    No

    Yes

    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU)

    phs001466.v2.p1.c1

    pharmHU_HMB

    No

    Yes

    NHLBI TOPMed: REDS-III Brazil Sickle Cell Disease Cohort (REDS-BSCDC)

    phs001468.v3.p1.c1

    REDS-III_Brazil_SCD_GRU-IRB-PUB-NPU

    No

    Yes

    NHLBI TOPMed: San Antonio Family Heart Study (SAFHS)

    phs001215.v4.p2.c1

    SAFHS_DS-DHD-IRB-PUB-MDS-RD

    No

    Yes

    NHLBI TOPMed: Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE)

    phs001467.v2.p1.c1

    SAPPHIRE_asthma_DS-ASTHMA-IRB-COL

    No

    Yes

    NHLBI TOPMed: Novel Risk Factors for the Development of Atrial Fibrillation in Women

    phs001040.v5.p1.c1

    WGHS_HMB

    No

    Yes

    Long-TerM OUtcomes after the Multisystem Inflammatory Syndrome In Children (MUSIC)

    phs002770.v1.p1.c1

    COVID19-MUSIC_GRU

    Yes

    Yes

    Unrelated Donor Reduced Intensity Bone Marrow Transplant for Children with Severe Sickle Cell Disease (BMT CTN-0601-BioLINCC)

    phs003470.v1.p1.c1

    BioLINCC-BMT_CTN_HMB

    Genetic Epidemiology of COPD Study (COPDGene)

    phs002910.v1.p1.c1

    COVID19-C4R_COPDGene_HMB

    Yes

    Yes

    Genetic Epidemiology of COPD Study (COPDGene)

    phs002910.v1.p1.c2

    COVID19-C4R_COPDGene_DS-CS

    BDC-Terra release notes
    BDC-Seven Bridges release notes
    BDC-PIC-SURE release notes

    Yes

    Yes

    GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0, a workflow for somatic short variant calling. It runs on a single tumor-normal pair or on a single tumor sample and performs additional filtering and functional annotation tasks.

  • GATK Create Mutect2 Panel of Normals 4.2.5.0, which creates a panel of normals for use in other GATK workflows. The workflow takes multiple normal sample callsets and passes them to GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0 in tumor-only mode (despite the name, normal samples are given as the input), then collates sites present in two or more samples into a sites-only VCF.

  • Three apps from the MetaXcan toolkit:

    • S-PrediXcan for computing associations between omic features and a complex trait, starting from GWAS summary statistics.

    • S-MultiXcan for computing associations between predicted gene expression and a trait, using multiple studies for each gene.

    • MetaMany for serially performing multiple MetaXcan runs on GWAS summary statistics across multiple tissues.

  • The MetaXcan Workflow for computing associations between omic features and complex traits across multiple tissues. The workflow includes two tools from the MetaXcan framework, MetaMany and S-MultiXcan, and uses GWAS summary statistics together with multiple models that predict expression or splicing quantification.

  • MaxQuant (v2.0.3.0, CWL1.2), a quantitative proteomics tool designed for analyzing large mass-spectrometric datasets. It uses a target-decoy search strategy to estimate and control the rate of false positives. Within the target-decoy strategy, MaxQuant applies the concept of posterior error probability (PEP) to integrate multiple peptide properties (e.g., length, charge, number of modifications) together with the Andromeda score into a single quantity that reflects the quality of a peptide spectrum match (PSM).
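To make the target-decoy idea in the MaxQuant description more concrete, the following is a minimal Python sketch of how a false discovery rate can be estimated from decoy matches. It is a generic illustration and not MaxQuant's implementation: the `Psm` class, the functions `fdr_at_threshold` and `score_cutoff_for_fdr`, and the toy scores are all hypothetical, and the sketch thresholds on a single raw score rather than on the PEP that MaxQuant derives from multiple peptide properties.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Psm:
    """Hypothetical peptide-spectrum match; a higher score is a better match."""
    score: float
    is_decoy: bool  # True when the match came from the decoy database


def fdr_at_threshold(psms: List[Psm], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Decoy hits stand in for false target hits, so FDR is approximated
    as (number of decoys) / (number of targets), capped at 1.0.
    """
    accepted = [p for p in psms if p.score >= threshold]
    decoys = sum(1 for p in accepted if p.is_decoy)
    targets = len(accepted) - decoys
    if targets == 0:
        # No targets accepted: report 0.0 if nothing was accepted at all,
        # otherwise treat the set as entirely false.
        return 0.0 if decoys == 0 else 1.0
    return min(1.0, decoys / targets)


def score_cutoff_for_fdr(psms: List[Psm], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is within `max_fdr`.

    Returns None when no cutoff satisfies the limit.
    """
    best = None
    for cutoff in sorted({p.score for p in psms}, reverse=True):
        if fdr_at_threshold(psms, cutoff) <= max_fdr:
            best = cutoff
    return best


# Toy data (assumed values, for illustration only).
example_psms = [
    Psm(9.1, False), Psm(8.7, False), Psm(8.2, False), Psm(7.9, True),
    Psm(7.5, False), Psm(6.8, False), Psm(6.1, True), Psm(5.4, False),
]

if __name__ == "__main__":
    print(fdr_at_threshold(example_psms, 7.0))
    print(score_cutoff_for_fdr(example_psms, 0.01))
```

Scanning every observed score and keeping the lowest one that still satisfies the limit reflects the fact that the estimated FDR is not guaranteed to change monotonically as the cutoff is lowered.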

    NA

    TOPMed Freeze 9 - Batch 4

    various

    various

    false

    NA

    National Sleep Research Resource (NSRR)

    phs002715

    NSRR-CFS

    true

    1

    SPIROMICS (topmed: phs001927)

    phs001927

    SPIROMICS

    true

    1

    2

    PCGC SRA Data

    phs000571

    true

    5

    TOPMed Freeze 9 - WHI

    various

    various

    false

    NA

    MUSIC/CARING (COVID-19)

    phs002770

    MUSIC/CARING

    true

    1

    C3PO (COVID-19)

    phs002752

    C3PO

    true

    1

    TOPMed Freeze 9 - Batch 3

    various

    various

    BostonBrazil_SCD (TOPMed - phs001599)

    phs001599

    BostonBrazil_SCD

    true

    1

    TOPMed - PCGC (Version update)

    phs001735

    PCGC

    import of data from AnVIL to BioData Catalyst Powered by Seven Bridges
    Variant Store
    Study Variable Explorer
    Terra release notes
    Seven Bridges release notes
    Dockstore release notes

    false

    false

  • Enhanced Compute Capabilities: We deployed AWS 6th and 7th generation compute instance types, along with G5 and G6 GPU-enabled instance types for use in Data Studio sessions. These new instances offer researchers faster networking, better processing power, and improved hardware acceleration for machine learning applications.

  • New bioinformatics workflows: CellTypist, CellPhoneDB Toolkit, Nextflow-scRNA-Seq, and Nextflow-RNAvar.

    BioLINCC-BL_LRC_PS_GRU

    phs003995.v1.p1.c1

    BioLINCC-BL_SHHS_NSRR_HMB-MDS

    phs003637.v1.p1.c1

    COVID19-ACTIV4A_GRU

    phs002694.v4.p1.c1

    RECOVER-RC_Autopsy_GRU

    phs003768.v2.p2.c1

    parent-FHS_HMB-IRB-MDS_

    phs000007.v34.p15.c1

    parent-FHS_HMB-IRB-NPU-MDS_

    phs000007.v34.p15.c2

    heartfailure-STAMPEED_MIGen_GRU

    phs000294.v1.p1.c1

    parent-PCGC_HMB_

    phs001194.v4.p3.c1

    parent-PCGC_DS-CHD_

    phs001194.v4.p3.c2

    topmed-PUSH_SCD_DS-SCD-IRB-PUB-COL

    phs001682.v3.p1.c1

    BioLINCC-BL_ROC_CPR_GRU

    phs003818.v1.p1.c1

    BioLINCC-BL_PRHHP_GRU

    phs003930.v1.p1.c1

    imaging-img_HCSC-SOL_HMB-NPU

    phs003963.v1.p1.c1

    imaging-img_HCSC-SOL_HMB

    phs003963.v1.p1.c2

    BioLINCC-BL_IPPB_GRU

    phs004010.v1.p1.c1

    Individual_Study-INVESTED_GRU

    phs004011.v1.p1.c1

    BioLINCC-BL_LHS_GRU

    phs004013.v1.p1.c1

    DIR-Stressors_and_Health_Study_HMB-PUB-COL

    phs004019.v1.p1.c1

    BioLINCC-BL_PIOPED_GRU

    phs004020.v1.p1.c1

    BioLINCC-BL_ALLHAT_GRU

    phs004021.v1.p1.c1

    BioLINCC-BL_CONCERT_HF_GRU

    phs004055.v1.p1.c1

    heartfailure-BroadEOMI_DS-CVD

    phs000279.v2.p1.c1

    parent-ARIC_HMB-IRB_

    phs000280.v8.p2.c1

    parent-ARIC_DS-CVD-IRB_

    phs000280.v8.p2.c2

    parent-CARDIA_HMB-IRB_

    phs000285.v3.p2.c1

    parent-CARDIA_HMB-IRB-NPU_

    phs000285.v3.p2.c2

    parent-JHS_HMB-IRB-NPU_

    phs000286.v7.p2.c1

    parent-JHS_DS-FDO-IRB-NPU_

    phs000286.v7.p2.c2

    parent-JHS_HMB-IRB_

    phs000286.v7.p2.c3

    parent-JHS_DS-FDO-IRB_

    phs000286.v7.p2.c4

    heartfailure-LungExome_PAH_GRU

    phs000290.v1.p1.c1

    heartfailure-LHS-COPD_GRU

    phs000291.v2.p1.c1

    heartfailure-Fam_PAH_GRU

    phs000354.v1.p1.c1

    heartfailure-Fam_FAF_HMB

    phs000362.v1.p1.c1

    heartfailure-GENOA_GRU

    phs000379.v1.p1.c1

    heartfailure-PGRN_ACE_HMB

    phs000438.v1.p1.c1

    heartfailure-DrugRes_HTN_GRU

    phs000442.v1.p1.c1

    heartfailure-CAP_GRU

    phs000481.v3.p2.c1

    heartfailure-BEN_HMB

    phs000507.v2.p2.c1

    heartfailure-Fam_IB_GRU

    phs000518.v1.p1.c1

    heartfailure-Fam_DC_DS-FDC

    phs000581.v1.p1.c1

    heartfailure-KCNE1_TDP_HMB

    phs000617.v1.p1.c1

    heartfailure-Hypox_Ethiopia_GRU

    phs000647.v1.p1.c1

    heartfailure-Fam_CHD_HMB

    phs000758.v1.p1.c1

    heartfailure-MiGen_EXS_Ottawa_GRU

    phs000806.v1.p1.c1

    heartfailure-PGRN_DILQTS_GRU

    phs000808.v1.p1.c1

    parent-HCHS-SOL_HMB-NPU_

    phs000810.v2.p2.c1

    parent-HCHS-SOL_HMB_

    phs000810.v2.p2.c2

    heartfailure-MiGen_EXS_ItalAmer_GRU

    phs000814.v1.p1.c1

    heartfailure-Exome_Thrombo-Leuk_GRU

    phs000873.v1.p1.c1

    heartfailure-Twins_Asthma_GRU

    phs000886.v1.p1.c1

    heartfailure-MiGen_EXS_REGICOR_DS-CVD

    phs000902.v1.p1.c1

    heartfailure-MiGEN_EXS_PROMIS_GRU

    phs000917.v1.p1.c1

    heartfailure-BroadEOMI_exome_GRU

    phs000936.v1.p1.c1

    heartfailure-BroadEOMI_exome_DS-MI

    phs000936.v1.p1.c2

    heartfailure-BroadEOMI_exome_DS-CVD

    phs000936.v1.p1.c3

    heartfailure-PGRN_Cardio-Stat_HMB

    phs000963.v1.p1.c1

    heartfailure-MiGEN_EX_UL_DS-CVD

    phs000990.v1.p1.c1

    heartfailure-Endothelial_PAH_GRU

    phs000998.v2.p1.c1

    heartfailure-MiGEN_EXS_MDC_HMB-MDS

    phs001101.v1.p1.c1

    heartfailure-Globin_iPS_GRU

    phs001212.v1.p1.c1

    heartfailure-exRNA_healthy_HMB

    phs001258.v2.p1.c1

    topmed-CHIRAH_DS-ASTHMA-IRB-COL

    phs001605.v3.p1.c2

    topmed-CARDIA_HMB-IRB

    phs001612.v3.p3.c1

    topmed-CARDIA_HMB-IRB-NPU

    phs001612.v3.p3.c2

    topmed-GCPD-A_DS-ASTHMA-GSO

    phs001661.v4.p1.c1

    topmed-LTRC_HMB-MDS

    phs001662.v3.p1.c2

    topmed-sumstats_GRU

    phs001974.v8.p1.c1

    heartfailure-REGARDS_GWAS_HMB-IRB

    phs002719.v1.p1.c1

    COVID19-C4R_CARDIA_HMB-IRB

    phs003045.v2.p2.c1

    COVID19-C4R_CARDIA_HMB-IRB-NPU

    phs003045.v2.p2.c2

    imaging-img_ACCORD_GRU

    phs003562.v1.p1.c1

    imaging-img_SPRINT_GRU

    phs003566.v1.p1.c1

    BioLINCC-BL_SHHS_NSRR_HMB-MDS

    phs003637.v2.p1.c1

    imaging-img_MESA_ECG_HMB

    phs003703.v1.p1.c1

    imaging-img_MESA_ECG_HMB-NPU

    phs003703.v1.p1.c2

    BioLINCC-BL_HPP_GRU

    phs003907.v1.p1.c1

    Individual_Study-PRIME_AIR_HMB-MDS

    phs003926.v1.p1.c1

    Individual_Study-PETAL_ROSE_ARDS_RNASeq_HMB

    phs003929.v1.p1.c1

    BioLINCC-BL_LOTT_GRU

    phs003933.v1.p1.c1

    BioLINCC-BL_WRAP_IPF_GRU

    phs003968.v1.p1.c1

    imaging-img_dMRI_VGC_GRU

    phs004002.v1.p1.c1

    imaging-img_COPDGene_HMB

    phs004023.v1.p1.c1

    imaging-img_COPDGene_DS-CS

    phs004023.v1.p1.c2

    BioLINCC-BL_HIFI_GRU

    phs004032.v1.p1.c1

    Individual_Study-VDKA_DS-ASTHMA

    phs004051.v1.p1.c1

    Individual_Study-STAR_DS-ASTHMA

    phs004052.v1.p1.c1

    BioLINCC-BL_EPIC_GRU

    phs004067.v1.p1.c1

    BioLINCC-BL_ACE_IPF_GRU

    phs004070.v1.p1.c1

    BioLINCC-BL_Panther_IPF_GRU

    phs004071.v1.p1.c1

    BioLINCC-BL_NETT_GRU

    phs004077.v1.p1.c1

    BioLINCC-BL_PETAL_CLOVERS_HMB-MDS

    phs004080.v1.p1.c1

    BioLINCC-BL_STEP_IPF_GRU

    phs004085.v1.p1.c1

    BioLINCC-BL_PROP_GRU

    phs004117.v1.p1.c1

    BioLINCC-BL_FIRE_CORAL_HMB-MDS

    phs004130.v1.p1.c1

    BioLINCC-BL_ARDSNet_FACTT_HMB-MDS

    phs004165.v1.p1.c1

    BioLINCC-BL_ARDSNet_EDEN_HMB-MDS

    phs004168.v1.p1.c1

    BioLINCC-BL_HFN_LIFE_GRU

    phs004171.v1.p1.c1

    BioLINCC-BL_BHS_HMB-MDS

    phs004173.v1.p1.c1

    BioLINCC-BL_WHI_LILAC_GRU

    phs004174.v1.p1.c1

    Imaging_MESA_ECG-r3

    phs003703.v1.p1

    Imaging_SPRINT-r3

    phs003566.v1.p1

    Imaging_ACCORD-r3

    phs003562.v1.p1

    dbGaP_COPDGene_Geno

    phs000765.v3.p2

    dbGaP_FHS_CHARGE-S

    phs000651.v15.p16

    dbgap_FHS_RNA_Brain

    phs002611.v3.p16

    dbGaP_FHS_GutMicro

    phs002560.v3.p16

    dbGaP_BRIDGET_FHS

    phs002559.v3.p16

    dbGap_ADSP_FHS

    phs002558.v3.p16

    dbGaP_T2D-GENES_FHS

    phs001610.v6.p16

    dbGaP_FHS_SHARe

    phs000342.v23.p16

    dbGaP_CCDG-ARIC

    phs001536.v3.p2

    dbGaP_ARIC_CARe

    phs000557.v7.p2

    TOPMed_Freeze10_HCHS_SOL

    phs001395.v3.p2

    TOPMed_Freeze10_CARE_TREXA

    phs001732.v3.p1

    BioLINCC-BL_ARIC_HMB-NPU-MDS

    phs003738.v1.p1.c1

    BioLINCC-BL_BEST_COPD_GRU

    phs004022.v1.p1.c1

    BioLINCC-BL_ROC_PRIMED_GRU

    phs003825.v2.p2.c1

    heartfailure-PGRN_Afib_HMB

    phs000439.v1.p1.c1

    imaging-img_ARIC_HMB-IRB-NPU-MDS

    phs003946.v1.p3.c1

    Individual_Study-PATH_HHT_DS-HHT-IRB-PUB-COL

    phs003948.v1.p1.c1

    CONNECTS_ACTIV4A_v4_r3

    phs002694.v4.p1

    dbGaP_MVP

    phs001672.v13.p1

    dbgap_CCF_AFIB

    phs000820.v2.p1

    dbGaP_FHS_parent

    phs000007.v35.p16

    TOPMed_Freeze10_LTRC

    phs001662.v4.p2

    Imaging-COPDGene-r2

    phs004023.v1.p1
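
The identifiers listed above follow the dbGaP accession pattern: a study number (`phs`), a version (`v`), a participant set (`p`), and, where present, a consent group (`c`). As a minimal sketch, assuming only that pattern, such a string could be split into its parts as follows. The helper name `parse_accession` and the `Accession` type are made up for this illustration and are not part of any BDC tool.

```python
import re
from typing import NamedTuple, Optional


class Accession(NamedTuple):
    study: str                    # e.g. "phs000007"
    version: int                  # number after ".v"
    participant_set: int          # number after ".p"
    consent_group: Optional[int]  # number after ".c", if present


# phs + six digits, then .v<N>.p<N>, then an optional .c<N>
_PATTERN = re.compile(r"^(phs\d{6})\.v(\d+)\.p(\d+)(?:\.c(\d+))?$")


def parse_accession(text: str) -> Accession:
    """Split a full accession string into its numbered components."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a full phs accession: {text!r}")
    study, version, pset, consent = match.groups()
    return Accession(
        study,
        int(version),
        int(pset),
        int(consent) if consent is not None else None,
    )


print(parse_accession("phs000007.v34.p15.c1"))
print(parse_accession("phs003703.v1.p1"))
```

Shortened forms that omit the version or participant set (for example `phs002694.c1`) are deliberately rejected by this sketch rather than guessed at.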

    Terra roadmap
    Terra release notes
    Seven Bridges release notes
    PIC-SURE release notes

    2023-01-09 BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2023-01-09 release marks the twelfth release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., Azure volumes now available on both main analysis platforms) along with documentation and tutorials (e.g., information on how variable tags are generated) to help new users get started on the system. This release also includes enhanced support for moving data seamlessly across platforms. Please find more detail on the new features and user support materials in the sections below.

    The 2023-01-09 data releases include the addition of the Pediatric Cardiac Genomics Consortium (PCGC). Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.

    Significant new features

    Azure volumes are now available on BDC Powered by Seven Bridges: Users can now link a Microsoft Azure bucket to their Seven Bridges workspaces. After logging in, go to Data > Volumes and select “Microsoft Azure” to be led through a bucket-linking wizard.

    DRS Manifest Export: To further improve interoperability and let users move their data seamlessly across platforms, a DRS export option is now available on the Seven Bridges platforms. With the new functionality, users can export links to platform files (DRS URIs) and their metadata into a manifest file, which can then be used to import the files and metadata on other platforms.
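
As a rough illustration of what a consumer of such a manifest might do with each entry, the sketch below converts a hostname-based DRS URI into the HTTPS object endpoint defined by the GA4GH DRS v1 specification. The host name and object ID are made up for the example. The manifest layout is not described in these notes, so the function works on a single URI string rather than on a manifest file. URIs that use a compact-identifier prefix instead of a host name need a resolver lookup and are not handled here.

```python
from urllib.parse import quote, urlparse


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to its GA4GH DRS v1 object endpoint."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs":
        raise ValueError(f"not a DRS URI: {drs_uri!r}")
    object_id = parsed.path.lstrip("/")
    if not parsed.netloc or not object_id:
        raise ValueError(f"expected drs://<hostname>/<id>, got {drs_uri!r}")
    # Percent-encode the ID so it is safe to use as a single path segment.
    encoded_id = quote(object_id, safe="")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{encoded_id}"


# Hypothetical host and object ID, for illustration only.
print(drs_uri_to_url("drs://drs.example.org/314159"))
```

The returned URL is where a DRS client would request the object's metadata; fetching the file bytes is a separate step that depends on the platform's authorization setup.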

    OmicCircos R Shiny app now available on BDC-Seven Bridges: The OmicCircos app is an R Shiny application built around the OmicCircos R package for more effective generation of high-quality circular plots for visualizing genomic data. Common use cases include mutation patterns, copy number variations (CNVs), expression patterns, and methylation patterns. Such variations can be displayed as scatterplot, line, or text-label figures.

    Introduction to SAS Public Project on BDC-Seven Bridges: Seven Bridges released a Public Project to train users on how to use SAS. The public project contains three notebooks that walk a user through: 1) loading and cleaning data in SAS using ICD9 codes, 2) pulling the CDC’s Social Vulnerability Index data via API and running a regression, and 3) loading hosted 1000 Genomes data into SAS and visualizing mutation information. A user can copy the public project to their own workspace and modify the tutorial notebooks to suit their needs.

    New CWL Tools/Workflows on BDC-Seven Bridges:

    • BEDTools 2.30.0 toolkit:

      • BEDTools Coverage - returns the depth and breadth of coverage of features from B on the intervals in A

      • BEDTools Genomecov - computes histograms of feature coverage for a given genome

    Azure is now available on BDC Powered by Terra: Users can now log into Terra with a Microsoft Azure Cloud account. This is an invite-only version of Terra on the Azure platform. The public offering of Terra on Azure is expected in early 2023.

    A new spend report is now available for BDC-Terra billing projects: The report identifies which workspaces are costing the most, to provide more transparency around cloud costs incurred in Terra. To access the spend report, go to your billing project (main menu > billing > billing project) and click on the "Spend report" tab.

    New streamlined user journey from BDC Powered by PIC-SURE to analysis platforms: PIC-SURE has added “Export to Seven Bridges” and “Export to Terra” buttons to streamline data export into a BioData Catalyst analysis workspace. After exploring and filtering variables in PIC-SURE Authorized Access, users can package their data with the Select and Package Data Tool. Once the data is packaged, users can select their preferred BDC analysis platform with the new Export buttons. This provides all information needed and points the user directly to the public PIC-SURE project on either Seven Bridges or Terra.

    Take a Tour of BDC-PIC-SURE: PIC-SURE has updated the guided tour of the interface to interactively display search results based on the user’s authorization. This guided tour walks through the different parts of the platform, including how to use tags, where search results are displayed, and how to interpret the Results Panel.

    Known issues and workarounds

    BABYHUG Data Field Issue: A data file for the BABYHUG study (phs002415), as provided by the data submitter, included SAS-derived newline characters in data fields. These characters shifted the data rows, so values were mapped to the wrong variables. A corrected version of the file has been requested from the data submitter.

    New user support materials and documentation

    BDC-PIC-SURE Tag Generation: PIC-SURE has updated help text in the user interface and documentation to address the frequently asked question, “How are variable tags generated?” Users can find this help text in the “Filter by Variable Tags” box on the PIC-SURE platform and in the PIC-SURE User Guide.

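    Updated BDC-PIC-SURE documentation on the Export buttons: The PIC-SURE User Guide and the Authorized Access: Select and Package Data Tool YouTube video were updated to include information about the new Export buttons. These updates were also released in the BDC GitBook documentation.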

    BDC GitBook on BDC-PIC-SURE: Users can now access the BDC GitBook documentation directly from the PIC-SURE platform under the “Help” tab.

    Data Releases

    The table below highlights which studies were included in the 2023-01-09 data release.

    The PCGC substudy contains whole exome sequences, targeted sequences, and SNP array data. It is a multi-center, observational cohort study of individuals with congenital heart defects. The study aims to investigate the relationship between genetic factors and phenotypic and clinical outcomes in patients with CHD. Summary level phenotypes for the study participants can be viewed on the top-level study page. Individual level data and molecular data for the study are available by requesting Authorized Access. The study has collected phenotypic data and source DNA from 10,000 probands, parents, and families of interest. The data is now available for access across the entire ecosystem.

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    Planned Upcoming Data Releases

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    For detailed platform release notes please consult the following resources:

    Gen3 release notes
    PIC-SURE release notes

    2022-04-04 BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2022-04-04 release marks the ninth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., machine learning tools for chest CT imaging) along with documentation and tutorials (e.g., a new guide to sharing content) to help new users get started on the system. This release also includes enhanced support for synchronizing tools and workflows between Dockstore and GitHub. Please find more detail on the new features and user support materials in the sections below.

    The 2022-04-04 data release includes the addition of COVID-19 datasets ACTIV4a and ACTIV4b. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.

    Significant new features

    Machine learning tools for chest CT imaging: Seven Bridges and Harvard Medical School have collaborated to release a Public Project of machine learning tools titled: Automated Chest Imaging Platform (CIP) CT Phenotyping and Machine Learning Discovery in COPD. The Public Project includes a detailed guide for other researchers to use the tools and notebooks on COPD datasets or modify the tools for their own lung CT data.

    Storage-optimized instances on Seven Bridges: Users can now access i3 and i3en AWS instances for Interactive Analysis (RStudio, JupyterLab, SAS Studio) on Seven Bridges. These storage-optimized instances provide between 5 TB and 60 TB of storage for interactive environments, which enables researchers to harmonize larger datasets.

    New CWL tools and workflows on Seven Bridges:

    • GATK RNAseq short variant discovery 4.2.0.0

    • GRIDSS toolkit

    • scVelo 0.2.4

    New user support materials and documentation

    Share content through Public Projects: Seven Bridges has published a new guide in the knowledge center offering an alternative way to share new workflows, notebooks, and open access data with the BDCatalyst community. Public Projects provide a space for researchers to publish their analyses with open access sample data, detailed walkthroughs, and contact information for feedback and improvements. Both researchers developing new tools and researchers using preconfigured pipelines benefit from published Public Projects.

    Dockstore synchronization with GitHub: Dockstore has simplified its tool and workflow registration process to automatically synchronize with GitHub. Dockstore released several example templates showing how to set up your GitHub repo with the additional file (.dockstore.yml) needed to kick off this process. Check out this overview of the process for an introduction, and visit the updated Getting Started tutorials for registering tools and workflows on Dockstore to learn more.

    Data Releases

    The table below highlights which studies were included in the Q1 2022 data releases. COVID-19 datasets ACTIV4a and ACTIV4b were released to production. Most of the ingestion work for the COVID19-C3PO dataset is complete, and the dataset will be released in early April. TOPMed Freeze 9 datasets were ingested as the data became available. Twenty datasets were ingested and will also be released in early April as part of the fourth batch. The data is now available for access across the entire ecosystem.

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    Planned upcoming Data Releases

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    For detailed platform release notes please consult the following resources:

    • PIC-SURE release notes

    2022-10-03 BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2022-10-03 release marks the eleventh release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., PIC-SURE's new search interface) along with updated documentation. This release also includes updated versions of the Study Variable Explorer and the Annotation Explorer. Please find more detail on the new features and user support materials in the sections below.

    The 2022-10-03 data releases include the addition of TOPMed Boston-Brazil SCD and PCGC datasets. Please refer to the Data Releases section below for more information as well as the Data page on the BioData Catalyst website.

    Significant new features

    Now export with Study Variable Explorer on BioData Catalyst Powered by Seven Bridges: The Study Variable Explorer on BioData Catalyst Powered by Seven Bridges allows researchers to explore phenotypic variables from the TOPMed data dictionaries in an open access manner. Seven Bridges released Study Variable Explorer version 2, which expands on version 1 by adding tag search, notes, and data export. The latest update enables researchers to track their variable selection process through notes tied to study and variable information, which can be shared with collaborators through .json export. This gives analysts tractable information for reproducing decision-making during the harmonization process.

    New Interactive Web Apps Gallery: Under the “Public Gallery” dropdown on BioData Catalyst Powered by Seven Bridges, a new display for “Interactive Web Apps” provides access to the LocusZoom and Model Explorer R Shiny applications.

    Annotation Explorer Version 2: The Annotation Explorer enables users to interactively explore, query, and study characteristics of an inventory of annotations for the variants across the genome. This application can be used pre-association testing to interactively explore variant aggregation, filtering strategies, and generate input files for multiple-variant association testing, or post-association testing to explore annotations associated with a set of significant variants or variants of interest. Seven Bridges previously released the Annotation Explorer R Shiny application through a Public Project. Now, Annotation Explorer is integrated with BioData Catalyst Powered by Seven Bridges through the “Data” dropdown. The new integration enables querying genome-wide annotations and variants (including the TOPMed Freeze5 and Freeze8 datasets) in a more user-friendly interface without running an RStudio notebook. This release is integrated into the billing system so a user can select their compute needs based on price and monitor Annotation Explorer-specific costs through their billing group.

    New CWL Tools and Workflows on BioData Catalyst Powered by Seven Bridges:

    • GATK VariantEval BETA 4.2.5.0 tool which is used for evaluating variant calls.

    • GATK FilterMutectCalls 4.2.5.0 tool which is used to filter somatic SNVs and indels called by Mutect2.

    • Picard CreateSequenceDictionary 2.25.7 tool for creating a DICT index file for a sequence.

    Updated Interactive Analysis interface on Terra: Under the new design, the “Notebooks” tab is transformed into the more general “Analyses” tab, from where you can access the multiple applications available for Interactive Analysis in Terra. Accordingly, the list of Notebook files (.ipynb) becomes the list of “Your Analyses”, which now supports including R Markdown files (.Rmd). Just like Notebook files, any R Markdown files created in or added to the Analyses tab will be automatically stored in the workspace bucket and synced between the bucket and your persistent disk.

    PIC-SURE's new search interface: PIC-SURE has released an improved dynamic data exploration experience, allowing users to easily search and query at the variable value and genomic variant level. The streamlined search experience enables users to search variables and view associated information, such as decoded variable level information, details about the dataset, and study information - all without opening any data files. Updates to the interface include filtering search results by variable and study tags, a new genomic filtering model, adding variables to export without filtering, a simpler select and package data process, and visualizing single variable distributions.

    Dedicated PIC-SURE images within Seven Bridges analysis workspaces: The Seven Bridges and PIC-SURE teams have collaborated to provide users with dedicated workspace images that contain all the pre-installed packages necessary to run the PIC-SURE example notebooks. PIC-SURE API users in Seven Bridges will not have to worry about changes to package dependencies and/or versions, and R users in particular will notice a significantly faster start-up time during environment set-up. The PIC-SURE images are available in both the JupyterLab and RStudio Seven Bridges environments. Users can find this feature by specifying the Environment setup of any Data Cruncher analysis.

    Cure Sickle Cell Metadata Catalog integration: PIC-SURE has updated the Data Access Table to integrate information about sickle cell disease (SCD) studies from the Cure Sickle Cell Metadata Catalog (MDC). The “Additional Information” column includes a link to that SCD study’s page on the MDC. The Data Access Table also includes other new information, such as study design and study focus.

    New user support materials and documentation

    New BioData Catalyst Powered by PIC-SURE search interface: The documentation associated with PIC-SURE has been updated to reflect the recent release of the new search interface. This includes the BioData Catalyst Powered by PIC-SURE User Guide and the tutorial videos on the BioData Catalyst Powered by PIC-SURE YouTube playlist.

    Updated documentation on new Terra Interface: The documentation associated with Terra has been updated to reflect the recent release of the new analysis interface. This includes the Terra Workspace Quickstart Guide and the tutorial videos on the Terra YouTube channel.

    Data Releases

    The table below highlights which studies were included in the Q3 2022 data releases. The data is now available for access across the entire ecosystem.

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    Planned Upcoming Data Releases

    Study Name
    phs I.D. #
    Acronym
    New to BioData Catalyst
    New study version

    For detailed platform release notes please consult the following resources:

    Gen3 release notes
    PIC-SURE release notes

    2024-04-01 NHLBI BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2024-04-01 release marks the 17th release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., SRA import via DRS and the ability to save dataset IDs). Please find more details on the new features below.

    The 2024-04-01 data releases include the addition of research on heart failure and COVID-19 plus version updates to ongoing genetic and genomic studies including COPD and atrial fibrillation. Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.

    2022-01-24 BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2022-01-24 release marks the eighth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., the LocusZoom interactive app) along with documentation and tutorials (e.g., a guide for consortia using Seven Bridges) to help new users get started on the system. Please find more detail on the new features and user support materials in the sections below.

    The 2022-01-24 data release includes the addition of TOPMed Freeze 9 batch 1 & 2, CATHGEN, and PETAL RED CORAL datasets. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.

    2024-07-02 NHLBI BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2024-07-02 release marks the 18th release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., an expanded workflow cost estimator, cascading authorization from parent to child studies, and DOIs at the dataset level). Please find more detail on the new features and user support materials in the sections below.

    The 2024-07-02 data releases include the addition of research on atrial fibrillation, asthma, sickle cell disease, atherosclerosis, and more. Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.

  • BEDTools GetFasta - extracts sequences from a FASTA file for each of the intervals defined in a BED/GFF/VCF file

  • BEDTools Intersect - screens for overlaps between two sets of genomic features

  • BEDTools Merge - combines overlapping or “book-ended” features in an interval file into a single feature

  • BEDTools Sort - sorts a feature file by chromosome and other criteria

  • FlowSOM 2.4.0 which presents an algorithm used to distinguish cell populations from both flow and mass cytometry data in an unsupervised way.

  • cytofkit2 0.99.80 which is designed to analyze mass cytometry data from FCS files. It includes preprocessing, cell subset detection, cell subset visualization and interpretation, and inference of subset progression.

  • flowAI 1.24.0 which performs quality control on FCS data acquired using flow cytometry instruments. By evaluating three properties (flow rate, signal acquisition, and dynamic range), it enables the detection and removal of anomalies.

  • CNVkit 0.9.9 toolkit for inferring and visualizing copy number from high-throughput DNA sequencing data.

  • SBG Single-Cell RNA Deep Learning - Training is a single-cell classifier pipeline for human data. It relies on the transfer learning approach, which uses pre-trained gene embeddings as the starting point for building a model adjusted to given single-cell datasets.

  • SBG Single-Cell RNA Deep Learning - Predict is a single-cell classifier pipeline for human data. This tool uses the deep learning model generated by the SBG Single-Cell RNA Deep Learning - Training workflow to classify the input dataset.
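
To make the BEDTools Merge description in the list above concrete, here is a minimal pure-Python sketch of the same idea: sort the intervals, then fold each one into the previous interval when the two overlap or are book-ended. It illustrates the concept only and is not the BEDTools implementation. The function name and the tuple layout are chosen for this example.

```python
from typing import Iterable, List, Tuple

# (chromosome, start, end) using BED-style half-open coordinates
Interval = Tuple[str, int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Combine overlapping or book-ended intervals on the same chromosome."""
    merged: List[Interval] = []
    # Sorting orders by chromosome name first, then by start position.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlapping (start < previous end) or book-ended (start == previous end):
            # extend the previous interval instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


features = [
    ("chr1", 100, 200),
    ("chr1", 150, 250),  # overlaps the first interval
    ("chr1", 250, 300),  # book-ended with the merged interval
    ("chr1", 400, 500),  # separate feature
    ("chr2", 100, 200),  # different chromosome
]
print(merge_intervals(features))
# [('chr1', 100, 300), ('chr1', 400, 500), ('chr2', 100, 200)]
```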

    Yes

    The Pediatric Cardiac Genomics Consortium (PCGC)

    phs000571.v6.p2.c1

    PCGC-CHD-GENES_HMB

    No

    Yes

    The Collaborative Cohort of Cohorts for COVID-19 Research (C4R)

    phs002988.v1.p1.c1

    phs002910.v1.p1.c1

    phs002910.v1.p1.c2

    phs002911.v1.p1.c1

    phs002911.v1.p1.c2

    phs003017.v1.p1.c1

    phs002919.v1.p1.c1

    C4R_ARIC_phs002988

    C4R_COPDGene_phs002910

    C4R_FHS_phs002911

    C4R_MESA_phs003017

    C4R_REGARDS_phs002919

    No

    Yes

    Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b)

    phs002339.v1.p1.c1

    topmed-NuMom2B_GRU-IRB

    PIC-SURE User Guide
    PIC-SURE User Guide
    Authorized Access: Select and Package Data Tool YouTube video
    BDC Gitbook documentation
    Terra release notes
    Seven Bridges release notes
    Dockstore release notes

    Yes

  • Velocyto.py

  • Samplot Plot

  • Samplot Vcf

  • Smoove toolkit

  • Sambamba tools 0.8.1

    Yes

    Yes

    Terra release notes
  • Gen3 release notes

    COVID-19 ACTIV-4 ACUTE

    phs002694.c1

    ACTIV4A_GRU

    Yes

    Yes

    COVID-19 Outpatient Thrombosis Prevention Trial

    phs002710.c1

    ACTIV4B_GRU

    Freeze 9b batch 4 studies

    various

    various

    No

    No

    COVID-19-C3PO

    phs002752.c1

    C3PO_GRU

    GATK RNAseq
    GRIDSS
    scVelo
    a new guide
    example templates
    overview of the process
    tools
    workflows
    Dockstore release notes
    Seven Bridges release notes

    Yes

    Yes

  • WARP ExomeGermlineSingleSample 2.4.4 pipeline for data pre-processing and variant calling in human WES data.

  • BCFtools 1.15.1 toolkit - CWL1.2

  • Kraken2 2.1.2 toolkit

  • SRA (v3.0.0, CWL1.2)

    • SRA sam-dump that converts SRA data into SAM format. With aligned data, NCBI uses Compression by Reference, which only stores the differences in base pairs between sequence data and the segment it aligns to. The process to restore original data, for example as FASTQ, requires fast access to the reference sequences that the original data was aligned to.

    • SRA fasterq-dump tool that converts SRA data into FASTQ format while using temporary files and multi-threading to speed up the extraction.

    • SRA fastq-dump tool that converts SRA data into FASTQ format.

  • Salmon (v1.5.2, CWL1.2)

    • Salmon Alevin tool that introduces a family of algorithms for quantification and analysis of 3’ tagged-end single-cell sequencing data.

    • Salmon Index tool that builds an index necessary for the Salmon Quant and Salmon Alevin tools. To create an index, it uses a transcriptome reference file in FASTA format. Additionally, one can provide a genome reference along with transcriptome to create a hybrid index compatible with the improved mapping algorithm named Selective Alignment.

    Yes

    PCGC

    phs001735.c2

    topmed-PCGC_CHD_DS-CHD

    No

    Yes

    National Sleep Research Resource (NSRR)

    phs002715-c1

    NSRR-CFS_DS-HLBS-IRB-NPU

    Yes

    FHS_phs000974_TOPMed_WGS_freeze.9b

    phs000974

    TOPMed_FHS

    No

    Yes

    No

    BostonBrazil_SCD

    phs001599

    topmed-BostonBrazil_SCD_HMB-IRB-COL

    Yes

    PCGC

    phs001735.c1

    topmed-PCGC_CHD_HMB

    PCGC SRA

    phs000571.v6.p2

    PCGC-CHD-GENES_HMB

    Yes

    National Sleep Research Resource (NSRR)

    • This dataset had to be ingested again to accommodate additional data provided by data owners

    phs002715-c1

    NSRR-CFS_DS-HLBS-IRB-NPU

    Study Variable Explorer
    Cure Sickle Cell Metadata Catalog
    BioData Catalyst Powered by PIC-SURE User Guide
    BioData Catalyst Powered by PIC-SURE YouTube playlist
    Workspace Quickstart Guide
    Terra YouTube channel
    Terra release notes
    Seven Bridges release notes
    Dockstore release notes

    No

    No

    Significant new features

    BDC Powered by Seven Bridges (BDC-Seven Bridges) SRA Import via DRS: The Sequence Read Archive (SRA) has been accessible via the SRA Toolkit, which involves users downloading a copy to their local environment and then downloading the SRA data to their project on BDC-Seven Bridges. NCBI is now storing the SRA data in cloud buckets on Amazon and Google, allowing users to avoid egress charges and simplifying access to the data via BDC-Seven Bridges’ new SRA to DRS Converter workflow.

    BDC Powered by PIC-SURE Save Dataset ID: Users can now save the dataset ID after applying filters and building a cohort, allowing them to view and access their saved cohorts at a later time. Saved dataset IDs can be viewed and managed on the Authorized PIC-SURE Dataset Management page.

    Data Releases

    The table below highlights which studies were included in the 2024-04-01 data release.

    The latest release incorporates studies from the Heart Failure Network (HFN), National Sleep Research Resource (NSRR), Observational Study of Post-Acute Sequelae of SARS-CoV-2 Infection (RECOVER Adult), and the Collaborative Cohort of Cohorts for COVID-19 Research (C4R). Additionally, the release broadens its scope with version updates to ongoing genetic and genomic studies, including the NHLBI TOPMed projects such as the evaluation of COPD longitudinally, and the genetic epidemiology of conditions like atrial fibrillation within the CATHGEN cohort, among others.

    The data will be available for access across the entire ecosystem by 2024-04-05.

    Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version
    Heart Failure Network: Diuretic Optimization Strategies Evaluation in Acute Heart Failure (HFN DOSE-BioLINCC) | phs003524.v1.p1.c1 | BioLINCC-BL_HFN_DOSE_AHF_GRU | Yes | No
    National Sleep Research Resource (NSRR): Hispanic Community Health Study/Study of Latinos | phs003543.v1.p1.c1 | NSRR-HCHS_HMB-NPU | Yes | No
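
    The phs identifiers in these tables follow the usual dbGaP pattern of a study number followed by version (v), participant set (p), and consent group (c) suffixes. As a minimal sketch for anyone scripting against these tables, the accession can be split into its parts as shown below. The names `StudyAccession` and `parse_accession` are illustrative and not part of any BDC tool, and treating the consent-group suffix as optional is an assumption.

```python
import re
from typing import NamedTuple, Optional

# Pattern for accessions as written in the release tables,
# e.g. "phs003524.v1.p1.c1". The ".c" suffix is treated as optional (assumption).
_ACCESSION = re.compile(r"^phs(\d{6})\.v(\d+)\.p(\d+)(?:\.c(\d+))?$")


class StudyAccession(NamedTuple):
    study_id: str                 # base study identifier, e.g. "phs003524"
    version: int                  # number after ".v"
    participant_set: int          # number after ".p"
    consent_group: Optional[int]  # number after ".c", or None if absent


def parse_accession(text: str) -> StudyAccession:
    """Split a study accession string into its dot-separated parts."""
    match = _ACCESSION.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized accession: {text!r}")
    digits, version, pset, consent = match.groups()
    return StudyAccession(
        study_id=f"phs{digits}",
        version=int(version),
        participant_set=int(pset),
        consent_group=int(consent) if consent is not None else None,
    )


if __name__ == "__main__":
    print(parse_accession("phs003524.v1.p1.c1"))
```

    Parsing the accession this way makes it straightforward to group rows by study or to compare versions across releases.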

    Planned Upcoming Data Releases

    Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version
    NHLBI TOPMed: HyperGEN - Genetics of Left Ventricular (LV) Hypertrophy | phs001293.v3.p1.c2 | topmed-HyperGEN_DS-CVD-IRB-RD | No | Yes
    NHLBI TOPMed - NHGRI CCDG: AF Biobank LMU in the context of the MED Biobank LMU | phs001543.v2.p1.c1 | topmed-AFLMU_HMB-IRB-PUB-COL-NPU-MDS | No | Yes

    For detailed platform release notes please consult the following resources:

    • BDC-Gen3 release notes

    • BDC-Terra release notes

    • BDC-Seven Bridges release notes

    • BDC-PIC-SURE release notes

    Data page
    Significant new features

    LocusZoom Interactive App on Seven Bridges: LocusZoom, part of the GENESIS pipeline, enables users to interactively visualize and explore results of single variant association tests. The tool also provides a User Guide on the front page that walks users through inputs, outputs, and functionality of the app with the ability to practice on open access data from the University of Michigan. To access the app, please email support@sevenbridges.com.

    GENESIS Model Explorer Interactive App on Seven Bridges: The Model Explorer app was developed by our collaborators at the University of Washington and then handed off to the Seven Bridges team for hosting. Through the app, users can visualize and explore the results of the GENESIS Null Model workflow including phenotype variables, genotypes, and GENESIS model results without prior R programming knowledge. To access the app, please email support@sevenbridges.com.

    Phenome-Wide analysis examples on BioData Catalyst studies using PIC-SURE: New example notebooks are available on Terra (Python and R) and Seven Bridges (RStudio and Python) illustrating how to query data using the PIC-SURE API, with a simple PheWAS analysis as the use case. The example focuses on the TOPMed DCC Harmonized Variables, which are used to run a PheWAS on total cholesterol in two studies: ARIC and FHS. It shows how the PIC-SURE API helps in wrangling phenotypic data.

    New user support materials and documentation

    Guide for Consortia using Seven Bridges: Collaborating on the NHLBI BioData Catalyst: A Guide for Consortia. This guide describes how consortia can use platform projects to selectively share, harmonize, and distribute data. This guide was inspired by conversations with the C4R consortium, which revealed the type of guidance and information that data coordinating centers and consortia members need in order to get set up on BioData Catalyst as quickly as possible. In the outlined example, multiple study centers can bring their data to BioData Catalyst and the Data Coordinating Center (DCC) can then link that data to a centralized project to perform harmonization. The DCC can then distribute select harmonized datasets to analysis working groups that applied for permission to study the harmonized data. Future consortia can use the architecture illustrated in this guide to quickly onboard and begin coordination.

    Guide for Workshops and Courses on Seven Bridges: Using NHLBI BioData Catalyst for Workshops and Courses. This guide was developed after Seven Bridges worked with the University of Washington Summer Institute in Statistical Genetics and the American Thoracic Society to develop a summer workshop and a course, respectively. The guide describes the UW Summer Institute and ATS course case studies and step-by-step considerations including a timetable for future educators that could use BioData Catalyst for their classrooms.

    Data Releases

    The table below highlights which studies were included in the data releases done in Q4 2021. TOPMed Freeze 9 datasets were ingested as the data became available: in total, 37 datasets were ingested and released in 2 batches. The TOPMed CATHGEN study was also released. Of the COVID-19 datasets, PETAL RED CORAL data was released on December 1st after receiving the official publication date. The data is now available for access across the entire ecosystem.

    Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version
    TOPMed Freeze 9 - Batch 1 (22 datasets included) | Various | Various | false | NA
    TOPMed Freeze 9 - Batch 2 (15 datasets included) | Various | Various | false |

    Planned upcoming Data Releases

    Study Name

    phs I.D. #

    Acronym

    New to BioData Catalyst

    New study version

    PCGC SRA Data

    phs000571

    True

    5

    TOPMed Freeze 9 - Batch 3

    (20 datasets included)

    Various

    Various

    false

    For detailed platform release notes please consult the following resources:

    • Gen3 release notes

    • Terra release notes

    • Seven Bridges release notes

    • PIC-SURE release notes

    Data page
    Significant new features

    Fixed Interoperability on BioData Catalyst Powered By Seven Bridges (BDC-Seven Bridges): BDC-Seven Bridges completed work on updating interoperability functionality. The initial release of the project-based data download restriction functionality inadvertently interfered with DRS data interoperability between BDC-Seven Bridges and other ecosystems such as CAVATICA. This unintentionally re-siloed data on those systems and runs counter to the overarching NIH data ecosystem goals of making data available to users across NIH institute/system boundaries.

    Workflow Cost Estimator Expansion: A feature that enables users to estimate analysis costs before running has been expanded to three new workflows on BDC-Seven Bridges: 1) Cyrius, a tool to genotype CYP2D6 from WGS BAM or CRAM files, 2) kallisto quant, a tool to quantify RNA-seq data, and 3) BEDTools Coverage, a tool that computes both the depth and breadth of coverage of features in file B on the features in file A, useful for comparing WGS files. Users can filter tools based on the interactive cost estimator. See here for documentation.

    Support Cascading authorization from dbGaP parent to child studies: Gen3 has updated the authorization process in BDC to enable a researcher with access to a dbGaP parent study to automatically gain access to relevant child studies. The authorization process as it existed previously in BDC expected dbGaP to explicitly grant access to both parent and its associated substudies individually. Since dbGaP did not provide explicit access for child studies, users were not able to access these child studies without additional authorization requested manually. With the implementation of support for cascading of authorization from parent to child study, a researcher with access to a dbGaP parent study will also gain access to relevant child studies in BDC, eliminating the need for any manual authorization process.
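
    The cascading behavior described above can be pictured as expanding a set of granted studies through a parent-to-child mapping. The following is only a conceptual sketch, not Gen3's implementation: the mapping, the study identifiers, and the function name `expand_access` are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical parent -> child study mapping; the identifiers are placeholders,
# not real dbGaP accessions.
PARENT_TO_CHILDREN: Dict[str, List[str]] = {
    "parent-A": ["child-A1", "child-A2"],
    "parent-B": ["child-B1"],
}


def expand_access(granted: Iterable[str],
                  children: Dict[str, List[str]]) -> Set[str]:
    """Return the granted studies plus every child study reachable from them."""
    accessible: Set[str] = set()
    pending = list(granted)
    while pending:
        study = pending.pop()
        if study in accessible:
            continue  # already handled; also guards against cycles in the mapping
        accessible.add(study)
        pending.extend(children.get(study, []))
    return accessible


if __name__ == "__main__":
    print(sorted(expand_access(["parent-A"], PARENT_TO_CHILDREN)))
```

    Note that access flows only downward in this sketch: holding a child study does not grant its parent, which matches the parent-to-child direction described in the release note.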

    Implementation of DOIs at Dataset level: A digital object identifier (DOI) is a persistent identifier or handle used to identify objects uniquely, standardized by the International Organization for Standardization (ISO). In BDC, DOIs have been created and made available at the dataset level to assign a persistent identifier in a standard format. The DOIs are available via the Gen3 discovery page as well as the API. DataCite was used as the registration service. Going forward, every BDC dataset will have a DOI minted as part of the data ingestion process. For a user, having assigned DOIs to datasets will promote research reproducibility and data FAIR-ness.

    View Stigmatizing Variables in PIC-SURE Open Access: Researchers can now view all variables, including stigmatizing variables, that are relevant to their search. Though these variables are not filterable in Open Access to protect participant data, this allows researchers to better understand what information is present in BDC. For more information about stigmatizing variables, please visit the publicly available GitHub repository.

    Data Releases

    The table below highlights which studies were included in the 2024-07-02 data release.

    The latest release includes studies from NHLBI TOPMed projects such as Partners HealthCare Biobank, Novel Risk Factors for the Development of Atrial Fibrillation in Women, and the Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE). New versions of studies like Walk-PHaSST Sickle Cell Disease, the Malmo Preventive Project, and the Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study are also featured. Additionally, the release includes updates to studies like Outcome Modifying Genes in Sickle Cell Disease (OMG) and the Vanderbilt University BioVU Atrial Fibrillation Genetics Study. The Collaborative Cohort of Cohorts for COVID-19 Research (C4R) and NIH RECOVER projects are also part of this release, including studies from the Hispanic Community Health Study/Study of Latinos and the Multi-Ethnic Study of Atherosclerosis.

    The data is now available for access across the entire ecosystem.

    Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version
    NHLBI TOPMed: Partners HealthCare Biobank | phs001024.v6.p1.c1 | topmed-PARTNERS_HMB | No | Yes
    NHLBI TOPMed: Novel Risk Factors | phs001040.v6.p1.c1 | topmed-WGHS_HMB | No | Yes

    Planned Upcoming Data Releases

    Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version
    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c1 | topmed-pharmHU_HMB | No | Yes
    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c2 | topmed-pharmHU_DS-SCD-RD | No | Yes

    For detailed platform release notes please consult the following resources:

    • BDC-Gen3 release notes

    • BDC-Terra release notes

    • BDC-Seven Bridges release notes

    • BDC-PIC-SURE release notes

    Data page

    2021-01-15 BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2021-01-15 release marks the fourth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., CWL workflows to create dataset specific files needed for GWAS) along with documentation and tutorials to help new users get started on the system. This release also includes enhanced support for CWL tools for post-GWAS analysis and a CWL tool for Bcftools Merge and Filter. Please find more detail on the new features and user support materials in the sections below.

    The 2021-01-15 data release includes the addition of both TOPMed studies and the ORCHID Study, conducted by the (PETAL) Clinical Trials Network of NHLBI. Multi-sample VCFs, CRAMs and unharmonized clinical files were added for 27 TOPMed studies new to BioData Catalyst. Additionally, 7 TOPMed studies previously hosted on BioData Catalyst were updated to the latest study versions. These updates include new CRAMs, unharmonized clinical files and multi-sample VCFs for Freeze 8. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format. The associated clinical files were added for the ORCHID study.

    Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.

    Significant new features

    CWL workflows to create dataset specific files needed for GWAS: Users can now find the following CWL workflows for creating dataset specific files needed for GWAS in the :

    • - Filter variants based on linkage disequilibrium measures

    • and - Estimate kinship coefficients

    • - Perform principal components analysis

    CWL tools for post-GWAS analysis: Users can now find the following CWL tools for post-GWAS analysis in the :

    • - Generate screenshots of specific regions of aligned files provided as inputs

    • - Standalone tool for generating static locus zoom plots. Users can make annotated Manhattan plots on specific regions from association files generated with the GENESIS association workflows.

    CWL tool for Bcftools Merge and Filter: Users can now find a CWL tool for Bcftools Merge and Filter in the Seven Bridges Public Apps Gallery. This tool merges multiple VCF/BCF files from non-overlapping sample sets to create one multi-sample file and filters out any monomorphic variants. This tool is useful when working with input files that contain monomorphic variants like the TOPMed datasets.
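
    To illustrate what filtering monomorphic variants means, here is a toy sketch that does not use bcftools and is not the CWL tool's implementation. It assumes a simplified definition: a variant is monomorphic when every called allele across all samples is the same, so the site shows no variation in the merged sample set. The data layout and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One genotype per sample: a tuple of allele indexes (0 = reference,
# 1+ = alternate); None marks a missing call.
Genotype = Tuple[Optional[int], ...]


def is_monomorphic(genotypes: Sequence[Genotype]) -> bool:
    """True when all called alleles across samples are identical (or none are called)."""
    observed = {allele for gt in genotypes for allele in gt if allele is not None}
    return len(observed) <= 1


def drop_monomorphic(variants: Dict[str, List[Genotype]]) -> Dict[str, List[Genotype]]:
    """Keep only the variants that show at least two distinct alleles."""
    return {vid: gts for vid, gts in variants.items() if not is_monomorphic(gts)}


if __name__ == "__main__":
    merged = {
        "var1": [(0, 0), (0, 0), (0, None)],  # only reference alleles: monomorphic
        "var2": [(0, 1), (0, 0), (1, 1)],     # both alleles seen: polymorphic
    }
    print(list(drop_monomorphic(merged)))
```

    Sites like `var1` can appear after subsetting or merging because a variant called in a larger cohort may have no alternate alleles among the samples that remain.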

    New user support materials and documentation

    Genetic Association Testing Using the GENESIS Workflows tutorial: Seven Bridges updated this tutorial to show how to perform an association test using the GENESIS workflows using TOPMed Freeze 8 multi-sample VCF data. Previous versions of this tutorial used TOPMed Freeze 5 data. Version 1.1 of this tutorial can be downloaded as a PDF from the .

    ORCHID Clinical Trial Statistical Analysis Reproduction: NHLBI BioData Catalyst has made data available to authorized investigators for the study titled: PETAL Network: Outcomes Related to COVID-19 Treated With Hydroxychloroquine Among Inpatients With Symptomatic Disease (ORCHID) Trial, phs002299.v1.p1. This is based on the multi-center, double blinded, randomized clinical trial conducted to assess the efficacy of hydroxychloroquine in the treatment of COVID-19. Results were published in JAMA on November 9th, 2020 (). This notebook enables anybody with authorized credentials to reproduce the ORCHID clinical trial results by showing how to 1) Access the data using the PIC-SURE API and 2) Reproduce the results of this study using the open-source R programming language. Available in or through .

    Data release

    The table below highlights which studies were included in the 2021-01-15 data release which includes both TOPMed studies and The Outcomes Related to COVID-19 treated with hydroxychloroquine among In-patients with symptomatic Disease study, or ORCHID Study, conducted by the (PETAL) Clinical Trials Network of NHLBI. Multi-sample VCFs, CRAMs and unharmonized clinical files were added for 27 TOPMed studies new to BioData Catalyst. Additionally, 7 TOPMed studies previously hosted on BioData Catalyst were updated to the latest study versions. These updates included new CRAMs, unharmonized clinical files and multi-sample VCFs for Freeze 8. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format. The associated clinical files were added for the ORCHID study. The data is now available for access across the entire ecosystem.

    For detailed platform release notes please consult the following resources:

    • Gen3 release notes

    2020-10-23 BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2020-10-23 release marks the third release for the NHLBI BioData Catalyst ecosystem. This release includes several new features along with documentation and tutorials (e.g., bringing your own data and tools) to help new users get started on the system. This release also includes enhanced support for querying annotations for TOPMed Freeze 8 variants in the Annotation Explorer, and querying combined phenotypic and genomic data in PIC-SURE. Please find more detail on the new features and user support materials in the sections below.

    The 2020-10-23 data release includes the addition of both Parent and TOPMed studies. A total of 8 new Parent studies and their respective unharmonized clinical files were added. Multi-sample VCFs, CRAMs and unharmonized clinical files were added for 2 TOPMed studies new to BioData Catalyst. Additionally, 6 studies were updated to the latest version. These updates included new CRAMs, unharmonized clinical files and multi-sample VCFs for Freeze 8. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format. The data is now available for access across the entire ecosystem. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.

    Significant new features

    • Form cohorts on Gen3 Exploration page and export to Seven Bridges workspace: Users can now export PFB (Portable Format for Bioinformatics) files from Gen3 (e.g., synthetic cohort files from multiple groups) to Seven Bridges.

    • CWL workflows for EPACTS and Plink association tests: Users can now find CWL workflows in the Seven Bridges Public Apps Gallery for the association test methods EPACTS and Plink. More information can be found in .

    • Query annotations for TOPMed Freeze 8 variants in the Annotation Explorer:

    New user support materials and documentation

    • : This guide introduces users to the two Docker-based workflow languages used to run batch analyses in the ecosystem: the Workflow Description Language (WDL) in Terra and the Common Workflow Language (CWL) in Seven Bridges. The guide links to resources that lead users from the early steps of learning to wrap their current pipelines for use in the cloud to how to publish their work in our open access catalog Dockstore to share with the community. This guide was originally conceived in discussion with fellows during the BDCatalyst September Face-to-Face. Fellows developed content and provided feedback and are listed as contributors within the publication.

    • Benchmarking guide for GENESIS association test workflows: This guide provides users with comprehensive benchmarking information for the CWL versions of the GENESIS association workflows. This guide shows the computation costs and execution times for a variety of association tests using 2.5K samples, 10K samples, 36K samples, and 50K samples run on both AWS and Google Cloud. The benchmarking guide can be found on the page “” of the Seven Bridges documentation.

    Data Release

    The table below highlights the new data release on BioData Catalyst which includes both Parent and TOPMed studies. A total of 8 new Parent studies and their respective unharmonized clinical files were added to the ecosystem. Multi-sample VCFs, CRAMs and unharmonized clinical files were added for 2 TOPMed studies new to BioData Catalyst. Additionally, 6 TOPMed studies previously hosted on BioData Catalyst were updated to the latest study versions. These updates included new CRAMs, unharmonized clinical files and multi-sample VCFs for Freeze 8 (previously hosted Freeze 5b only). For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format. The data is now available for access across the entire ecosystem.

    For detailed platform release notes please consult the following resources:

    • Gen3 release notes

    2021-10-04 BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2021-10-04 release marks the seventh release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., project cost reporting on Terra and archiving files on AWS) along with documentation and tutorials (e.g., estimating and managing cloud costs) to help new users get started on the system. This release also includes enhanced support for semantic search and R Shiny apps. Please find more detail on the new features and user support materials in the sections below.

    The 2021-10-04 data release includes the addition of the final BioLINCC training dataset plus another BioLINCC study, BabyHug. The TOPMed Combined Exchange Area buckets were updated with more datasets from multiple new freezes. The last dataset ingested was PCGC’s CMG. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.

    Significant new features

    Updated Semantic Search UI: Dug, the BioData Catalyst's Semantic Search, has an updated user interface. The new interface makes it easy to see more results on one page. A zoom feature lets users expand individual results to explore in greater detail. Provenance in knowledge graphs and links to published literature are presented where available.

    Archive files on AWS: Users on BioData Catalyst Powered by Seven Bridges can now select files to move from AWS S3 storage to AWS Glacier (archival storage). Moving files to archival storage can result in an ~80% cost reduction. It’s recommended that users move files to archival storage if the files will not be used for three or more months.
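
    As a rough illustration of the arithmetic behind the guidance above, the sketch below estimates what archiving might save. It uses only the approximate 80% reduction and the three-month guideline stated in the note; the price per GB-month is a placeholder rather than a quoted AWS rate, the function name is hypothetical, and retrieval fees and minimum storage durations are ignored.

```python
def estimate_archive_savings(
    size_gb: float,
    standard_price_per_gb_month: float,
    months: int,
    reduction: float = 0.80,
    min_months: int = 3,
) -> float:
    """Estimate storage savings from archiving, or 0.0 below the suggested threshold."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    if months < min_months:
        # The release note suggests archiving only files unused for three or more months.
        return 0.0
    standard_cost = size_gb * standard_price_per_gb_month * months
    return standard_cost * reduction


if __name__ == "__main__":
    # 0.02 is a placeholder price per GB-month, not a quoted AWS rate.
    saved = estimate_archive_savings(size_gb=5000, standard_price_per_gb_month=0.02, months=6)
    print(f"Estimated savings over 6 months: ${saved:.2f}")
```

    Treat the result as an upper-bound estimate: restoring archived files typically incurs additional cost and delay, which this sketch does not model.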

    Project Per Work Space Cost Reporting on Terra: Users on BioData Catalyst Powered by Terra will now have more transparency and access to cost information with the . This update associates each Terra workspace with its own Google Project, created by Terra on behalf of users when workspaces are created. Switching to this “project-per-workspace” model enables added functionality for displaying a breakdown of costs per workspace in the Terra user interface, and allows Terra users to be notified of cloud spending. This change will only apply to new workspaces created, with plans to migrate existing workspaces over to this model in the future.

    Try out R Shiny apps in Terra: Since the rollout of last quarter, Terra’s Interactive Analysis team has expanded the capabilities of the cloud environments framework that supports running RStudio, Jupyter Notebook and in Terra. Most recently, Terra users now have the ability to . Check out an example of an developed by the Manning Lab to visualize whole-genome association data.

    Save data from an IA environment: With the new R Shiny apps in Terra, users can . Saving data from an interactive cloud environment (such as an instance of RStudio or a Jupyter notebook) is a useful trick in some situations. Users worried about losing work done in an interactive environment because they need to delete or modify the persistent disk can use "gsutil" to copy it to the workspace bucket.
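
    A minimal sketch of the gsutil copy mentioned above follows. It only builds and prints the command instead of running it. The helper name `build_backup_command`, the `backups` prefix, and the local folder name are hypothetical, and reading the bucket from a `WORKSPACE_BUCKET` environment variable is an assumption about the environment that should be checked before use.

```python
import os
import shlex
from typing import List


def build_backup_command(local_path: str, bucket: str, prefix: str = "backups") -> List[str]:
    """Build a `gsutil cp` command that copies local_path into the bucket."""
    if not bucket.startswith("gs://"):
        raise ValueError("bucket must be a gs:// URL")
    destination = f"{bucket.rstrip('/')}/{prefix.strip('/')}/"
    # -m runs the copy in parallel; -r copies directories recursively.
    return ["gsutil", "-m", "cp", "-r", local_path, destination]


if __name__ == "__main__":
    # Assumption: the environment exposes the workspace bucket as WORKSPACE_BUCKET;
    # the fallback value is a placeholder.
    bucket = os.environ.get("WORKSPACE_BUCKET", "gs://example-workspace-bucket")
    command = build_backup_command("analysis_outputs", bucket)
    # Printed rather than executed; to run it, pass `command` to
    # subprocess.run(command, check=True).
    print(shlex.join(command))
```

    Building the command as a list avoids shell-quoting problems if a path contains spaces.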

    Speed up machine learning work with GPUs on Terra: Terra’s Interactive Analysis team has released an upgrade that enables . Terra already offered the , and are now responding to user requests to run GPU-enabled computations interactively with GPU support for Jupyter Notebooks.

    Speed up workflows and save costs using N2 instances sporting Intel’s 2nd Generation Xeon CPUs on Terra: Terra users will now have the option to use new-generation N2 instances, which have demonstrated faster performance and reduced cost. Read more about these updates and how to request N2 instances for workflows .

    New user support materials and documentation

    Cross-study harmonization example notebook: will demonstrate how to query and work with the BioData Catalyst studies, particularly cross-study harmonization using the PIC-SURE API.

    Estimate and Manage Cloud Costs on Seven Bridges: describes how to estimate costs associated with using Seven Bridges. The tutorial includes an overview of both cloud storage costs and cloud computation costs and the primary drivers of those costs. The tutorial also provides guidance on how to approach estimating cloud storage and computation costs so that researchers can budget for cloud costs in their grants, request cloud credits, and plan their work on BioData Catalyst.

    Public project for TOPMed Freeze8 variant calling pipelines: Users on Seven Bridges can now access a public project that walks through how to use the CWL tools and workflows that were used to perform variant calling of TOPMed Freeze8. The public project provides explanations of the purpose of all of the tools and workflows and how they are used together, along with examples of completed analyses. All of the CWL tools and workflows in the project are available in the Public Apps Gallery.

    Need an easy way to explain Terra to your colleagues or collaborators? Try this

    Estimate Workflow Costs on Terra: Terra users can also follow . This is the original document describing the steps summarized in this blog post.

    Understanding and controlling cloud costs on Terra: includes a detailed breakdown of the types of costs that you may incur when working on Google Cloud, plus some advice on how to reduce costs.

    Understanding costs and billing on Terra: includes an overview of how billing works, including how billing accounts, projects and workspaces relate to each other, and the difference between workspace permissions and billing permissions.

    Controlling cloud costs on Terra – sample use cases: includes a selection of typical analysis use cases, for which the costs are broken down in several scenarios in order to illustrate the effect of cost control strategies.

    New tools and workflows released to :

    • Three additional WDL workflows have been released in the , including KING, PC-Relate, and PC-AIR.

    • WDL was released to the Utilities collection. This workflow provides the full power of to subset, subsample, and filter VCF files.

    • New with CWL workflows can predict gene expression (or whatever biology the models predict) in a cohort with available genotypes and run associations to a trait measured in the cohort.

    , including Terra:

    • New to Galaxy? The Galaxy Training Network is continuing to add training material in their on Dockstore.

    • Additionally, users can explore some of the Galaxy community’s best practices workflows in their on Dockstore.

    Ready to publish and share the tool or workflow you developed with the research community? Dockstore users can link their accounts to their ORCID and Zenodo accounts, , and now can .

    New video tutorials demonstrate exporting data from PIC-SURE to and using BioLINCC/Sickle Cell related data.

    Data Releases

    The table below highlights which studies were included in the 2021-10-04 data release. The final BioLINCC training dataset was uploaded, plus another BioLINCC study, BabyHug. The ORCHID dataset was re-ingested after the data owners found they had provided incorrect versions of the files at the time of initial ingestion. The TOPMed Combined Exchange Area buckets were updated with more datasets from multiple new freezes. The last dataset ingested was PCGC’s CMG. The data is now available for access across the entire ecosystem.

    Planned upcoming Data Releases

    For detailed platform release notes please consult the following resources:

    • Gen3 release notes

    2021-07-09 BioData Catalyst Ecosystem Release Notes

    Introduction

    The 2021-07-09 release marks the sixth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., SAS on Seven Bridges and Galaxy’s integration with Terra) along with documentation and tutorials to help new users get started on the system (e.g., PIC-SURE Open Access). This release also includes enhanced support for maintaining and versioning CWL on external tool repositories. Please find more details on the new features and user support materials in the sections below.

    The 2021-07-09 data release includes the addition of CRAMs and unharmonized clinical files for the parent project CARDIA and 3 other TOPMed programs. The TOPMed program REDS-III received a version update. The unharmonized clinical files were uploaded for the 5 BioLINCC projects and the 2 open tutorial projects. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.

    Significant new features

    Authentication through the NIH Researcher Authentication Service: The BioData Catalyst ecosystem updated the authentication mechanism to use the NIH Researcher Authentication Service. Researchers will now be redirected to the NIH RAS page to enter their eRA Commons credentials when logging into one of the platforms within the ecosystem.

    New Jupyter notebook published in the BioData Catalyst Collection Featured Workspace on Terra.

    Galaxy has integrated with Terra: Galaxy is now available through the other “faces” of Terra, including the . You can launch your very own Galaxy server without having to do any configuration yourself, right from the Terra web interface. This marks a transition from alpha to beta development status of Galaxy on Terra, meaning that the software is more mature and considered reliable enough for regular work, with the caveat that minor changes may occur over time as we smooth out any remaining rough edges and improve user experience in the application. Learn more about how to use Galaxy in Terra . Features of Galaxy and its use within Terra are also featured in our blog post . You can also import Dockstore workflows into Galaxy when it's launched in Terra. Speaking of workflows, Cromwell 64 is now live on Terra.

    : The RStudio image now has , a tool for single-cell transcriptomics, as well as , a package for verifying the integrity of an object in Google Cloud Storage.

    Interactive Analysis: Jupyter Notebook images have been updated with Bioconductor 3.13.0. See the Bioconductor release notes .

    SAS: Users on Seven Bridges can now launch SAS for interactive analysis from the Data Cruncher feature. All project files are available within SAS. Users can select from three SAS offerings built on top of SAS Studio: 1) SAS Business Intelligence enables users to utilize SAS code to manage data, create, modify and compare descriptive and predictive models. Capabilities include clustering, decision trees, linear and logistic regression. 2) SAS Analytics adds the power of SAS Viya’s Data Mining and Machine Learning algorithms such as neural networks, gradient boosting, and random forest. 3) SAS Data Science provides access to text analysis, time series models, advanced forecasting and model governance.

    LocusZoom Interactive Application: Users on Seven Bridges can now launch an R Shiny application that enables users to select, visualize and interactively explore single variant association test results data, with no prior R programming knowledge. Researchers can explore existing analyses available in the database, generate LocusZoom plots for example data, or provide their own association .RData files. The app also provides the JSONizer tool, which enables researchers to subset their association test results (.RData) files and to convert them into the appropriate JSON files required by LocusZoom. Users can launch the application from the in the top navigation bar.

    Example notebook for data import with DRS: These example notebooks on Seven Bridges provide users with the code and steps for importing data from CAVATICA (Kids First data) as well as importing GTEx data from the NHGRI AnVIL system. The import utilizes the DRS functionality to access files that are stored on other NIH cloud systems. Users can find notebooks from the in the top navigation bar.

    CWL v1.2 available: BioData Catalyst Powered by Seven Bridges now supports Common Workflow Language (CWL) version 1.2. The new version of CWL brings a major new functionality - , as well as several minor features and improvements. For the detailed change log please see the and the .

    New CWL tools and workflows on BioData Catalyst Powered by Seven Bridges: Users can find all these tools and more in the :

    • - This is a tool for whole genome regression analysis.

    • - This UW-GAC tool is a standalone app for creating Manhattan and QQ plots from the GENESIS association test results with additional filtering and stratification options available.

    • - This is a scalable SNV and INDEL annotation pipeline, performing a spectrum of annotations in a single tool. It integrates annotations from dozens of databases and annotation tools.

    PIC-SURE Data Access Dashboard Updates: PIC-SURE’s Data Access Dashboard has been updated to include the number of studies and participants the user has access to based on their authorization.

    New PIC-SURE Open Access: is now available in BioData Catalyst! PIC-SURE Open Access is available to users who have an eRA Commons account, including those who are not authorized to access any studies. The Open Access feature allows users to explore de-stigmatized, phenotypic data available in PIC-SURE prior to requesting access to data. For more information check out the and .

    New PIC-SURE are available as public projects in Seven Bridges and Terra as follows:

    • Example showing the users how to access lipid measurements across harmonized variables and multiple visits using the PIC-SURE API in R, RStudio, and Python.

    • All previous notebooks examples are now available in RStudio on Seven Bridges.

    New on Dockstore under the BioData Catalyst organization: This collection includes two WDL workflows to help users prepare their data for association testing: one for converting VCF files to GDS and one for linkage disequilibrium pruning. Stay tuned as more workflows are released.

    New on Dockstore under the BioData Catalyst organization: The WDL workflows in this collection enable scalable, efficient, and flexible genome-wide gene-environment interaction analysis. GEM conducts single-variant analysis for common variants (currently in unrelated individuals only) and MAGEE conducts single-variant and variant set-based analysis for common or rare variants while allowing for relatedness. The collection also includes examples of cloud costs in the README.

    hashtag
    Known issues and workarounds

    BioLINCC Phase 2 data dictionaries: These data dictionaries were submitted in PDF format which required additional intervention and delayed general release to the platform. These data dictionaries will be released as soon as is feasible for use across the platform.

    hashtag
    New user support materials and documentation

    Maintaining and Versioning CWL on External Tool Repositories: presents best practices for writing and maintaining CWL tools/workflows in an external tool repository, such as GitHub, so that users can better manage versions of their tools. Users should follow these best practices if they would like to publish and share their CWL tools and workflows in the since Dockstore has the ability to automatically pull changes from GitHub. These best practices will ensure that the CWL is fully portable and can run successfully not only on Seven Bridges Platforms, but also on other CWL executors such as cwltool and Toil.

    Transferring Files Between Seven Bridges and Terra: guides users through the process of transferring files between the two workspace environments Seven Bridges and Terra.

    Accessing Egress-Free GTEx Data From AnVIL: A new data interoperability page that includes linked instructions for how to access egress-free GTEx data from NHGRI’s AnVIL cloud ecosystem is .

    PIC-SURE Documentation Updates: New provides new information on the Data Access Dashboard, PIC-SURE Open Access, and a new table for understanding study-specific subject identifiers.

    PIC-SURE Video Tutorials: are now available for the following topics:

    • Introduction to PIC-SURE

    • Introduction to PIC-SURE Open Access: Harmonized

    • Introduction to PIC-SURE Open Access: One Criterion Search

    Published a on the role of a secure cloud ecosystem for supporting infrastructure projects and creating connected communities, highlighting BioData Catalyst as one of several NIH-commissioned infrastructure development projects that involve not just putting data on the cloud but also building the additional layers of services that are necessary to deliver on the extraordinary promise of this new model for data sharing and analysis.

    hashtag
    Data Releases

    The table below highlights which studies were included in the 2021-07-09 data release. CRAMs and unharmonized clinical files were uploaded for the parent project CARDIA and 3 other TOPMed programs. The TOPMed program REDS-III received a version update. The unharmonized clinical files were uploaded for the 5 BioLINCC projects and the 2 open tutorial projects. The data is now available for access across the entire ecosystem.

    hashtag
    Planned upcoming Data Releases

    hashtag
    For detailed platform release notes please consult the following resources:

    • Gen3 release notes

    2020-08-24 BioData Catalyst Ecosystem Release Notes

    hashtag
    Introduction

    The 2020-08-24 release marks the second release for the NHLBI BioData Catalyst ecosystem. This release includes several new features along with documentation and tutorials (e.g. genome-wide association studies) to help new users get started on the system. This release also includes enhanced support for machine learning in the workspace environments and support for GA4GH industry standard in Dockstore for workflows. Please find more detail on the new features and user support materials in the sections below.

    The 2020-08-24 data release includes the addition of TOPMed Freeze 8 data for a subset of studies on BioData Catalyst. Freeze8 multi-sample VCFs are available for 29 studies, of which 10 studies are new to the ecosystem. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format, in contrast to the Freeze5 multi-sample VCFs which are hosted as tar bundles. For the 10 studies new to BioData Catalyst, CRAM files and unharmonized clinical files are also available for access. The data release further includes updates of many studies to the latest versions that are available on dbGaP. The next data release will include Freeze8 multi-sample VCFs for additional TOPMed studies in addition to unharmonized clinical data and CRAM files for studies that are not yet hosted on the system. Please refer to the Data Release section below for more information as well as the

    National Sleep Research Resource (NSRR): Hispanic Community Health Study/Study of Latinos | phs003543.v1.p1.c2 | NSRR-HCHS_HMB | Yes | No
    NIH RECOVER: A Multi-Site Observational Study of Post-Acute Sequelae of SARS-CoV-2 Infection in Adults | phs003463.v1.p1.c1 | RECOVER-RC_Adult_GRU | Yes | No
    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Prevent Pulmonary Fibrosis (PrePF) | phs002975.v1.p1.c1 | COVID19-C4R_PREPF_DS-PMD-IRB | Yes | No
    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene) | phs002910.v1.p1.c2 | COVID19-C4R_COPDGENE_DS-CS | Yes | No
    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene) | phs002910.v1.p1.c1 | COVID19-C4R_COPDGENE_HMB | Yes | No
    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c1 | COVID19-C4R_MESA_HMB | Yes | No
    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c2 | COVID19-C4R_MESA_HMB-NPU | Yes | No
    NHLBI TOPMed: Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) | phs001472.v2.p1.c1 | topmed-ECLIPSE_DS-COPD-MDS-RD | No | Yes
    NHGRI CCDG: Early-onset Atrial Fibrillation in the CATHeterization GENetics (CATHGEN) Cohort | phs001600.v3.p2.c1 | topmed-CATHGEN_DS-CVD-IRB | No | Yes
    NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy (GENOA) | phs001345.v3.p1.c1 | topmed-GENOA_DS-ASC-RF-NPU | No | Yes
    NHLBI TOPMed: Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) | phs001359.v3.p1.c1 | topmed-GOLDN_DS-CVD-IRB | No | Yes
    NHLBI TOPMed: University of Massachusetts Medical School (UMMS) miRhythm Study | phs001434.v2.p1.c1 | topmed-miRhythm_GRU | No | Yes
    NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish | phs000956.v4.p1.c2 | topmed-Amish_HMB-IRB-MDS | No | Yes
    NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene) in the TOPMed Program | phs000951.v5.p4.c1 | topmed-COPDGene_HMB | No | Yes
    NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene) in the TOPMed Program | phs000951.v5.p4.c2 | topmed-COPDGene_DS-CS-RD | No | Yes
    NHLBI TOPMed: Trans-Omics for Precision Medicine Whole Genome Sequencing Project: ARIC | phs001211.v4.p2.c1 | topmed-ARIC_HMB-IRB | No | Yes
    NHLBI TOPMed: Trans-Omics for Precision Medicine Whole Genome Sequencing Project: ARIC | phs001211.v4.p2.c2 | topmed-ARIC_DS-CVD-IRB | No | Yes
    NHLBI TOPMed: REDS-III Brazil Sickle Cell Disease Cohort (REDS-BSCDC) | phs001468.v3.p1.c1 | topmed-REDS-III_Brazil_SCD_GRU-IRB-PUB-NPU | No | Yes
    NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica | phs000988.v5.p1.c1 | topmed-CRA_DS-ASTHMA-IRB-MDS-RD | No | Yes
    NHLBI TOPMed: Genes-environments and Admixture in Latino Asthmatics (GALA II) Study | phs000920.v5.p2.c2 | topmed-GALAII_DS-LD-IRB-COL | No | Yes
    LungMAP: Molecular Atlas of Lung Development - Human Lung Tissue | phs001961.v2.p1.c1 | LungMAP-MALD_GRU | No | No
    Unrelated Donor Reduced Intensity Bone Marrow Transplant for Children with Severe Sickle Cell Disease (BMT CTN-0601-BioLINCC) | phs003470.v1.p1.c1 | BioLINCC-BMT_CTN-0601_GRU | No | No
    NHLBI TOPMed: Australian Familial Atrial Fibrillation Study | phs001435.v2.p1.c1 | topmed-AustralianFamilialAF_HMB-NPU-MDS | No | Yes
    NHLBI TOPMed - NHGRI CCDG: Penn Medicine BioBank Early Onset Atrial Fibrillation Study | phs001601.v2.p1.c1 | topmed-CCDG_PMBB_AF_HMB-IRB-PUB | No | Yes
    NHLBI TOPMed: Children's Health Study (CHS) Integrative Genetic Approaches to Gene-Air Pollution Interactions in Asthma (GAP) | phs001602.v2.p1.c1 | topmed-ChildrensHS_GAP_GRU | No | Yes
    NHLBI TOPMed: Children's Health Study (CHS) Integrative Genomics and Environmental Research of Asthma (IGERA) | phs001603.v2.p1.c1 | topmed-ChildrensHS_IGERA_GRU | No | Yes
    NHLBI TOPMed: Children's Health Study (CHS) Effects of Air Pollution on the Development of Obesity in Children (Meta-AIR) | phs001604.v2.p1.c1 | topmed-ChildrensHS_MetaAir_GRU | No | Yes
    NHLBI TOPMed: Chicago Initiative to Raise Asthma Health Equity (CHIRAH) | phs001605.v2.p1.c2 | topmed-CHIRAH_DS-ASTHMA-IRB-COL | No | Yes
    NHLBI TOPMed: Determining the association of chromosomal variants with non-PV triggers and ablation-outcome in AF (DECAF) | phs001546.v2.p1.c1 | topmed-DECAF_GRU | No | Yes
    NHLBI TOPMed: Early-onset Atrial Fibrillation in the Estonian Biobank | phs001606.v2.p1.c1 | topmed-EGCUT_GRU | No | Yes
    NHLBI TOPMed: Genetics of Asthma in Latino Americans (GALA) | phs001542.v2.p1.c2 | topmed-GALA_DS-LD-IRB-COL | No | Yes
    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c3 | topmed-pharmHU_DS-SCD | No | Yes
    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c2 | topmed-pharmHU_DS-SCD-RD | No | Yes
    NHLBI TOPMed - NHGRI CCDG: The GENetics in Atrial Fibrillation (GENAF) Study | phs001547.v2.p1.c1 | topmed-GENAF_HMB-NPU | No | Yes
    NHLBI TOPMed: Genetic Study of Atherosclerosis Risk (GeneSTAR) | phs001218.v3.p1.c2 | topmed-GeneSTAR_DS-CVD-IRB-NPU-MDS | No | Yes
    NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity (GenSalt) | phs001217.v3.p1.c1 | topmed-GenSalt_DS-HCR-IRB | No | Yes
    NHLBI TOPMed - NHGRI CCDG: Hispanic Community Health Study/Study of Latinos (HCHS/SOL) | phs001395.v2.p1.c2 | topmed-HCHS-SOL_HMB | No | Yes
    NHLBI TOPMed - NHGRI CCDG: Hispanic Community Health Study/Study of Latinos (HCHS/SOL) | phs001395.v2.p1.c1 | topmed-HCHS-SOL_HMB-NPU | No | Yes
    NHLBI TOPMed: HyperGEN - Genetics of Left Ventricular (LV) Hypertrophy | phs001293.v3.p1.c1 | topmed-HyperGEN_GRU-IRB | No | Yes
    NHLBI TOPMed - NHGRI CCDG: Intermountain INSPIRE Registry | phs001545.v2.p1.c1 | topmed-INSPIRE_AF_DS-MULTIPLE_DISEASES-MDS | No | Yes
    NHLBI TOPMed - NHGRI CCDG: The Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study | phs001598.v2.p1.c1 | topmed-JHU_AF_HMB-NPU-MDS | No | Yes
    NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism (WGS of VTE) | phs001402.v3.p1.c1 | topmed-Mayo_VTE_GRU | No | Yes
    NHLBI TOPMed - NHGRI CCDG: Massachusetts General Hospital (MGH) Atrial Fibrillation Study | phs001062.v5.p2.c2 | topmed-MGH_AF_DS-AF-IRB-RD | No | Yes
    NHLBI TOPMed - NHGRI CCDG: Massachusetts General Hospital (MGH) Atrial Fibrillation Study | phs001062.v5.p2.c1 | topmed-MGH_AF_HMB-IRB | No | Yes
    NHLBI TOPMed: MyLifeOurFuture (MLOF) Research Repository of patients with hemophilia A (factor VIII deficiency) or hemophilia B (factor IX deficiency) | phs001515.v2.p1.c1 | topmed-MLOF_HMB-PUB | No | Yes
    NHLBI TOPMed - NHGRI CCDG: Malmo Preventive Project (MPP) | phs001544.v2.p1.c1 | topmed-MPP_HMB-NPU-MDS | No | Yes
    NHLBI TOPMed: Partners HealthCare Biobank | phs001024.v5.p1.c1 | topmed-PARTNERS_HMB | No | Yes
    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c1 | topmed-pharmHU_HMB | No | Yes
    NHLBI TOPMed: San Antonio Family Heart Study (SAFHS) | phs001215.v4.p2.c1 | topmed-SAFHS_DS-DHD-IRB-PUB-MDS-RD | No | Yes
    NHLBI TOPMed: Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE) | phs001467.v2.p1.c1 | topmed-SAPPHIRE_asthma_DS-ASTHMA-IRB-COL | No | Yes
    NHLBI TOPMed: African American Sarcoidosis Genetics Resource | phs001207.v3.p1.c1 | topmed-Sarcoidosis_DS-SAR-IRB | No | Yes
    NHLBI TOPMed: Genome-Wide Association Study of Adiposity in Samoans | phs000972.v5.p1.c1 | topmed-SAS_GRU-IRB-PUB-COL-NPU-GSO | No | Yes
    NHLBI TOPMed: Rare Variants for Hypertension in Taiwan Chinese (THRV) | phs001387.v3.p1.c3 | topmed-THRV_DS-CVD-IRB-COL-NPU-RD | No | Yes
    NHLBI TOPMed: Novel Risk Factors for the Development of Atrial Fibrillation in Women | phs001040.v5.p1.c1 | topmed-WGHS_HMB | No | Yes
    NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v3.p1.c1 | topmed-WHI_HMB-IRB | No | Yes
    NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v3.p1.c2 | topmed-WHI_HMB-IRB-NPU | No | Yes
    NHLBI TOPMed: Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE) | phs001467.v2.p2.c1 | topmed-SAPPHIRE_asthma_HMB-COL | No | Yes
    NHLBI TOPMed: Walk-PHaSST Sickle Cell Disease (SCD) | phs001514.v2.p1.c1 | topmed-Walk_PHaSST_SCD_HMB-IRB-PUB-COL-NPU-MDS-GSO | No | Yes
    NHLBI TOPMed: Walk-PHaSST Sickle Cell Disease (SCD) | phs001514.v2.p1.c2 | otopmed-Walk_PHaSST_SCD_DS-SCD-IRB-PUB-COL-NPU-MDS-RDN | No | Yes
    NHLBI TOPMed - NHGRI CCDG: Malmo Preventive Project (MPP) | phs001544.v3.p1.c1 | topmed-MPP_HMB-NPU-MDS | No | Yes
    NHLBI TOPMed - NHGRI CCDG: The Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study | phs001598.v3.p1.c1 | topmed-JHU_AF_HMB-NPU-MDS | No | Yes
    NHLBI TOPMed: Outcome Modifying Genes in Sickle Cell Disease (OMG) | phs001608.v2.p1.c1 | topmed-OMG_SCD_DS-SCD-IRB-PUB-COL-MDS-RD | No | Yes
    NHLBI TOPMed - NHGRI CCDG: The Vanderbilt University BioVU Atrial Fibrillation Genetics Study | phs001624.v3.p2.c1 | topmed-BioVU_AF_HMB-GSO | No | Yes
    NHLBI TOPMed: Genetic Causes of Complex Pediatric Disorders - Asthma (GCPD-A) | phs001661.v3.p1.c1 | topmed-GCPD-A_DS-ASTHMA-GSO | No | Yes
    NHLBI TOPMed: Lung Tissue Research Consortium (LTRC) | phs001662.v2.p1.c2 | topmed-LTRC_HMB-MDS | No | Yes
    NHLBI TOPMed: Pulmonary Hypertension and the Hypoxic Response in SCD (PUSH) | phs001682.v2.p1.c1 | topmed-PUSH_SCD_DS-SCD-IRB-PUB-COL | No | Yes
    NHLBI TOPMed - NHGRI CCDG: Groningen Genetics of Atrial Fibrillation (GGAF) Study | phs001725.v2.p1.c1 | topmed-GGAF_GRU | No | Yes
    NHLBI TOPMed: Childhood Asthma Management Program (CAMP) | phs001726.v2.p1.c1 | topmed-CAMP_DS-AST-COPD | No | Yes
    NHLBI TOPMed: Best ADd-on Therapy Giving Effective Response (BADGER) | phs001728.v3.p1.c2 | topmed-CARE_BADGER_DS-ASTHMA-IRB-COL | No | Yes
    NHLBI TOPMed: Characterizing the Response to a Leukotriene Receptor Antagonist and an Inhaled Corticosteroid (CLIC) | phs001729.v3.p1.c2 | topmed-CARE_CLIC_DS-ASTHMA-IRB-COL | No | Yes
    NHLBI TOPMed: Pediatric Asthma Controller Trial (PACT) | phs001730.v2.p1.c2 | topmed-CARE_PACT_DS-ASTHMA-IRB-COL | No | Yes
    NHLBI TOPMed: TReating Children to Prevent EXacerbations of Asthma (TREXA) | phs001732.v2.p1.c2 | topmed-CARE_TREXA_DS-ASTHMA-IRB-COL | No | Yes
    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Hispanic Community Health Study/Study of Latinos (HCHS/SOL) | phs002908.v1.p1.c1 | COVID19-C4R_HCHS_SOL_HMB-NPU | Yes | Yes
    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Hispanic Community Health Study/Study of Latinos (HCHS/SOL) | phs002908.v1.p1.c2 | COVID19-C4R_HCHS_SOL_HMB | Yes | Yes
    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c1 | COVID19-C4R_MESA_HMB | Yes | Yes
    Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c2 | COVID19-C4R_MESA_HMB-NPU | Yes | Yes
    NIH RECOVER: A Multi-Site Observational Study of Post-Acute Sequelae of SARS-CoV-2 Infection in Adults | phs003463.v2.p2.c1 | RECOVER-RC-Adult_GRU | No | Yes
    Heart Failure Network: Functional Impact of GLP-1 for Heart Failure Treatment (HFN FIGHT-BioLINCC) | phs003542.v1.p1.c1 | BioLINCC_BL_HFN-FIGHT_GRU | No | Yes
    Action to Control Cardiovascular Risk in Diabetes (ACCORD-BioLINCC) | phs003551.v1.p1.c1 | BioLINCC-BL_ACCORD_GRU | No | Yes
    Action to Control Cardiovascular Risk in Diabetes (ACCORD - Imaging) | phs003562.v2.p1.c1 | imaging-ACCORD_GRU | No | Yes
    Systolic Blood Pressure Intervention Trial (SPRINT-Imaging) | phs003566.v2.p1.c1 | imaging-SPRINT_GRU | No | Yes
    Framingham Heart Study-Cohort (FHS-Cohort) - Imaging | phs003593.v1.p1.c1 | Imaging-img_FHS_HMB-IRB-MDS | No | Yes
    Framingham Heart Study-Cohort (FHS-Cohort) - Imaging | phs003593.v1.p1.c2 | Imaging-img_FHS_HMB-IRB-NPU-MDS | No | Yes
    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c3 | topmed-pharmHU_DS-SCD | No | Yes
    NHLBI TOPMed: Partners HealthCare Biobank | phs001024.v6.p1.c1 | topmed-PARTNERS_HMB | No | Yes
    NHLBI TOPMed - NHGRI CCDG: The Vanderbilt University BioVU Atrial Fibrillation Genetics Study | phs001624.v3.p2.c1 | topmed-BioVU_AF_HMB-GSO | No | Yes
    NHLBI TOPMed: Novel Risk Factors for the Development of Atrial Fibrillation in Women | phs001040.v6.p1.c1 | topmed-WGHS_HMB | No | Yes
    NHLBI TOPMed - NHGRI CCDG: The Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study | phs001598.v3.p1.c1 | topmed-JHU_AF_HMB-NPU-MDS | No | Yes
    NHLBI TOPMed - NHGRI CCDG: Malmo Preventive Project (MPP) | phs001544.v3.p1.c1 | topmed-MPP_HMB-NPU-MDS | No | Yes
    NHLBI TOPMed: Pathways to Immunologically Mediated Asthma (PIMA) | phs001727.v3.p1.c2 | topmed-PIMA_DS-ASTHMA-IRB-COL | No | Yes
    NHLBI TOPMed: Characterizing the Response to a Leukotriene Receptor Antagonist and an Inhaled Corticosteroid (CLIC) | phs001729.v3.p1.c2 | topmed-CARE_CLIC_DS-ASTHMA-IRB-COL | No | Yes
    NHLBI TOPMed: Best ADd-on Therapy Giving Effective Response (BADGER) | phs001728.v3.p1.c2 | topmed-CARE_BADGER_DS-ASTHMA-IRB-COL | No | Yes
    Guiding Evidence Based Therapy Using Biomarker Intensified Treatment in Heart Failure (GUIDE-IT-BioLINCC) | phs003621.v1.p1.c1 | BioLINCC-BL_GUIDE-IT_GRU | Yes | Yes
    Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training (HF-ACTION-BioLINCC) | phs003599.v1.p1.c1 | BioLINCC-BL_HF-ACTION_HMB | Yes | Yes
    Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training (HF-ACTION-BioLINCC) | phs003599.v1.p1.c2 | BioLINCC-BL_HF-ACTION_HMB-NPU | Yes | Yes
    Sleep Heart Health Study (SHHS-BioLINCC) | phs003637.v1.p1.c1 | BioLINCC-BL_SHHS_HMB-MDS | Yes | Yes
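
    The accession strings listed with the studies above follow the dbGaP pattern of study ID, version, participant set, and consent group (for example, phs003543.v1.p1.c2). As a minimal sketch of splitting such a string into its parts, assuming that four-part form and using hypothetical names (`DbgapAccession`, `parse_accession`) that do not come from any BDC tool:

    ```python
    import re
    from typing import NamedTuple


    class DbgapAccession(NamedTuple):
        study: str            # e.g. "phs003543"
        version: int          # the number after "v"
        participant_set: int  # the number after "p"
        consent_group: int    # the number after "c"


    # Assumed layout: phs + six digits, then .v, .p and .c components.
    _ACCESSION = re.compile(r"^(phs\d{6})\.v(\d+)\.p(\d+)\.c(\d+)$")


    def parse_accession(text: str) -> DbgapAccession:
        """Split a full dbGaP accession into its four components."""
        match = _ACCESSION.match(text.strip())
        if match is None:
            raise ValueError(f"not a full dbGaP accession: {text!r}")
        study, version, participant_set, consent_group = match.groups()
        return DbgapAccession(study, int(version), int(participant_set), int(consent_group))


    print(parse_accession("phs003543.v1.p1.c2"))
    ```

    Shorter identifiers that carry only the study ID (such as phs001600) are rejected by this sketch rather than guessed at.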

    NA

    topmed-CATHGEN_DS-CVD-IRB

    phs001600

    CATHGEN

    True

    6

    PETAL - RED CORAL (COVID-19)

    phs002363

    RED_CORAL

    True

    1

    NA

    TOPMed Freeze 9 - Batch 4

    (20 datasets included)

    Various

    Various

    false

    NA

    ACTIV-4A

    phs002694

    ACTIV4A

    True

    1

    ACTIV-4B

    phs002710

    ACTIV4B

    True

    1

    Dockstore release notes

    PC-Relate - Estimate genetic relatedness

    NHLBI TOPMed: Rare Variants for Hypertension in Taiwan Chinese (THRV)

    phs001387

    THRV

    Yes

    NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU)

    phs001466

    pharmHU

    Yes

    NHLBI TOPMed: Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE)

    phs001467

    SAPPHIRE_asthma

    Yes

    NHLBI TOPMed: MyLifeOurFuture (MLOF) Hemophilia Study

    phs001515

    MLOF

    Yes

    NHLBI TOPMed: Diabetes Heart Study (DHS) African American Coronary Artery Calcification (AA CAC)

    phs001412

    AACAC

    Yes

    NHLBI TOPMed: Novel Risk Factors for the Development of Atrial Fibrillation in Women

    phs001040

    WGHS

    Yes

    NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry (VU_AF)

    phs001032

    VU_AF

    NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica

    phs000988

    CRA

    Yes

    NHLBI TOPMed - NHGRI CCDG: MGH Atrial Fibrillation Study

    phs001062

    MGH_AF

    Yes

    NHLBI TOPMed: Australian Familial Atrial Fibrillation Study

    phs001435

    AustralianFamilialAF

    Yes

    NHLBI TOPMed: African American Sarcoidosis Genetics Resource

    phs001207

    Sarcoidosis

    Yes

    NHLBI TOPMed: CHS Gene-Air Pollution Interactions in Asthma (GAP)

    phs001602

    ChildrensHS_GAP

    Yes

    NHLBI TOPMed: CHS (Effects of Air Pollution on the Development of Obesity in Children)

    phs001604

    ChildrensHS_MetaAir

    Yes

    NHLBI TOPMed - NHGRI CCDG: AFLMU

    phs001543

    AFLMU

    Yes

    NHLBI TOPMed - NHGRI CCDG: Malmo Preventive Project (MPP)

    phs001544

    MPP

    Yes

    NHLBI TOPMed - NHGRI CCDG: Intermountain INSPIRE Registry

    phs001545

    INSPIRE_AF

    Yes

    NHLBI TOPMed: Texas Cardiac Arrhythmia Institute - DECAF Study

    phs001546

    DECAF

    Yes

    NHLBI TOPMed: Early-onset Atrial Fibrillation in the Estonian Biobank

    phs001606

    EGCUT

    Yes

    NHLBI TOPMed: CHS Integrative Genomics and Environmental Research of Asthma (IGERA)

    phs001603

    ChildrensHS_IGERA

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607

    IPF

    Yes

    NHLBI TOPMed - NHGRI CCDG: The GENetics in Atrial Fibrillation (GENAF) Study

    phs001547

    GENAF

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607

    IPF

    Yes

    NHLBI TOPMed: Chicago Initiative to Raise Asthma Health Equity (CHIRAH)

    phs001605

    CHIRAH

    Yes

    NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing

    phs001607

    IPF

    Yes

    NHLBI TOPMed: Outcome Modifying Genes in Sickle Cell Disease (OMG)

    phs001608

    OMG_SCD

    Yes

    NHLBI TOPMed - NHGRI CCDG: Vanderbilt University BioVU Atrial Fibrillation Genetics Study

    phs001624

    BioVU_AF

    Yes

    NHLBI TOPMed: Lung Tissue Research Consortium (LTRC)

    phs001662

    LTRC

    Yes

    NHLBI TOPMed CCDG: Groningen Atrial Fibrillation (GGAF) Study

    phs001725

    GGAF

    Yes

    NHLBI TOPMed: Pathways to Immunologically Mediated Asthma (PIMA)

    phs001727

    PIMA

    Yes

    NHLBI TOPMed: Best ADd-on Therapy Giving Effective Response (BADGER)

    phs001728

    CARE_BADGER

    Yes

    NHLBI TOPMed: Characterizing the Response to a Leukotriene Receptor Antagonist and an Inhaled Corticosteroid (CLIC)

    phs001729

    CARE_CLIC

    Yes

    NHLBI TOPMed: Pediatric Asthma Controller Trial (PACT)

    phs001730

    CARE_PACT

    Yes

    NHLBI TOPMed: TReating Children to Prevent EXacerbations of Asthma (TREXA)

    phs001732

    CARE_TREXA

    Yes

    PETAL Network: Outcomes Related to COVID-19 Treated With Hydroxychloroquine Among Inpatients With Symptomatic Disease (ORCHID) Trial

    phs002299

    ORCHID

    Yes

    • PIC-SURE release notes

    • Dockstore release notes

    Study Name

    phs I.D. #

    Acronym

    New to BioData Catalyst

    New study version

    NHLBI TOPMed: Genome-wide Association Study of Adiposity in Samoans

    phs000972

    SAS

    NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados

    phs001143

    BAGS

    Data page
    Seven Bridges Public Apps Gallery
    LD Pruning
    KING robust
    KING IBDseg
    PC-AiR
    Seven Bridges Public Apps Gallery
    SBG Loci Snapshoter
    LocusZoom
    BCFtools Merge and Filter
    Tutorials page of the BioData Catalyst GitBook
    paper available here
    Seven Bridges Public Project, under PIC-SURE API
    PIC-SURE GitHub
    Terra release notes
    Seven Bridges release notes

    Yes

    Users on Seven Bridges can now use the Annotation Explorer to interactively aggregate and filter ~1 billion variants from TOPMed Freeze 8 using 450 annotations. Variant grouping files can be created from the results and exported to a workspace for use in rare variant association testing. Users with dbGaP approval for one or more TOPMed studies are able to access and work with the full Freeze 8 variant annotation database.
  • Query open access variant annotations in Annotation Explorer: Users on Seven Bridges without dbGaP approval for any TOPMed studies can now make use of the Annotation Explorer and interactively query TOPMed variants from Freeze 5 that have been released in dbSNP, a public-domain archive for human variants. Users can aggregate and filter ~550 million variants using ~260 annotations available in this dataset and generate variant grouping files for rare variant association testing.

  • Query combined phenotypic and genomic data in PIC-SURE: A release of genomic data in PIC-SURE now allows users to perform combined phenotypic and genomic queries to see phenotypic/genomic correlations. Users can export queries/cohorts to Seven Bridges or Terra Workspaces using the PIC-SURE API.

  • Bring Your Own Data to Terra Tutorial: We published a Jupyter notebook that provides functions for users to programmatically upload data to their Terra Google bucket and organize associated data into data tables for input into workflows. This may be a helpful resource for users that plan to upload many files. This notebook is part of a growing code library available in the BioData Catalyst Collection workspace.

  • Utilities Workflows on Dockstore: The BioData Catalyst Organization on Dockstore now has a Utilities collection with workflows for completing common tasks such as data import, genotype file processing, and quality control of whole genome or exome sequencing data. Fellow Kenny Westermann developed a workflow that fetches data from dbGaP for use in BDCatalyst.

  • NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene)

    phs000951

    COPDGene

    Yes

    The Diabetes Heart Study (DHS)

    phs001012

    DHS

    Yes

    Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE)

    phs001252

    ECLIPSE

    Yes

    NHLBI TOPMed: Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE)

    phs001472

    ECLIPSE

    Yes

    NHLBI TOPMed: Boston Early-Onset COPD Study in the TOPMed Program

    phs000946

    EOCOPD

    Yes

    NHLBI TOPMed - NHGRI CCDG: Genes-Environments and Admixture in Latino Asthmatics (GALA II)

    phs000920

    GALAII

    Yes

    NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy (GENOA)

    phs001345

    GENOA

    Yes

    NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity (GenSalt)

    phs001217

    GenSalt

    Yes

    Hispanic Community Health Study /Study of Latinos (HCHS/SOL)

    phs000810

    HCHS-SOL

    Yes

    Pediatric Cardiac Genomics Consortium (PCGC) Study

    phs001194

    PCGC

    Yes

    NHLBI TOPMed: PCGC's Congenital Heart Disease Biobank

    phs001735

    PCGC_CHD

    Yes

    PGRN-RIKEN: Rate Control Therapy in Patients with Atrial Fibrillation

    phs000439

    PGRN-RIKEN_AF

    Yes

    NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment (SAGE)

    phs000921

    SAGE

    Yes

    SNP Health Association Resource (SHARe) Asthma Resource Project (SHARP)

    phs000166

    SHARP

    Yes

    • PIC-SURE release notes

    • Dockstore release notes

    Study Name

    phs I.D. #

    Acronym

    New to BioData Catalyst

    New study version

    NHLBI GO-ESP: Lung Cohorts Exome Sequencing Project (Asthma)

    phs000422

    Asthma

    Yes

    CATHeterization GENetics (CATHGEN)

    phs000703

    CATHGEN

    Yes

    Data page
    this blog post
    Bring Your Own Tools to BioData Catalyst
    GWAS with GENESIS
    Terra release notes
    Seven Bridges release notes

    PETAL - ORCHID (data re-ingested because the files initially provided by data submitters were not the final version)

    phs002299

    ORCHID

    false

    1

    PCGC (CMG/Wagner)

    CMG

    true

    1

    CureSCi - BabyHug (via BioLINCC)

    phs002415

    BabyHug

    true

    1

    • PIC-SURE release notes

    • Dockstore release notes

    Study Name

    phs I.D. #

    Acronym

    New to BioData Catalyst

    New study version

    BioLINCC (Phase 1) - Training Data (Digitalis)

    open

    true

    NA

    Additional TOPMed combined EA

    c999

    Freeze1/

    Freeze9b/

    Freeze10a

    true

    Study Name

    phs I.D. #

    Acronym

    New to BioData Catalyst

    New study version

    TOPMed Freeze 9 - Batch 1

    (22 datasets included)

    Various

    Various

    false

    NA

    PCGC SRA Data

    Additional TOPMed Freeze 8 Studies (CATHGen)

    phs000571

    true

    Data page
    rollout of PPWS
    set up and use GCP budget alerts
    RStudio and Bioconductor
    Galaxy
    launch R Shiny apps from Terra’s built-in RStudio environment
    open-source R Shiny app
    save data from an IA environment
    adding Graphical Processor Units (GPUs) to Notebook cloud environments in Terra
    ability to use GPUs in workflows
    here
    This tutorial notebook
    This tutorial
    quick (2-min.) overview of Terra.
    this documentation to estimate costs of workflows
    This article
    This article
    This article
    Dockstore’s NHLBI BioData Catalyst Organization
    UWGAC Ancestry, Relatedness, and Association Testing Collection
    xvcfView
    bcftools view
    PrediXcan collection
    Launch Galaxy workflows from Dockstore into multiple Galaxy instances
    Organization
    IWC Organization
    mint DOIs for their workflows hosted on Dockstore
    export their workflows directly to their ORCID profile
    Terra
    Seven Bridges u
    Terra release notes
    Seven Bridges release notes

    NA

    6

    GENESIS Update Null Model for Fast Score Test - This updates the null model file obtained with the GENESIS Null model workflow so that it can be used in the GENESIS Single Variant Association Testing workflow in fast score mode.

  • Missing rate by sample - This UW-GAC tool was created for QC in GWAS. The tool calculates missing rate by sample. A subset of variants may be specified.

  • Missing rate by variant - This UW-GAC tool was created for QC in GWAS. The tool calculates missing rate by variant. A subset of samples and/or variants may be specified.

  • Allele frequency - This UW-GAC tool was created for QC in GWAS. The tool calculates allele frequency and counts. Values for both the alternate allele (count, frequency) and the minor allele (MAC, MAF) are returned. A subset of samples and/or variants may be specified.

  • ld-index - This UW-GAC tool calculates the linkage disequilibrium (LD) between an index variant and each variant in a set of other variants stored in a GDS file using the snpgdsLDMat function in the SNPRelate R package and a wrapper LDcompute R package.

  • ld-pair - This UW-GAC tool calculates the LD between a pair of variants stored in a GDS file using the snpgdsLDMat function in the SNPRelate R package and a wrapper LDcompute R package.

  • ld-set - This UW-GAC tool calculates the LD between all pairs in a user-specified set of variants stored in a GDS file using the snpgdsLDMat function in the SNPRelate R package and a wrapper LDcompute R package.
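The QC quantities named in the tool descriptions above (missing rate by sample, missing rate by variant, alternate allele count and frequency, MAC, and MAF) can be illustrated with a small sketch. This is not the UW-GAC implementation, which is written in R and reads GDS files. It assumes diploid genotypes coded as alternate-allele dosages (0, 1, or 2) with `None` for a missing call, and the function names and the toy matrix are hypothetical.

```python
from typing import Dict, List, Optional

# Assumption: rows are samples, columns are variants, values are
# alternate-allele dosages (0/1/2) and None marks a missing call.
Genotypes = List[List[Optional[int]]]


def missing_rate_by_sample(g: Genotypes) -> List[float]:
    """Fraction of variants with a missing call, one value per sample."""
    return [sum(call is None for call in row) / len(row) for row in g]


def missing_rate_by_variant(g: Genotypes) -> List[float]:
    """Fraction of samples with a missing call, one value per variant."""
    n_samples = len(g)
    n_variants = len(g[0])
    return [
        sum(row[j] is None for row in g) / n_samples for j in range(n_variants)
    ]


def allele_stats(g: Genotypes, variant: int) -> Optional[Dict[str, float]]:
    """Alternate allele count/frequency and minor allele count/frequency."""
    calls = [row[variant] for row in g if row[variant] is not None]
    n_alleles = 2 * len(calls)  # diploid assumption
    if n_alleles == 0:
        return None  # no observed calls, so frequencies are undefined
    alt_count = sum(calls)
    mac = min(alt_count, n_alleles - alt_count)
    return {
        "alt_count": alt_count,
        "alt_freq": alt_count / n_alleles,
        "mac": mac,
        "maf": mac / n_alleles,
    }


# Hypothetical toy data: 4 samples x 3 variants.
toy = [
    [0, 1, None],
    [1, 2, 0],
    [None, 2, 0],
    [0, 1, 1],
]
print(missing_rate_by_variant(toy))
print(allele_stats(toy, 1))
```

Restricting the calculation to a subset of samples or variants, as the tools allow, would amount to filtering the rows or columns before calling these functions.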

  • Introduction to PIC-SURE Open Access
  • Introduction to PIC-SURE Open Access: Multiple search criteria

  • Introduction to PIC-SURE Authorized Access

  • Introduction to PIC-SURE Authorized Access: Data Export

  • phs001601

    CCDG-PMBB

    true

    1

    phs002385

    CIBMTR

    true

    1

    phs002362

    CSSCD

    true

    1

    phs002348

    MSH

    true

    1

    phs002386

    STOPII

    true

    1

    phs001542

    GALA

    true

    1

    phs001661

    GCPD-A

    true

    2

    phs001468

    REDS-III

    false

    2

    Tutorial-biolincc_camp

    open

    true

    tutorial-biolincc_framingham

    open

    true

    BioLINCC – BabyHug

    phs002415

    true

    PIC-SURE release notes
  • Dockstore release notes

  • Study Name

    phs I.D. #

    Acronym

    New to BioData Catalyst

    New study version

    Treatment of Pulmonary Hypertension and Sickle Cell Disease with Sildenafil Therapy

    phs002383

    WalkPHaSST

    true

    1

    CARDIA Cohort

    phs000285

    CARDIA

    false

    Study Name

    phs I.D. #

    Acronym

    New to BioData Catalyst

    New study version

    Combined Exchange Area new data

    false

    BioLINCC – Training Dataset – Digitalis

    Data
    Sample and Variant quality control methods with Hail
    NHLBI BioData Catalyst
    here
    here
    Seurat package now included by default in R-based cloud environments
    Seurat
    crcmod
    here
    University of Michigan
    Public project “LocusZoom Shiny App”
    Public Project “Data Interoperability”
    conditional execution of workflow steps
    CWL CommandLineTool specification
    CWL Workflow specification
    Public Apps Gallery
    Regenie 2.0.1
    GENESIS Association results plotting
    WGSA 0.9
    PIC-SURE Open Access
    user guide
    tutorial
    Jupyter notebook examples
    UWGAC Ancestry and Relatedness analysis collection
    Large-scale Gene by Environment collection
    This tutorial
    Dockstore repository
    This tutorial
    here
    PIC-SURE documentation
    PIC-SURE Video tutorials
    blog post
    Terra release notes
    Seven Bridges release notes

    3

    on the BioData Catalyst website.

    Significant new features

    • GENESIS tutorial and public project: Seven Bridges has made a public project available that introduces users to the GENESIS R package and related R packages (SeqArray, SeqVarTools, and SNPRelate) used in mixed model association testing in sequence data. The examples in the project help users understand the code that is used in the GENESIS public apps (available on GitHub), prepare data for input to those apps, and interact with the results. The “GENESIS Tutorial” public project can be found in the list of Seven Bridges public projects on the top navigation bar of the platform.

    • Launch machine learning packages in Jupyterlab Notebooks: Users on Seven Bridges can now use a docker image with pre-installed libraries that support machine learning analyses when working in Jupyterlab Notebooks. This docker image can be found in the Data Cruncher feature: Select “Create new analysis” and then, under the Environment setup menu, select “SB Machine Learning - TensorFlow 2.0, Python 3.7.”

    • Support for larger GPU instances: A larger AWS GPU instance type is now available for researchers working in Jupyterlab Notebooks and RStudio on Seven Bridges. The p3dn.24xlarge instance has 1800 GB SSD, 96 vCPUs, 768 GB RAM, and 8 GPUs. These higher-memory cards enable machine learning training on large 3D images and high-performance computing applications.

    • Data Cruncher Interactive Analyses: Seven Bridges now features a “Data Cruncher Interactive Analyses” public project, found in the list of public projects on the top navigation bar of the platform, with example analyses to help users interpret results from secondary analysis. The project has eight separate analyses - three in RStudio and five in Jupyterlab Notebooks - including one on VCF visualization and one on structural variant analysis. Read the blog post.

    • Launch Dockstore workflows in Seven Bridges: Users can now find CWL workflows in Dockstore and launch them in the Seven Bridges workspace environment.

    • Export large PFBs from Gen3 to Terra: Users can now export large PFB (Portable Format for Bioinformatics) files from Gen3 (e.g. synthetic cohort files from multiple groups) to Terra. New backend systems now automatically parse files more efficiently.

    • Automatic syncing with GitHub apps: Dockstore now automatically updates your workflows with any changes you make to your linked GitHub repository.

    • Link ORCID iDs to published workflows: Users can now link their ORCID iDs to their Dockstore accounts and make iDs visible via their organizations and in workflows and tools they have starred. Users searching Dockstore’s catalog will be able to associate workflows you contribute with scientific publications.

    • GA4GH TRS Support: Dockstore now implements the GA4GH Tool Registry Service (TRS) v2 standard. The goal of the TRS API is to provide a standardized way to describe the availability of tools and workflows.

    • Transfer datasets to Jupyter Notebooks with Query ID: Users who query the PIC-SURE UI and apply filters to create datasets can submit their query ID to the PIC-SURE client library via an R or Python Jupyter Notebook and do not need to rebuild the query manually.

    • Data Tree Optimizations: The PIC-SURE data tree has been optimized to show users only the studies they are authorized to see and to render more efficiently, so users can select studies faster.

    • Export data dictionary of clinical variables: R and Python Jupyter Notebooks are now available that provide directions on exporting the full data dictionary of all clinical variables to a CSV via PIC-SURE.
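As an illustration of the GA4GH TRS support mentioned in the list above, the sketch below builds request URLs for two TRS v2 endpoints: the record for a single tool or workflow, and the descriptor for one of its versions. It makes no network calls. The base URL is an assumption about where Dockstore exposes its TRS API and should be checked against the Dockstore documentation; the workflow ID and version name are hypothetical placeholders.

```python
from urllib.parse import quote

# Assumption: Dockstore's public TRS v2 base URL. Verify before use.
TRS_BASE = "https://dockstore.org/api/ga4gh/trs/v2"


def trs_tool_url(tool_id: str, base: str = TRS_BASE) -> str:
    """URL for one tool/workflow record: GET {base}/tools/{id}.

    TRS IDs may contain '#' and '/', so the ID is percent-encoded
    before being placed in the URL path.
    """
    return f"{base}/tools/{quote(tool_id, safe='')}"


def trs_descriptor_url(
    tool_id: str,
    version_id: str,
    descriptor_type: str = "CWL",
    base: str = TRS_BASE,
) -> str:
    """URL for a version's descriptor:
    GET {base}/tools/{id}/versions/{version_id}/{type}/descriptor
    """
    version = quote(version_id, safe="")
    return f"{trs_tool_url(tool_id, base)}/versions/{version}/{descriptor_type}/descriptor"


# Hypothetical workflow ID and version, for illustration only.
example_id = "#workflow/github.com/example-org/example-repo"
print(trs_tool_url(example_id))
print(trs_descriptor_url(example_id, "main"))
```

Because the paths are defined by the TRS standard rather than by one registry, the same helper should work against another TRS v2 server by passing a different `base`.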

    New user support materials and documentation

    • Overview of the ecosystem: This collaboratively developed overview document guides new users from understanding what BioData Catalyst is to getting started with the ecosystem.

    • Tips for reliable and efficient analysis set-up: This guide provides recommendations on how to set up your initial set of analyses, tips for running tools/workflows, and specifications for computational resources on Seven Bridges.

    • Genetic Association Testing Using GENESIS Workflows: This tutorial guides users through the steps of running a single variant or multiple variant association test on Seven Bridges using the GENESIS R package pipelines.

    • Troubleshooting Tasks: This guide presents some of the most common errors in task execution on Seven Bridges and shows you how to debug and resolve them.

    • GWAS tutorial and example cloud costs: Terra’s GWAS tutorial walks users through preparing data for input using Hail in Jupyter notebooks and running association tests as workflows with the GENESIS R package. It also provides example cloud costs derived from the tutorial.

    • Code Library: This release includes a Terra featured workspace containing R and Python Jupyter Notebooks that cover how to use the Integrated Genomics Viewer with data from Gen3, workflows for merging VCF files, and expanded features for interacting with data using the Data Repository Service (DRS) such as bulk downloads.

    • Dockstore Fundamentals: Video recording, slides, and exercises are available from the recent workshop Dockstore Fundamentals: Introduction to Docker and Descriptors for Reproducible Analysis.

    • API and User Interface Technical Documentation: PIC-SURE technical documentation provides users with information about the PIC-SURE API and user interface and examples of how to load data into the PIC-SURE High Performance Data Store.

    • Scalability and cost-effectiveness analysis of whole genome-wide association studies in the Cloud: This recent article from PIC-SURE provides a straightforward methodology for optimizing cloud configuration to conduct genome-wide association studies and helps users understand the trade-off between speed and cost.

    Data release

    The table below highlights which TOPMed studies were included in the 2020-08-24 data release. Freeze 8 multi-sample VCFs were added for the 29 studies listed in the table below. This includes 19 studies which were previously hosted on BioData Catalyst with Freeze 5b data available and 10 studies which are new to BioData Catalyst. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format. For the 10 studies which are new to BioData Catalyst, CRAM files and unharmonized clinical files are also available for access. Additionally, 10 of these studies were updated to the latest version. The data is now available for access across the entire ecosystem.

    | Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
    | --- | --- | --- | --- | --- |
    | NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish | phs000956 | Amish | - | Yes |
    | NHLBI TOPMed: Atherosclerosis Risk in Communities | phs001211 | ARIC | - | - |
    | NHLBI TOPMed: NHGRI CCDG: The BioMe Biobank at Mount Sinai | phs001644 | BioMe | Yes | - |
    | NHLBI TOPMed: Childhood Asthma Management Program | phs001726 | CAMP | Yes | - |
    | NHLBI TOPMed: Coronary Artery Risk Development in Young Adults | phs001612 | CARDIA | Yes | - |
    | NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation | phs001189 | CCAF | - | Yes |
    | NHLBI TOPMed: The Cleveland Family Study | phs000954 | CFS | - | Yes |
    | NHLBI TOPMed: Cardiovascular Health Study | phs001368 | CHS | - | - |
    | NHLBI TOPMed: Framingham Heart Study | phs000974 | FHS | - | - |
    | NHLBI TOPMed: Genetic Study of Atherosclerosis Risk | phs001218 | GeneSTAR | - | Yes |
    | NHLBI TOPMed: Epigenetic Determinants of Lipid Response to Dietary Fat and Fenofibrate | phs001359 | GOLDN | - | Yes |
    | NHLBI TOPMed: The Hispanic Community Health Study/Study of Latinos | phs001395 | HCHS/SOL | Yes | - |
    | NHLBI TOPMed: The Heart and Vascular Health Study | phs000993 | HVH | - | - |
    | NHLBI TOPMed: Genetics of Left Ventricular Hypertrophy | phs001293 | HyperGEN | - | Yes |
    | NHLBI TOPMed: The Jackson Heart Study | phs000964 | JHS | - | - |
    | NHLBI TOPMed: NHGRI CCDG: The Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study | phs001598 | JHU_AF | Yes | - |
    | NHLBI TOPMed: The Multi-Ethnic Study of Atherosclerosis | phs001416 | MESA | - | - |
    | NHLBI TOPMed: Plasma microRNAs are associated with atrial fibrillation and change after catheter ablation | phs001434 | miRhythm | Yes | - |
    | NHLBI TOPMed: Partners HealthCare Biobank | phs001024 | PARTNERS | - | Yes |
    | NHLBI TOPMed: Pulmonary Hypertension and the Hypoxic Response in Sickle Cell Disease | phs001682 | PUSH_SCD | Yes | - |
    | NHLBI TOPMed: Recipient Epidemiology and Donor Evaluation Study-III Brazil Sickle Cell Disease Cohort | phs001468 | REDS-III_Brazil_SCD | Yes | - |
    | NHLBI TOPMed: San Antonio Family Heart Study | phs001215 | SAFHS | - | Yes |
    | NHLBI TOPMed: Severe Asthma Research Program | phs001446 | SARP | Yes | - |
    | NHLBI TOPMed: Genome-wide Association Study of Adiposity in Samoans | phs000972 | SAS | - | - |
    | NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Ablation Registry | phs000997 | VAFAR | - | Yes |
    | NHLBI TOPMed: Venous Thromboembolism project | phs001402 | VTE | - | - |
    | NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry | phs001032 | VU_AF | - | Yes |
    | NHLBI TOPMed: Walk-PHaSST Sickle Cell Disease | phs001514 | Walk_PHaSST_SCD | Yes | - |
    | NHLBI TOPMed: Women's Health Initiative | phs001237 | WHI | - | - |

    For detailed platform release notes please consult the following resources:

    • Gen3 release notes

    • Terra release notes

    • Seven Bridges release notes

    • PIC-SURE release notes

    Data page

    AWS GPU instance type
    Data Cruncher Interactive Analyses
    blog post
    automatically updates your workflows
    ORCID iDs
    GA4GH Tool Registry Service
    Jupyter Notebook
    R and Python Jupyter Notebooks
    This tutorial
    This guide
    GWAS tutorial
    featured workspace
    video recording, slides, and exercises
    PIC-SURE technical documentation
    This recent article
    Dockstore release notes
    phs001194
    NHLBI TOPMed - NHGRI CCDG: Penn Medicine BioBank Early Onset Atrial Fibrillation Study
    Hematopoietic Cell Transplant for Sickle Cell Disease (HCT for SCD)
    Cooperative Study of Sickle Cell Disease (CSSCD)
    Multicenter Study of Hydroxyurea (MSH)
    Optimizing Primary Stroke Prevention in Children with Sickle Cell Anemia (STOP II)
    NHLBI TOPMed: Genetics of Asthma in Latino Americans (GALA)
    NHLBI TOPMed: Genetic Causes of Complex Pediatric Disorders - Asthma (GCPD-A)
    NHLBI TOPMed: REDS-III Brazil Sickle Cell Disease Cohort (REDS-BSCDC)