2020-10-23 BioData Catalyst Ecosystem Release Notes


Last updated 4 years ago


Introduction

The 2020-10-23 release marks the third release for the NHLBI BioData Catalyst ecosystem. This release includes several new features along with documentation and tutorials (e.g., bringing your own data and tools) to help new users get started on the system. This release also includes enhanced support for querying annotations for TOPMed Freeze 8 variants in the Annotation Explorer, and querying combined phenotypic and genomic data in PIC-SURE. Please find more detail on the new features and user support materials in the sections below.

The 2020-10-23 data release adds both Parent and TOPMed studies. A total of 8 new Parent studies and their respective unharmonized clinical files were added. Multi-sample VCFs, CRAMs, and unharmonized clinical files were added for 2 TOPMed studies new to BioData Catalyst. Additionally, 6 studies were updated to the latest version. These updates included new CRAMs, unharmonized clinical files, and multi-sample VCFs for Freeze 8. For each study and consent group, VCF files are available on a per-chromosome basis and in an un-tarred format. The data is now available for access across the entire ecosystem. Please refer to the Data Release section below for more information, as well as the Data page on the BioData Catalyst website.

Significant new features

  • Form cohorts on Gen3 Exploration page and export to Seven Bridges workspace: Users can now export PFB (Portable Format for Bioinformatics) files from Gen3 (e.g., synthetic cohort files from multiple groups) to Seven Bridges.

  • CWL workflows for EPACTS and Plink association tests: Users can now find CWL workflows in the Seven Bridges Public Apps Gallery for the association test methods EPACTS and Plink. More information can be found in this blog post.

  • Query annotations for TOPMed Freeze 8 variants in the Annotation Explorer: Users on Seven Bridges can now use the Annotation Explorer to interactively aggregate and filter ~1 billion variants from TOPMed Freeze 8 using 450 annotations. Variant grouping files can be created from the results and exported to a workspace for use in rare variant association testing. Users with dbGaP approval for one or more TOPMed studies are able to access and work with the full Freeze 8 variant annotation database.

  • Query open access variant annotations in Annotation Explorer: Users on Seven Bridges without dbGaP approval for any TOPMed studies can now make use of the Annotation Explorer and interactively query TOPMed variants from Freeze 5 that have been released in dbSNP, a public-domain archive for human variants. Users can aggregate and filter ~550 million variants using ~260 annotations available in this dataset and generate variant grouping files for rare variant association testing.

  • Query combined phenotypic and genomic data in PIC-SURE: A release of genomic data in PIC-SURE now allows users to perform combined phenotypic and genomic queries to see phenotypic/genomic correlations. Users can export queries/cohorts to Seven Bridges or Terra Workspaces using the PIC-SURE API.
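The Annotation Explorer items above describe filtering variants by annotation and exporting variant grouping files for rare variant association testing. As a rough illustration of that idea only, the sketch below filters a handful of made-up variants and groups the survivors by gene. It does not use the Annotation Explorer or any Seven Bridges API; the field names (`score`, `maf`, `gene`), the thresholds, and the output columns (`group_id`, `chr`, `pos`, `ref`, `alt`) are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical annotated variants. All field names and values are illustrative.
VARIANTS = [
    {"chr": "1", "pos": 1000, "ref": "A", "alt": "G", "gene": "GENE_A", "score": 25.1, "maf": 0.002},
    {"chr": "1", "pos": 1500, "ref": "C", "alt": "T", "gene": "GENE_A", "score": 12.0, "maf": 0.001},
    {"chr": "1", "pos": 2000, "ref": "G", "alt": "A", "gene": "GENE_A", "score": 30.0, "maf": 0.004},
    {"chr": "2", "pos": 500, "ref": "T", "alt": "C", "gene": "GENE_B", "score": 22.0, "maf": 0.2},
    {"chr": "2", "pos": 900, "ref": "A", "alt": "C", "gene": "GENE_B", "score": 28.3, "maf": 0.0005},
]


def build_groups(variants, min_score, max_maf):
    """Keep variants that pass both filters and collect them per gene."""
    groups = defaultdict(list)
    for v in variants:
        if v["score"] >= min_score and v["maf"] <= max_maf:
            groups[v["gene"]].append((v["chr"], v["pos"], v["ref"], v["alt"]))
    return dict(groups)


def to_grouping_rows(groups):
    """Flatten the per-gene groups into one row per variant, sorted by group and position."""
    rows = []
    for group_id in sorted(groups):
        for chrom, pos, ref, alt in sorted(groups[group_id], key=lambda t: (t[0], t[1])):
            rows.append({"group_id": group_id, "chr": chrom, "pos": pos, "ref": ref, "alt": alt})
    return rows


groups = build_groups(VARIANTS, min_score=20.0, max_maf=0.01)
grouping_rows = to_grouping_rows(groups)
for row in grouping_rows:
    print(row["group_id"], row["chr"], row["pos"], row["ref"], row["alt"], sep="\t")
```

Grouping by gene is only one possible aggregation unit. The same pattern applies to any other unit, provided each variant carries a label for it.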

New user support materials and documentation

  • Bring Your Own Tools to BioData Catalyst: This guide introduces users to the two Docker-based workflow languages used to run batch analyses in the ecosystem: the Workflow Description Language (WDL) in Terra and the Common Workflow Language (CWL) in Seven Bridges. The guide links to resources that take users from the early steps of wrapping their current pipelines for use in the cloud through publishing their work in Dockstore, our open access catalog, to share with the community. This guide was originally conceived in discussion with fellows during the BDCatalyst September Face-to-Face. Fellows developed content and provided feedback, and are listed as contributors within the publication.

  • Benchmarking guide for GENESIS association test workflows: This guide provides users with comprehensive benchmarking information for the CWL versions of the GENESIS association workflows. It shows the computation costs and execution times for a variety of association tests using 2.5K, 10K, 36K, and 50K samples, run on both AWS and Google Cloud. The benchmarking guide can be found on the page “GWAS with GENESIS” of the Seven Bridges documentation.

  • Bring Your Own Data to Terra Tutorial: We published a Jupyter notebook that provides functions for users to programmatically upload data to their Terra Google bucket and organize associated data into data tables for input into workflows. This may be a helpful resource for users who plan to upload many files. This notebook is part of a growing code library available in the BioData Catalyst Collection workspace.

  • Utilities Workflows on Dockstore: The BioData Catalyst Organization on Dockstore now has a Utilities collection with workflows for completing common tasks such as data import, genotype file processing, and quality control of whole genome or exome sequencing data. Fellow Kenny Westermann developed a workflow that fetches data from dbGaP for use in BDCatalyst.
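The Terra notebook mentioned above organizes uploaded files into data tables for workflow input. The sketch below is not that notebook's code. It is a minimal, standalone illustration of the idea, assuming files are named `<sample_id>.<extension>` and that the table is a tab-separated file whose first column header is `entity:<type>_id` (the load-file convention as we understand it; check the current Terra documentation before relying on it). The bucket name and file paths are hypothetical, and the upload step itself is not shown.

```python
import csv
import io


def build_data_table(uris, entity_type="sample"):
    """Return TSV text with one row per entity and one column per file extension.

    Assumes each file name looks like "<entity_id>.<extension>".
    """
    table = {}
    columns = set()
    for uri in uris:
        file_name = uri.rsplit("/", 1)[-1]
        entity_id, _, extension = file_name.partition(".")
        # "cram.crai" becomes the column name "cram_crai".
        column = extension.replace(".", "_") or "file"
        table.setdefault(entity_id, {})[column] = uri
        columns.add(column)

    ordered = sorted(columns)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([f"entity:{entity_type}_id"] + ordered)
    for entity_id in sorted(table):
        # Entities missing a file type get an empty cell in that column.
        writer.writerow([entity_id] + [table[entity_id].get(c, "") for c in ordered])
    return out.getvalue()


# Hypothetical bucket paths, for illustration only.
example_uris = [
    "gs://example-bucket/data/S1.cram",
    "gs://example-bucket/data/S1.cram.crai",
    "gs://example-bucket/data/S2.cram",
]
tsv_text = build_data_table(example_uris)
print(tsv_text)
```

Deriving the table from the file listing keeps the table and the bucket contents in step, which matters most when many files are uploaded at once.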

Data Release

The table below highlights the new data release on BioData Catalyst which includes both Parent and TOPMed studies. A total of 8 new Parent studies and their respective unharmonized clinical files were added to the ecosystem. Multi-sample VCFs, CRAMs and unharmonized clinical files were added for 2 TOPMed studies new to BioData Catalyst. Additionally, 6 TOPMed studies previously hosted on BioData Catalyst were updated to the latest study versions. These updates included new CRAMs, unharmonized clinical files and multi-sample VCFs for Freeze 8 (previously hosted Freeze 5b only). For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format. The data is now available for access across the entire ecosystem.

| Study Name | phs I.D. # | Acronym | New to BioData Catalyst or new study version |
|---|---|---|---|
| NHLBI GO-ESP: Lung Cohorts Exome Sequencing Project (Asthma) | phs000422 | Asthma | Yes |
| CATHeterization GENetics (CATHGEN) | phs000703 | CATHGEN | Yes |
| NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene) | phs000951 | COPDGene | Yes |
| The Diabetes Heart Study (DHS) | phs001012 | DHS | Yes |
| Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) | phs001252 | ECLIPSE | Yes |
| NHLBI TOPMed: Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) | phs001472 | ECLIPSE | Yes |
| NHLBI TOPMed: Boston Early-Onset COPD Study in the TOPMed Program | phs000946 | EOCOPD | Yes |
| NHLBI TOPMed - NHGRI CCDG: Genes-Environments and Admixture in Latino Asthmatics (GALA II) | phs000920 | GALAII | Yes |
| NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy (GENOA) | phs001345 | GENOA | Yes |
| NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity (GenSalt) | phs001217 | GenSalt | Yes |
| Hispanic Community Health Study/Study of Latinos (HCHS/SOL) | phs000810 | HCHS-SOL | Yes |
| Pediatric Cardiac Genomics Consortium (PCGC) Study | phs001194 | PCGC | Yes |
| NHLBI TOPMed: PCGC's Congenital Heart Disease Biobank | phs001735 | PCGC_CHD | Yes |
| PGRN-RIKEN: Rate Control Therapy in Patients with Atrial Fibrillation | phs000439 | PGRN-RIKEN_AF | Yes |
| NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment (SAGE) | phs000921 | SAGE | Yes |
| SNP Health Association Resource (SHARe) Asthma Resource Project (SHARP) | phs000166 | SHARP | Yes |

For detailed platform release notes please consult the following resources:

  • Gen3 release notes

  • PIC-SURE release notes

  • Terra release notes

  • Seven Bridges release notes

  • Dockstore release notes