2020-08-24 BioData Catalyst Ecosystem Release Notes


Last updated 4 years ago


Introduction

The 2020-08-24 release is the second release of the NHLBI BioData Catalyst ecosystem. It includes several new features, along with documentation and tutorials (e.g. genome-wide association studies) to help new users get started on the system. It also adds enhanced support for machine learning in the workspace environments and support in Dockstore for a GA4GH industry standard for workflows. More detail on the new features and user support materials is provided in the sections below.

The 2020-08-24 data release adds TOPMed Freeze 8 data for a subset of studies on BioData Catalyst. Freeze 8 multi-sample VCFs are available for 29 studies, of which 10 are new to the ecosystem. For each study and consent group, VCF files are available on a per-chromosome basis and in an un-tarred format, in contrast to the Freeze 5 multi-sample VCFs, which are hosted as tar bundles. For the 10 studies new to BioData Catalyst, CRAM files and unharmonized clinical files are also available for access. The data release further updates many studies to the latest versions available on dbGaP. The next data release will include Freeze 8 multi-sample VCFs for additional TOPMed studies, as well as unharmonized clinical data and CRAM files for studies that are not yet hosted on the system. Please refer to the Data release section below for more information, as well as the Data page on the BioData Catalyst website.

Significant new features

  • GENESIS tutorial and public project: Seven Bridges has made a public project available that introduces users to the GENESIS R package and related R packages (SeqArray, SeqVarTools, and SNPRelate) used in mixed model association testing in sequence data. The examples in the project help users understand the code that is used in the GENESIS public apps (available on GitHub), prepare data for input to those apps, and interact with the results. The “GENESIS Tutorial” public project can be found in the list of Seven Bridges public projects on the top navigation bar of the platform.

  • Launch machine learning packages in JupyterLab Notebooks: Users on Seven Bridges can now use a Docker image with pre-installed libraries that support machine learning analyses when working in JupyterLab Notebooks. This Docker image can be found in the Data Cruncher feature: Select “Create new analysis” and then, under the Environment setup menu, select “SB Machine Learning - TensorFlow 2.0, Python 3.7.”

  • Support for larger GPU instances: A larger AWS GPU instance type is now available for researchers working in JupyterLab Notebooks and RStudio on Seven Bridges. The p3dn.24xlarge instance has 1800 GB SSD, 96 vCPUs, 768 GB RAM, and 8 GPUs. These higher memory cards enable machine learning training on large 3D images and high-performance computing applications.

  • Data Cruncher Interactive Analyses: Seven Bridges now features a “Data Cruncher Interactive Analyses” public project, found in the list of public projects on the top navigation bar of the platform, with example analyses to help users interpret results from secondary analysis. The project has eight separate analyses (three in RStudio and five in JupyterLab Notebooks), including one on VCF visualization and one on structural variant analysis. Read the blog post.

  • Launch Dockstore workflows in Seven Bridges: Users can now find CWL workflows in Dockstore and launch them in the Seven Bridges workspace environment.

  • Export large PFBs from Gen3 to Terra: Users can now export large PFB (Portable Format for Bioinformatics) files from Gen3 (e.g. synthetic cohort files from multiple groups) to Terra. New backend systems now automatically parse files more efficiently.

  • Automatic syncing with GitHub apps: Dockstore now automatically updates your workflows with any changes you make to your linked GitHub repository.

  • Link ORCID iDs to published workflows: Users can now link their ORCID iDs to their Dockstore accounts and make the iDs visible via their organizations and in the workflows and tools they have starred. Users searching Dockstore’s catalog will be able to associate the workflows you contribute with scientific publications.

  • GA4GH TRS Support: Dockstore now implements the GA4GH Tool Registry Service (TRS) v2 standard. The goal of the TRS API is to provide a standardized way to describe the availability of tools and workflows.

  • Transfer datasets to Jupyter Notebooks with Query ID: Users who query the PIC-SURE UI and apply filters to create datasets can submit their query ID to the PIC-SURE client library via an R or Python Jupyter Notebook and do not need to rebuild the query manually.

  • Data Tree Optimizations: The PIC-SURE data tree has been optimized to show users only the studies they are authorized to see and to render more efficiently, so users can select studies faster.

  • Export data dictionary of clinical variables: R and Python Jupyter Notebooks are now available that provide directions on exporting the full data dictionary of all clinical variables to a CSV via PIC-SURE.
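As an illustration of what the TRS v2 support listed above standardizes, the following minimal sketch builds the URL for a workflow descriptor following the TRS v2 path layout (`/tools/{id}/versions/{version_id}/{type}/descriptor`). The base URL is an assumption about where Dockstore exposes its TRS endpoint, and the workflow ID and version are hypothetical placeholders; check both against the Dockstore API documentation before relying on them.

```python
from urllib.parse import quote

# Assumed base URL for Dockstore's TRS v2 endpoint (not stated in these release notes).
TRS_BASE = "https://dockstore.org/api/ga4gh/trs/v2"


def trs_descriptor_url(tool_id, version_id, descriptor_type="CWL", base=TRS_BASE):
    """Build the TRS v2 URL for the primary descriptor of one tool version."""
    # TRS ids can contain '#' and '/', so each id is percent-encoded
    # as a single path segment (safe="" encodes '/' as well).
    return "{}/tools/{}/versions/{}/{}/descriptor".format(
        base,
        quote(tool_id, safe=""),
        quote(version_id, safe=""),
        descriptor_type,
    )


# Hypothetical workflow id and version, used only for illustration.
example = trs_descriptor_url(
    "#workflow/github.com/example-org/example-workflow", "v1.0.0"
)
print(example)
```

Because the path layout is defined by the standard rather than by one registry, the same function should work against any TRS v2 server by passing a different `base`.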

New user support materials and documentation

  • Overview of the ecosystem: This collaboratively developed overview document guides new users from understanding what BioData Catalyst is to getting started using the ecosystem.

  • Tips for reliable and efficient analysis set-up: This guide provides recommendations on how to set up your initial set of analyses, tips for running tools/workflows, and specifications for computational resources on Seven Bridges.

  • Genetic Association Testing Using GENESIS Workflows: This tutorial guides users through the steps of running a single variant or multiple variant association test on Seven Bridges using the GENESIS R package pipelines.

  • Troubleshooting Tasks: This guide presents some of the most common errors in task execution on Seven Bridges and shows you how to debug and resolve them.

  • GWAS tutorial and example cloud costs: Terra’s GWAS tutorial walks users through the steps of preparing data for input using Hail in Jupyter notebooks and running association tests as workflows with the GENESIS R package, and provides example cloud costs derived from the tutorial.

  • Code Library: This release includes a Terra featured workspace containing R and Python Jupyter Notebooks that cover how to use the Integrated Genomics Viewer with data from Gen3, workflows for merging VCF files, and expanded features for interacting with data using the Data Repository Service (DRS), such as bulk downloads.

  • Dockstore Fundamentals: A video recording, slides, and exercises are available from the recent workshop Dockstore Fundamentals: Introduction to Docker and Descriptors for Reproducible Analysis.

  • API and User Interface Technical Documentation: PIC-SURE technical documentation provides users with information about the PIC-SURE API and user interface and examples of how to load data into the PIC-SURE High Performance Data Store.

  • Scalability and cost-effectiveness analysis of whole genome-wide association studies in the Cloud: This recent article from PIC-SURE provides a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies and helps users understand the trade-off between speed and cost.

Data release

The table below lists the TOPMed studies included in the 2020-08-24 data release. Freeze 8 multi-sample VCFs were added for all 29 studies listed. This includes 19 studies that were previously hosted on BioData Catalyst with Freeze 5b data available and 10 studies that are new to BioData Catalyst. For each study and consent group, VCF files are available on a per-chromosome basis and in an un-tarred format. For the 10 studies that are new to BioData Catalyst, CRAM files and unharmonized clinical files are also available for access. Additionally, 10 of these studies were updated to the latest version. The data is now available for access across the entire ecosystem.

| Study Name | phs ID | Acronym | New to BioData Catalyst | New study version |
|---|---|---|---|---|
| NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish | phs000956 | Amish | - | Yes |
| NHLBI TOPMed: Atherosclerosis Risk in Communities | phs001211 | ARIC | - | - |
| NHLBI TOPMed: NHGRI CCDG: The BioMe Biobank at Mount Sinai | phs001644 | BioMe | Yes | - |
| NHLBI TOPMed: Childhood Asthma Management Program | phs001726 | CAMP | Yes | - |
| NHLBI TOPMed: Coronary Artery Risk Development in Young Adults | phs001612 | CARDIA | Yes | - |
| NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation | phs001189 | CCAF | - | Yes |
| NHLBI TOPMed: The Cleveland Family Study | phs000954 | CFS | - | Yes |
| NHLBI TOPMed: Cardiovascular Health Study | phs001368 | CHS | - | - |
| NHLBI TOPMed: Framingham Heart Study | phs000974 | FHS | - | - |
| NHLBI TOPMed: Genetic Study of Atherosclerosis Risk | phs001218 | GeneSTAR | - | Yes |
| NHLBI TOPMed: Epigenetic Determinants of Lipid Response to Dietary Fat and Fenofibrate | phs001359 | GOLDN | - | Yes |
| NHLBI TOPMed: The Hispanic Community Health Study/Study of Latinos | phs001395 | HCHS/SOL | Yes | - |
| NHLBI TOPMed: The Heart and Vascular Health Study | phs000993 | HVH | - | - |
| NHLBI TOPMed: Genetics of Left Ventricular Hypertrophy | phs001293 | HyperGEN | - | Yes |
| NHLBI TOPMed: The Jackson Heart Study | phs000964 | JHS | - | - |
| NHLBI TOPMed: NHGRI CCDG: The Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study | phs001598 | JHU_AF | Yes | - |
| NHLBI TOPMed: The Multi-Ethnic Study of Atherosclerosis | phs001416 | MESA | - | - |
| NHLBI TOPMed: Plasma microRNAs are associated with atrial fibrillation and change after catheter ablation | phs001434 | miRhythm | Yes | - |
| NHLBI TOPMed: Partners HealthCare Biobank | phs001024 | PARTNERS | - | Yes |
| NHLBI TOPMed: Pulmonary Hypertension and the Hypoxic Response in Sickle Cell Disease | phs001682 | PUSH_SCD | Yes | - |
| NHLBI TOPMed: Recipient Epidemiology and Donor Evaluation Study-III Brazil Sickle Cell Disease Cohort | phs001468 | REDS-III_Brazil_SCD | Yes | - |
| NHLBI TOPMed: San Antonio Family Heart Study | phs001215 | SAFHS | - | Yes |
| NHLBI TOPMed: Severe Asthma Research Program | phs001446 | SARP | Yes | - |
| NHLBI TOPMed: Genome-wide Association Study of Adiposity in Samoans | phs000972 | SAS | - | - |
| NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Ablation Registry | phs000997 | VAFAR | - | Yes |
| NHLBI TOPMed: Venous Thromboembolism project | phs001402 | VTE | - | - |
| NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry | phs001032 | VU_AF | - | Yes |
| NHLBI TOPMed: Walk-PHaSST Sickle Cell Disease | phs001514 | Walk_PHaSST_SCD | Yes | - |
| NHLBI TOPMed: Women's Health Initiative | phs001237 | WHI | - | - |

For detailed platform release notes, please consult the following resources:

  • Gen3 release notes

  • PIC-SURE release notes

  • Terra release notes

  • Seven Bridges release notes

  • Dockstore release notes
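The study counts quoted in the Data release section (29 studies, 10 new to BioData Catalyst, 19 previously hosted, 10 with a new study version) can be cross-checked against the flag columns of the study table. The sketch below is only a tally of acronyms transcribed by hand from that table; the variable names are illustrative and are not part of any BioData Catalyst tooling.

```python
# Acronyms transcribed from the study table in these release notes.
ALL_STUDIES = [
    "Amish", "ARIC", "BioMe", "CAMP", "CARDIA", "CCAF", "CFS", "CHS", "FHS",
    "GeneSTAR", "GOLDN", "HCHS/SOL", "HVH", "HyperGEN", "JHS", "JHU_AF",
    "MESA", "miRhythm", "PARTNERS", "PUSH_SCD", "REDS-III_Brazil_SCD",
    "SAFHS", "SARP", "SAS", "VAFAR", "VTE", "VU_AF", "Walk_PHaSST_SCD", "WHI",
]

# Rows marked "Yes" in the "New to BioData Catalyst" column.
NEW_TO_BDC = {
    "BioMe", "CAMP", "CARDIA", "HCHS/SOL", "JHU_AF", "miRhythm",
    "PUSH_SCD", "REDS-III_Brazil_SCD", "SARP", "Walk_PHaSST_SCD",
}

# Rows marked "Yes" in the "New study version" column.
NEW_VERSION = {
    "Amish", "CCAF", "CFS", "GeneSTAR", "GOLDN", "HyperGEN",
    "PARTNERS", "SAFHS", "VAFAR", "VU_AF",
}

# Studies that were already hosted before this release.
previously_hosted = [s for s in ALL_STUDIES if s not in NEW_TO_BDC]

print(len(ALL_STUDIES), len(NEW_TO_BDC), len(previously_hosted), len(NEW_VERSION))
# prints: 29 10 19 10
```

In this table, no study is flagged in both columns: every version update applies to a study that was already hosted.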