
2021-10-04 BioData Catalyst Ecosystem Release Notes


Last updated 3 years ago


Introduction

The 2021-10-04 release marks the seventh release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., project cost reporting on Terra and archiving files on AWS) along with documentation and tutorials (e.g., estimating and managing cloud costs) to help new users get started on the system. This release also includes enhanced support for semantic search and R Shiny apps. Please find more detail on the new features and user support materials in the sections below.

The 2021-10-04 data release adds the final BioLINCC training dataset plus another BioLINCC study, BabyHug. The TOPMed Combined Exchange Area buckets were updated with more datasets from multiple new freezes. The last dataset ingested was PCGC’s CMG. For more information, please refer to the Data Releases section below and to the Data page on the BioData Catalyst website.

Significant new features

Updated Semantic Search UI: Dug, BioData Catalyst's semantic search tool, has an updated user interface. The new interface shows more results on one page, and a zoom feature lets users expand individual results to explore them in greater detail. Provenance in knowledge graphs and links to published literature are presented where available.

Archive files on AWS: Users on BioData Catalyst Powered by Seven Bridges can now select files to move from AWS S3 storage to AWS Glacier (archival storage). Moving files to archival storage can result in an ~80% cost reduction. It’s recommended that users move files to archival storage if the files will not be used for three or more months.
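As a rough illustration of the estimate above, the following Python sketch applies the approximately 80% reduction to a storage bill. The function name, the per-GB price, and the example volumes are hypothetical placeholders rather than actual AWS or Seven Bridges pricing, and the sketch ignores retrieval fees and any minimum storage duration.

```python
def archive_savings(size_gb: float, months: float,
                    standard_price_per_gb_month: float,
                    reduction: float = 0.80) -> float:
    """Estimate storage savings from archiving files.

    `reduction` defaults to the ~80% figure quoted in the release notes.
    Retrieval and transfer fees are NOT included (simplifying assumption).
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    # Cost of leaving the files in standard storage for the whole period.
    standard_cost = size_gb * months * standard_price_per_gb_month
    # Portion of that cost avoided by moving the files to archival storage.
    return standard_cost * reduction


# Hypothetical example: 1,000 GB untouched for 6 months,
# at a placeholder price of 0.02 (currency units) per GB-month.
saving = archive_savings(1000, 6, 0.02)
print(f"Estimated saving: {saving:.2f}")
```

Actual savings depend on current storage pricing and on how often archived files need to be restored.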

Project Per Workspace Cost Reporting on Terra: Users on BioData Catalyst Powered by Terra now have more transparency and access to cost information with the rollout of PPWS. This update associates each Terra workspace with its own Google Project, which Terra creates on behalf of users when a workspace is created. Switching to this “project-per-workspace” (PPWS) model makes it possible to display a breakdown of costs per workspace in the Terra user interface, and allows Terra users to set up and use GCP budget alerts to be notified of cloud spending. This change applies only to newly created workspaces, with plans to migrate existing workspaces to this model in the future.

Try out R Shiny apps in Terra: Since the rollout of RStudio and Bioconductor last quarter, Terra’s Interactive Analysis team has expanded the capabilities of the cloud environments framework that supports running RStudio, Jupyter Notebook, and Galaxy in Terra. Most recently, Terra users gained the ability to launch R Shiny apps from Terra’s built-in RStudio environment. Check out an example of an open-source R Shiny app developed by the Manning Lab to visualize whole-genome association data.

Save data from an IA environment: With the new R Shiny apps in Terra, users can save data from an interactive analysis (IA) environment. Saving data from an interactive cloud environment (such as an instance of RStudio or a Jupyter notebook) is useful in some situations. Users who are worried about losing work done in an interactive environment because they need to delete or modify the persistent disk can use "gsutil" to copy that work to the workspace bucket.
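The copy step described above could be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names and the "backup" destination prefix are hypothetical, and it assumes the environment exposes the workspace bucket URL in a `WORKSPACE_BUCKET` environment variable and has `gsutil` on the PATH.

```python
import os
import subprocess
from typing import List


def build_copy_command(local_path: str, bucket: str,
                       dest_prefix: str = "backup") -> List[str]:
    """Build a recursive, parallel gsutil copy command (hypothetical helper)."""
    destination = f"{bucket.rstrip('/')}/{dest_prefix.strip('/')}/"
    # -m runs transfers in parallel; cp -r copies a directory tree recursively.
    return ["gsutil", "-m", "cp", "-r", local_path, destination]


def copy_to_workspace_bucket(local_path: str, dest_prefix: str = "backup") -> None:
    """Copy local_path to the workspace bucket before changing the persistent disk."""
    # Assumption: the bucket URL (gs://...) is available in WORKSPACE_BUCKET.
    bucket = os.environ["WORKSPACE_BUCKET"]
    subprocess.run(build_copy_command(local_path, bucket, dest_prefix), check=True)


# Show the command that would run for a hypothetical directory and bucket.
print(" ".join(build_copy_command("notebooks", "gs://example-bucket")))
```

Calling `copy_to_workspace_bucket("notebooks")` from inside the cloud environment would then run the copy; it raises an error if the transfer fails, so a failed backup is not silently ignored.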

Speed up machine learning work with GPUs on Terra: Terra’s Interactive Analysis team has released an upgrade that enables adding Graphical Processor Units (GPUs) to Notebook cloud environments in Terra. Terra already offered the ability to use GPUs in workflows, and the team is now responding to user requests to run GPU-enabled computations interactively by adding GPU support for Jupyter Notebooks.

Speed up workflows and save costs using N2 instances sporting Intel’s 2nd Generation Xeon CPUs on Terra: Terra users now have the option to use new-generation N2 instances, which have demonstrated faster performance and reduced cost. Read more about these updates and how to request N2 instances for workflows here.

New user support materials and documentation

Cross-study harmonization example notebook: This tutorial notebook demonstrates how to query and work with the BioData Catalyst studies, particularly cross-study harmonization, using the PIC-SURE API.

Estimate and Manage Cloud Costs on Seven Bridges: This tutorial describes how to estimate costs associated with using Seven Bridges. It includes an overview of cloud storage costs and cloud computation costs and the primary drivers of each. It also provides guidance on estimating these costs so that researchers can budget for cloud costs in their grants, request cloud credits, and plan their work on BioData Catalyst.

Public project for TOPMed Freeze8 variant calling pipelines: Users on Seven Bridges can now access a public project that walks through how to use the CWL tools and workflows that were used to perform variant calling of TOPMed Freeze8. The public project provides explanations of the purpose of all of the tools and workflows and how they are used together, along with examples of completed analyses. All of the CWL tools and workflows in the project are available in the Public Apps Gallery.

Data Releases

The table below highlights which studies were included in the 2021-10-04 data release. The final BioLINCC training dataset was uploaded, plus another BioLINCC study, BabyHug. The ORCHID dataset was re-ingested after the data owners found they had provided incorrect versions of the files at the time of initial ingestion. The TOPMed Combined Exchange Area buckets were updated with more datasets from multiple new freezes. The last dataset ingested was PCGC’s CMG. The data is now available for access across the entire ecosystem.

| Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
| --- | --- | --- | --- | --- |
| BioLINCC (Phase 1) - Training Data (Digitalis) | open | | true | NA |
| Additional TOPMed combined EA | c999 | Freeze1/Freeze9b/Freeze10a | true | NA |
| PETAL - ORCHID (data re-ingested since files initially provided by data submitters were not the final version) | phs002299 | ORCHID | false | 1 |
| PCGC (CMG/Wagner) | | CMG | true | 1 |
| CureSCi - BabyHug (via BioLINCC) | phs002415 | BabyHug | true | 1 |

Planned upcoming Data Releases

| Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
| --- | --- | --- | --- | --- |
| TOPMed Freeze 9 - Batch 1 (22 datasets included) | Various | Various | false | NA |
| PCGC SRA Data | | | | |
| Additional TOPMed Freeze 8 Studies (CATHGen) | phs000571 | | true | 6 |

For detailed platform release notes please consult the following resources:

  • Gen3 release notes

  • PIC-SURE release notes

  • Terra release notes

  • Seven Bridges release notes

  • Dockstore release notes

Need an easy way to explain Terra to your colleagues or collaborators? Try this quick (2-min.) overview of Terra.

Estimate Workflow Costs on Terra: Terra users can also follow this documentation to estimate costs of workflows. This is the original document describing the steps summarized in this blog post.

Understanding and controlling cloud costs on Terra: This article includes a detailed breakdown of the types of costs that you may incur when working on Google Cloud, plus some advice on how to reduce costs.

Understanding costs and billing on Terra: This article includes an overview of how billing works, including how billing accounts, projects, and workspaces relate to each other, and the difference between workspace permissions and billing permissions.

Controlling cloud costs on Terra – sample use cases: This article includes a selection of typical analysis use cases, with the costs broken down across several scenarios to illustrate the effect of cost control strategies.

New tools and workflows released to Dockstore’s NHLBI BioData Catalyst Organization:

  • Three additional WDL workflows have been released in the UWGAC Ancestry, Relatedness, and Association Testing Collection: KING, PC-Relate, and PC-AIR.

  • The xvcfView WDL was released to the Utilities collection. This workflow provides the full power of bcftools view to subset, subsample, and filter VCF files.

  • The new PrediXcan collection of CWL workflows can predict gene expression (or whatever biology the models predict) in a cohort with available genotypes and run associations to a trait measured in the cohort.

Launch Galaxy workflows from Dockstore into multiple Galaxy instances, including Terra:

  • New to Galaxy? The Galaxy Training Network is continuing to add training material in their Organization on Dockstore.

  • Additionally, users can explore some of the Galaxy community’s best practices workflows in the IWC Organization on Dockstore.

Ready to publish and share the tool or workflow you developed with the research community? Dockstore users can link their accounts to their ORCID and Zenodo accounts, mint DOIs for their workflows hosted on Dockstore, and now can export their workflows directly to their ORCID profile.

New video tutorials demonstrate exporting data from PIC-SURE to Terra and Seven Bridges using BioLINCC/Sickle Cell related data.

phs001194