2021-07-09 BioData Catalyst Ecosystem Release Notes


Last updated 3 years ago


Introduction

The 2021-07-09 release marks the sixth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., SAS on Seven Bridges and Galaxy’s integration with Terra) along with documentation and tutorials to help new users get started on the system (e.g., PIC-SURE Open Access). This release also includes enhanced support for maintaining and versioning CWL on external tool repositories. Please find more details on the new features and user support materials in the sections below.

The 2021-07-09 data release adds CRAMs and unharmonized clinical files for the parent project CARDIA and 3 other TOPMed programs. The TOPMed program REDS-III received a version update. Unharmonized clinical files were uploaded for the 5 BioLINCC projects and the 2 open tutorial projects. Please refer to the Data Releases section below for more information, as well as the Data page on the BioData Catalyst website.

Significant new features

Authentication through the NIH Researcher Authentication Service: The BioData Catalyst ecosystem updated the authentication mechanism to use the NIH Researcher Authentication Service. Researchers will now be redirected to the NIH RAS page to enter their eRA Commons credentials when logging into one of the platforms within the ecosystem.

Sample and Variant quality control methods with Hail: New Jupyter notebook published in the BioData Catalyst Collection Featured Workspace on Terra.

Galaxy has integrated with Terra: Galaxy is now available through the other “faces” of Terra, including NHLBI BioData Catalyst. You can launch your own Galaxy server right from the Terra web interface, without doing any configuration yourself. This marks the transition of Galaxy on Terra from alpha to beta development status: the software is more mature and considered reliable enough for regular work, with the caveat that minor changes may occur over time as remaining rough edges are smoothed out and the user experience is improved. Learn more about how to use Galaxy in Terra here. Galaxy's features and its use within Terra are also covered in our blog post here. You can also import Dockstore workflows into Galaxy when it is launched in Terra. Speaking of workflows, Cromwell 64 is now live on Terra.

Seurat package now included by default in R-based cloud environments: The RStudio image now has Seurat, a tool for single-cell transcriptomics, as well as crcmod, a package for verifying the integrity of an object in Google Cloud Storage.

Interactive Analysis: Jupyter Notebook images have been updated with Bioconductor 3.13.0. See the Bioconductor release notes here.

SAS: Users on Seven Bridges can now launch SAS for interactive analysis from the Data Cruncher feature. All project files are available within SAS. Users can select from three SAS offerings built on top of SAS Studio: 1) SAS Business Intelligence enables users to utilize SAS code to manage data, create, modify and compare descriptive and predictive models. Capabilities include clustering, decision trees, linear and logistic regression. 2) SAS Analytics adds the power of SAS Viya’s Data Mining and Machine Learning algorithms such as neural networks, gradient boosting, and random forest. 3) SAS Data Science provides access to text analysis, time series models, advanced forecasting and model governance.

LocusZoom Interactive Application: Users on Seven Bridges can now launch an R Shiny application to select, visualize, and interactively explore single variant association test results, with no prior R programming knowledge. Researchers can explore existing analyses available in the University of Michigan database, generate LocusZoom plots for example data, or provide their own association .RData files. The app also provides the JSONizer tool, which lets researchers subset their association test results (.RData) files and convert them into the JSON files required by LocusZoom. Users can launch the application from the Public project “LocusZoom Shiny App” in the top navigation bar.

Example notebook for data import with DRS: These example notebooks on Seven Bridges provide the code and steps for importing data from CAVATICA (Kids First data) and for importing GTEx data from the NHGRI AnVIL system. The import uses DRS functionality to access files stored on other NIH cloud systems. Users can find the notebooks in the Public Project “Data Interoperability” in the top navigation bar.

CWL v1.2 available: BioData Catalyst Powered by Seven Bridges now supports Common Workflow Language (CWL) version 1.2. The new version of CWL brings a major new feature, conditional execution of workflow steps, as well as several minor features and improvements. For the detailed change log, please see the CWL CommandLineTool specification and the CWL Workflow specification.

New CWL tools and workflows on BioData Catalyst Powered by Seven Bridges: Users can find all these tools and more in the Public Apps Gallery:

  • Regenie 2.0.1 - This is a tool for whole genome regression analysis.

  • GENESIS Association results plotting - This UW-GAC tool is a standalone app for creating Manhattan and QQ plots from the GENESIS association test results, with additional filtering and stratification options available.

  • WGSA 0.9 - This is a scalable SNV and INDEL annotation pipeline that performs a spectrum of annotations in a single tool. It integrates annotations from dozens of databases and annotation tools.

  • GENESIS Update Null Model for Fast Score Test - This updates the null model file obtained with the GENESIS Null model workflow so that it can be used in the GENESIS Single Variant Association Testing workflow in fast score mode.

  • Missing rate by sample - This UW-GAC tool was created for QC in GWAS. The tool calculates missing rate by sample. A subset of variants may be specified.

  • Missing rate by variant - This UW-GAC tool was created for QC in GWAS. The tool calculates missing rate by variant. A subset of samples and/or variants may be specified.

  • Allele frequency - This UW-GAC tool was created for QC in GWAS. The tool calculates allele frequency and counts. Values for both the alternate allele (count, frequency) and the minor allele (MAC, MAF) are returned. A subset of samples and/or variants may be specified.

  • Id-index - This UW-GAC tool calculates the LD between an index variant and each variant in a set of other variants stored in a GDS file, using the snpgdsLDMat function in the SNPRelate R package and a wrapper, the LDcompute R package.

  • Id-pair - This UW-GAC tool calculates the LD between a pair of variants stored in a GDS file, using the snpgdsLDMat function in the SNPRelate R package and a wrapper, the LDcompute R package.

  • Id-set - This UW-GAC tool calculates the LD between all pairs in a user-specified set of variants stored in a GDS file, using the snpgdsLDMat function in the SNPRelate R package and a wrapper, the LDcompute R package.
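
To make the QC quantities named in the list above concrete (missing rate by sample, missing rate by variant, alternate allele count and frequency, MAC, and MAF), here is a minimal Python sketch. It is not the UW-GAC implementation, which runs in R on GDS files. The function names, the toy genotype matrix, and the encoding (alternate-allele dosage 0, 1, or 2 for diploid calls, `None` for a missing call) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Assumed encoding: alternate-allele dosage (0, 1, 2) for a diploid call,
# or None when the genotype call is missing.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of calls in the list that are missing."""
    return sum(g is None for g in calls) / len(calls)


def missing_rate_by_variant(matrix: List[List[Genotype]]) -> List[float]:
    """Rows of the matrix are variants, so compute one rate per row."""
    return [missing_rate(row) for row in matrix]


def missing_rate_by_sample(matrix: List[List[Genotype]]) -> List[float]:
    """Columns of the matrix are samples, so transpose and compute per column."""
    return [missing_rate(list(column)) for column in zip(*matrix)]


def allele_stats(calls: List[Genotype]) -> Dict[str, float]:
    """Alternate-allele and minor-allele counts and frequencies for one variant."""
    observed = [g for g in calls if g is not None]
    if not observed:
        raise ValueError("no non-missing genotype calls for this variant")
    n_alleles = 2 * len(observed)   # two alleles per diploid sample
    alt_count = sum(observed)       # dosage counts alternate alleles
    mac = min(alt_count, n_alleles - alt_count)
    return {
        "alt_count": alt_count,
        "alt_freq": alt_count / n_alleles,
        "mac": mac,
        "maf": mac / n_alleles,
    }


# Toy data: 3 variants (rows) x 4 samples (columns).
genotypes: List[List[Genotype]] = [
    [0, 1, 2, None],
    [2, 2, 1, 2],
    [0, 0, None, None],
]

print("missing rate by variant:", missing_rate_by_variant(genotypes))
print("missing rate by sample: ", missing_rate_by_sample(genotypes))
for index, row in enumerate(genotypes):
    print("variant", index, allele_stats(row))
```

In this sketch, missing calls are excluded from the allele-count denominator, so frequencies are computed over observed alleles only. Whether the platform tools use the same convention is not stated in these release notes.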

PIC-SURE Data Access Dashboard Updates: PIC-SURE’s Data Access Dashboard has been updated to include the number of studies and participants the user has access to based on their authorization.

New PIC-SURE Open Access: PIC-SURE Open Access is now available in BioData Catalyst! PIC-SURE Open Access is available to users who have an eRA Commons account, including those who are not authorized to access any studies. The Open Access feature allows users to explore de-stigmatized, phenotypic data available in PIC-SURE prior to requesting access to data. For more information, check out the user guide and tutorial.

New PIC-SURE Jupyter notebook examples are available as public projects in Seven Bridges and Terra as follows:

  • An example showing users how to access lipid measurements across harmonized variables and multiple visits using the PIC-SURE API in R, RStudio, and Python.

  • All previous notebook examples are now available in RStudio on Seven Bridges.

New UWGAC Ancestry and Relatedness analysis collection on Dockstore under the BioData Catalyst organization: This collection includes two WDL workflows to help users prepare their data for association testing: one for converting VCF files to GDS and one for linkage disequilibrium pruning. Stay tuned as more workflows are released.

New Large-scale Gene by Environment collection on Dockstore under the BioData Catalyst organization: The WDL workflows in this collection enable scalable, efficient, and flexible genome-wide gene-environment interaction analysis. GEM conducts single-variant analysis for common variants (currently in unrelated individuals only), and MAGEE conducts single-variant and variant set-based analysis for common or rare variants while allowing for relatedness. The collection also includes examples of cloud costs in the README.

Known issues and workarounds

BioLINCC Phase 2 data dictionaries: These data dictionaries were submitted in PDF format which required additional intervention and delayed general release to the platform. These data dictionaries will be released as soon as is feasible for use across the platform.

New user support materials and documentation

Maintaining and Versioning CWL on External Tool Repositories: This tutorial presents best practices for writing and maintaining CWL tools and workflows in an external tool repository, such as GitHub, so that users can better manage versions of their tools. Users should follow these best practices if they would like to publish and share their CWL tools and workflows in the Dockstore repository, since Dockstore can automatically pull changes from GitHub. These best practices help ensure that the CWL is fully portable and can run successfully not only on Seven Bridges Platforms, but also on other CWL executors such as cwltool and Toil.

Transferring Files Between Seven Bridges and Terra: This tutorial guides users through the process of transferring files between the two workspace environments, Seven Bridges and Terra.

Accessing Egress-Free GTEx Data From AnVIL: A new data interoperability page that includes linked instructions for how to access egress-free GTEx data from NHGRI’s AnVIL cloud ecosystem is available here.

PIC-SURE Documentation Updates: New PIC-SURE documentation provides new information on the Data Access Dashboard, PIC-SURE Open Access, and a new table for understanding study-specific subject identifiers.

PIC-SURE Video Tutorials: PIC-SURE Video tutorials are now available for the following topics:

  • Introduction to PIC-SURE

  • Introduction to PIC-SURE Open Access: Harmonized

  • Introduction to PIC-SURE Open Access: One Criterion Search

  • Introduction to PIC-SURE Open Access

  • Introduction to PIC-SURE Open Access: Multiple search criteria

  • Introduction to PIC-SURE Authorized Access

  • Introduction to PIC-SURE Authorized Access: Data Export

Published a blog post on the role of a secure cloud ecosystem in supporting infrastructure projects and creating connected communities. The post highlights BioData Catalyst as one of several NIH-commissioned infrastructure development projects that involve not just putting data on the cloud but also building the additional layers of services necessary to deliver on the extraordinary promise of this new model for data sharing and analysis.

Data Releases

The table below highlights which studies were included in the 2021-07-09 data release. CRAMs and unharmonized clinical files were uploaded for the parent project CARDIA and 3 other TOPMed programs. The TOPMed program REDS-III received a version update. The unharmonized clinical files were uploaded for the 5 BioLINCC projects and the 2 open tutorial projects. The data is now available for access across the entire ecosystem.

| Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
|---|---|---|---|---|
| Treatment of Pulmonary Hypertension and Sickle Cell Disease with Sildenafil Therapy | phs002383 | WalkPHaSST | Yes | 1 |
| CARDIA Cohort | phs000285 | CARDIA | No | 3 |
| NHLBI TOPMed - NHGRI CCDG: Penn Medicine BioBank Early Onset Atrial Fibrillation Study | phs001601 | CCDG-PMBB | Yes | 1 |
| Hematopoietic Cell Transplant for Sickle Cell Disease (HCT for SCD) | phs002385 | CIBMTR | Yes | 1 |
| Cooperative Study of Sickle Cell Disease (CSSCD) | phs002362 | CSSCD | Yes | 1 |
| Multicenter Study of Hydroxyurea (MSH) | phs002348 | MSH | Yes | 1 |
| Optimizing Primary Stroke Prevention in Children with Sickle Cell Anemia (STOP II) | phs002386 | STOPII | Yes | 1 |
| NHLBI TOPMed: Genetics of Asthma in Latino Americans (GALA) | phs001542 | GALA | Yes | 1 |
| NHLBI TOPMed: Genetic Causes of Complex Pediatric Disorders - Asthma (GCPD-A) | phs001661 | GCPD-A | Yes | 2 |
| NHLBI TOPMed: REDS-III Brazil Sickle Cell Disease Cohort (REDS-BSCDC) | phs001468 | REDS-III | No | 2 |
| Tutorial-biolincc_camp | open | | Yes | |
| tutorial-biolincc_framingham | open | | Yes | |

Planned upcoming Data Releases

| Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
|---|---|---|---|---|
| Combined Exchange Area new data | | | No | |
| BioLINCC – Training Dataset – Digitalis | | | | |
| BioLINCC – BabyHug | phs002415 | | Yes | |

For detailed platform release notes please consult the following resources:

  • Gen3 release notes

  • PIC-SURE release notes

  • Terra release notes

  • Seven Bridges release notes

  • Dockstore release notes