2023-01-09 BioData Catalyst Ecosystem Release Notes


Last updated 2 years ago


Introduction

The 2023-01-09 release marks the twelfth release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., Azure volumes now available on both main analysis platforms) along with documentation and tutorials (e.g., information on how variable tags are generated) to help new users get started on the system. This release also includes enhanced support for moving data seamlessly across platforms. Please find more detail on the new features and user support materials in the sections below.

The 2023-01-09 data releases include the addition of the Pediatric Cardiac Genomics Consortium (PCGC). Please refer to the Data Releases section below for more information, as well as the Data page on the BDC website.

Significant new features

Azure volumes are now available on BDC Powered by Seven Bridges: Users can now link a Microsoft Azure bucket to their Seven Bridges workspaces. After logging in, go to Data > Volumes and select “Microsoft Azure” to be led through a bucket-linking wizard.

DRS Manifest Export: To further improve interoperability and let users move their data seamlessly across platforms, a DRS export option is now available on the Seven Bridges platforms. With this functionality, users can export links to platform files (DRS URIs) together with their metadata into a manifest file, which can then be used to import the files and metadata on other platforms.
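As a rough illustration of how a downstream script might consume such a manifest, the sketch below converts hostname-based DRS URIs into object endpoints, following the GA4GH DRS convention that `drs://<hostname>/<id>` resolves to `https://<hostname>/ga4gh/drs/v1/objects/<id>`. The manifest layout (a CSV with a `drs_uri` column), the helper names, and the example host are assumptions made for this example, not the format Seven Bridges actually exports. Compact-identifier DRS URIs are not handled.

```python
import csv
import io
from urllib.parse import quote, urlsplit


def drs_uri_to_url(uri: str) -> str:
    """Map a hostname-based DRS URI to its HTTPS object endpoint."""
    parts = urlsplit(uri.strip())
    object_id = parts.path.lstrip("/")
    if parts.scheme != "drs" or not parts.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {uri!r}")
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


def read_manifest(text: str, uri_column: str = "drs_uri") -> list[str]:
    """Return the object endpoint for every row of a CSV manifest.

    The column name is hypothetical; adjust it to the real export.
    """
    rows = csv.DictReader(io.StringIO(text, newline=""))
    return [drs_uri_to_url(row[uri_column]) for row in rows]


# Hypothetical two-column manifest; the real export layout may differ.
EXAMPLE_MANIFEST = "name,drs_uri\nsample1.cram,drs://drs.example.org/abc-123\n"
print(read_manifest(EXAMPLE_MANIFEST))
```

Fetching the returned endpoint would still require whatever authorization the hosting platform enforces; that step is outside the scope of this sketch.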

OmicCircos R Shiny app now available on BDC-Seven Bridges: The OmicCircos app is an R Shiny application built around the OmicCircos R package for generating high-quality circular plots that visualize genomic data. Common use cases include mutation patterns, copy number variations (CNVs), expression patterns, and methylation patterns. Such variations can be displayed as scatterplot, line, or text-label figures.

Introduction to SAS Public Project on BDC-Seven Bridges: Seven Bridges released a Public Project to train users on how to use SAS. The public project contains three notebooks that walk a user through: 1) loading and cleaning data in SAS using ICD-9 codes, 2) pulling the CDC’s Social Vulnerability Index data via API and running a regression, and 3) loading hosted 1000 Genomes data into SAS and visualizing mutation information. Users can copy the public project to their own workspace and modify the tutorial notebooks to suit their needs.

New CWL Tools/Workflows on BDC-Seven Bridges:

  • BEDTools 2.30.0 toolkit:

    • BEDTools Coverage - returns the depth and breadth of coverage of features from B on the intervals in A

    • BEDTools Genomecov - computes histograms of feature coverage for a given genome

    • BEDTools GetFasta - extracts sequences from a FASTA file for each of the intervals defined in a BED/GFF/VCF file

    • BEDTools Intersect - screens for overlaps between two sets of genomic features

    • BEDTools Merge - combines overlapping or “book-ended” features in an interval file into a single feature

    • BEDTools Sort - sorts a feature file by chromosome and other criteria

  • FlowSOM 2.4.0, an algorithm for distinguishing cell populations in both flow and mass cytometry data in an unsupervised way.

  • cytofkit2 0.99.80, which is designed to analyze mass cytometry data from FCS files. It includes preprocessing, cell subset detection, cell subset visualization and interpretation, and inference of subset progression.

  • flowAI 1.24.0, which performs quality control on FCS data acquired using flow cytometry instruments. By evaluating three properties (flow rate, signal acquisition, and dynamic range), it enables the detection and removal of anomalies.

  • CNVkit 0.9.9 toolkit for inferring and visualizing copy number from high-throughput DNA sequencing data.

  • SBG Single-Cell RNA Deep Learning - Training is a single-cell classifier pipeline for human data. It relies on a transfer learning approach, which uses pre-trained gene embeddings as the starting point for building a model adjusted to the given single-cell datasets.

  • SBG Single-Cell RNA Deep Learning - Predict is a single-cell classifier pipeline for human data. This tool uses the deep learning model generated by the SBG Single-Cell RNA Deep Learning - Training workflow to classify the input dataset.
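To make the "overlapping or book-ended" behavior described for BEDTools Merge above more concrete, here is a minimal pure-Python sketch of the idea. It is not the BEDTools implementation; it assumes BED-style half-open intervals given as `(chrom, start, end)` tuples, and the function name is made up for this example.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended intervals on the same chromosome.

    Intervals are (chrom, start, end) tuples with half-open coordinates,
    so two intervals are book-ended when one ends where the next starts.
    """
    merged = []
    # Sorting by chromosome, then start, lets one pass find every overlap.
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlapping or touching the previous feature: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(feature) for feature in merged]


features = [
    ("chr1", 100, 200),
    ("chr1", 200, 300),  # book-ended with the first feature
    ("chr1", 150, 180),  # fully contained in the first feature
    ("chr1", 400, 500),
    ("chr2", 100, 150),
]
print(merge_intervals(features))
```

The sort is what keeps this to a single pass: once features are ordered by start position, each one only needs to be compared with the most recently merged feature.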

Azure is now available on BDC Powered by Terra: Users can now log into Terra with a Microsoft Azure Cloud account. This is an invite-only version of Terra on the Azure platform. The public offering of Terra on Azure is expected in early 2023.

A new spend report is now available for BDC-Terra billing projects: The report identifies which workspaces are costing the most, to provide more transparency around cloud costs incurred in Terra. To access the spend report, go to your billing project (main menu > billing > billing project) and click on the "Spend report" tab.

New streamlined user journey from BDC Powered by PIC-SURE to analysis platforms: PIC-SURE has added “Export to Seven Bridges” and “Export to Terra” buttons to streamline data export into a BioData Catalyst analysis workspace. After exploring and filtering variables in PIC-SURE Authorized Access, users can package their data with the Select and Package Data Tool. Once the data is packaged, users can select their preferred BDC analysis platform with the new Export buttons. This provides all information needed and points the user directly to the public PIC-SURE project on either Seven Bridges or Terra.

Take a Tour of BDC-PIC-SURE: PIC-SURE has updated the guided tour of the interface to interactively display search results based on the user’s authorization. This guided tour walks through the different parts of the platform, including how to use tags, where search results are displayed, and how to interpret the Results Panel.

Known issues and workarounds

BABYHUG Data Field Issue: The BABYHUG study (phs002415) contained a data file, as provided by the data submitter, in which some data fields included SAS-derived newline characters. These characters shifted the data rows, so that fields were mapped to the wrong variables. A corrected version of the file has been requested from the data submitter.
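For users who want to screen their own delimited files for this kind of row shift, the sketch below flags rows whose field count does not match the header. It assumes a comma-delimited text file and is only an illustration; it is not the check used by BDC or the data submitter, and the function name and sample data are invented for this example.

```python
import csv
import io


def find_misaligned_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows that do not match the header.

    A stray newline inside an unquoted field splits one record into two
    short rows, which shows up here as unexpected field counts.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader)
    problems = []
    for row in reader:
        if len(row) != len(header):
            problems.append((reader.line_num, len(row)))
    return problems


# Invented sample: the second record has a newline inside its "note" field.
SAMPLE = "id,note,age\n1,ok,30\n2,line one\nline two,41\n3,fine,25\n"
print(find_misaligned_rows(SAMPLE))
```

A matching field count does not prove a file is clean, since a shift can still produce rows of the expected width, so this is a first-pass screen rather than a validation.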

New user support materials and documentation

BDC GitBook on BDC-PIC-SURE: Users can now access the BDC GitBook documentation directly from the PIC-SURE platform under the “Help” tab.

Data Releases

The table below highlights which studies were included in the 2023-01-09 data release.

The PCGC substudy contains whole exome sequences, targeted sequences, and SNP array data. It is a multi-center, observational cohort study of individuals with congenital heart defects (CHD). The study aims to investigate the relationship between genetic factors and phenotypic and clinical outcomes in patients with CHD. Summary-level phenotypes for the study participants can be viewed on the top-level study page. Individual-level data and molecular data for the study are available by requesting Authorized Access. The study has collected phenotypic data and source DNA from 10,000 probands, parents, and families of interest. The data is now available for access across the entire ecosystem.

| Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
|---|---|---|---|---|
| The Pediatric Cardiac Genomics Consortium (PCGC) | phs000571.v6.p2.c1 | PCGC-CHD-GENES_HMB | No | Yes |
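The phs identifiers in these tables follow the dbGaP accession pattern of study number, version (`v`), participant set (`p`), and consent group (`c`). A minimal sketch for splitting an accession into those parts is shown below; the class and function names are hypothetical and not part of any BDC or dbGaP tool.

```python
import re
from typing import NamedTuple


class Accession(NamedTuple):
    study: str            # e.g. "phs000571"
    version: int          # the number after "v"
    participant_set: int  # the number after "p"
    consent_group: int    # the number after "c"


# Full form only: phs + six digits, then .v, .p and .c components.
_ACCESSION = re.compile(r"^(phs\d{6})\.v(\d+)\.p(\d+)\.c(\d+)$")


def parse_accession(text: str) -> Accession:
    """Split a full dbGaP-style accession into its four components."""
    match = _ACCESSION.match(text.strip())
    if match is None:
        raise ValueError(f"not a full phs accession: {text!r}")
    study, version, participant_set, consent_group = match.groups()
    return Accession(study, int(version), int(participant_set), int(consent_group))


print(parse_accession("phs000571.v6.p2.c1"))
```

Shorter forms, such as a bare `phs002415` without version or consent group, are rejected by this pattern on purpose so that incomplete identifiers are not silently accepted.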

Planned Upcoming Data Releases

| Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
|---|---|---|---|---|
| The Collaborative Cohort of Cohorts for COVID-19 Research (C4R) | phs002988.v1.p1.c1, phs002910.v1.p1.c1, phs002910.v1.p1.c2, phs002911.v1.p1.c1, phs002911.v1.p1.c2, phs003017.v1.p1.c1, phs002919.v1.p1.c1 | C4R_ARIC_phs002988, C4R_COPDGene_phs002910, C4R_FHS_phs002911, C4R_MESA_phs003017, C4R_REGARDS_phs002919 | No | Yes |
| Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b) | phs002339.v1.p1.c1 | topmed-NuMom2B_GRU-IRB | Yes | Yes |

BDC-PIC-SURE Tag Generation: PIC-SURE has updated the help text in the user interface and documentation to address the frequently asked question, “How are variable tags generated?” Users can find this help text in the “Filter by Variable Tags” box on the PIC-SURE platform and in the PIC-SURE User Guide.

Updated BDC-PIC-SURE documentation on the Export buttons: The PIC-SURE User Guide and the Authorized Access: Select and Package Data Tool YouTube video were updated to include information about the new Export buttons. These updates were also released in the BDC GitBook documentation.

For detailed platform release notes please consult the following resources:

  • Gen3 release notes
  • PIC-SURE release notes
  • Terra release notes
  • Seven Bridges release notes
  • Dockstore release notes