2022-10-03 BioData Catalyst Ecosystem Release Notes

Last updated 2 years ago

Introduction

The 2022-10-03 release marks the eleventh release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., PIC-SURE's new search interface) along with updated documentation. This release also includes updated versions of the Study Variable Explorer and the Annotation Explorer. Please find more detail on the new features and user support materials in the sections below.

The 2022-10-03 data releases include the addition of TOPMed Boston-Brazil SCD and PCGC datasets. For more information, please refer to the Data Releases section below and to the Data page on the BioData Catalyst website.

Significant new features

Now export with Study Variable Explorer on BioData Catalyst Powered by Seven Bridges: The Study Variable Explorer on BioData Catalyst Powered by Seven Bridges allows researchers to explore phenotypic variables from the TOPMed data dictionaries in an open access manner. Seven Bridges released Study Variable Explorer version 2, which expands on version 1 by adding tag search, notes, and data export. The latest update enables researchers to track their variable selection process through notes tied to study and variable information, which can be shared with collaborators through .json export. This gives analysts a traceable record for reproducing the decisions made during the harmonization process.

New Interactive Web Apps Gallery: Under the “Public Gallery” dropdown on BioData Catalyst Powered by Seven Bridges, a new display for “Interactive Web Apps” provides access to the LocusZoom and Model Explorer R Shiny applications.

Annotation Explorer Version 2: The Annotation Explorer enables users to interactively explore, query, and study characteristics of an inventory of annotations for the variants across the genome. This application can be used before association testing to interactively explore variant aggregation and filtering strategies and to generate input files for multiple-variant association testing, or after association testing to explore annotations associated with a set of significant variants or variants of interest. Seven Bridges previously released the Annotation Explorer R Shiny application through a Public Project. Now, Annotation Explorer is integrated with BioData Catalyst Powered by Seven Bridges through the “Data” dropdown. The new integration enables querying genome-wide annotations and variants (including the TOPMed Freeze5 and Freeze8 datasets) in a more user-friendly interface without running an RStudio notebook. This release is integrated into the billing system, so a user can select their compute needs based on price and monitor Annotation Explorer-specific costs through their billing group.

New CWL Tools and Workflows on BioData Catalyst Powered by Seven Bridges:

  • GATK VariantEval BETA 4.2.5.0 tool which is used for evaluating variant calls.

  • GATK FilterMutectCalls 4.2.5.0 tool which is used to filter somatic SNVs and indels called by Mutect2.

  • Picard CreateSequenceDictionary 2.25.7 tool for creating a DICT index file for a sequence.

  • WARP ExomeGermlineSingleSample 2.4.4 pipeline for data pre-processing and variant calling in human WES data.

  • BCFtools 1.15.1 toolkit - CWL1.2

  • Kraken2 2.1.2 toolkit

  • SRA (v3.0.0, CWL1.2)

    • SRA sam-dump tool that converts SRA data into SAM format. With aligned data, NCBI uses Compression by Reference, which only stores the differences in base pairs between sequence data and the segment it aligns to. The process to restore original data, for example as FASTQ, requires fast access to the reference sequences that the original data was aligned to.

    • SRA fasterq-dump tool that converts SRA data into FASTQ format while using temporary files and multi-threading to speed up the extraction.

    • SRA fastq-dump tool that converts SRA data into FASTQ format.

  • Salmon (v1.5.2, CWL1.2)

    • Salmon Alevin tool that introduces a family of algorithms for quantification and analysis of 3’ tagged-end single-cell sequencing data.

    • Salmon Index tool that builds the index required by the Salmon Quant and Salmon Alevin tools. To create an index, it uses a transcriptome reference file in FASTA format. Additionally, one can provide a genome reference along with the transcriptome to create a hybrid index compatible with the improved mapping algorithm named Selective Alignment.
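
The Compression by Reference idea mentioned for SRA sam-dump above can be illustrated with a small toy sketch. This is not NCBI's actual storage format or any SRA Toolkit API. The function names `compress` and `restore` and the record layout are hypothetical, and the sketch assumes reads that align without insertions or deletions. It only shows why restoring the original bases requires access to the reference the data was aligned to.

```python
from typing import List, Tuple

# Hypothetical record layout: (start position on reference, read length, differences).
# Each difference is (offset within the read, base that differs from the reference).
Diff = Tuple[int, str]
Record = Tuple[int, int, List[Diff]]


def compress(read: str, reference: str, pos: int) -> Record:
    """Keep only the bases where the read differs from the segment it aligns to."""
    segment = reference[pos:pos + len(read)]
    if len(segment) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [
        (i, base)
        for i, (base, ref_base) in enumerate(zip(read, segment))
        if base != ref_base
    ]
    return pos, len(read), diffs


def restore(record: Record, reference: str) -> str:
    """Rebuild the read; this step cannot work without the reference sequence."""
    pos, length, diffs = record
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTACGT"
read = "TACGAACG"  # aligns at position 3 with a single mismatch

record = compress(read, reference, pos=3)
print(record)
print(restore(record, reference) == read)
```

In this toy case an eight-base read is stored as a position, a length, and one mismatch, which is why the reference must be quickly available when converting the data back to a format such as FASTQ.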

Updated Interactive Analysis interface on Terra: Under the new design, the “Notebooks” tab is transformed into the more general “Analyses” tab, from where you can access the multiple applications available for Interactive Analysis in Terra. Accordingly, the list of Notebook files (.ipynb) becomes the list of “Your Analyses”, which now supports including R Markdown files (.Rmd). Just like Notebook files, any R Markdown files created in or added to the Analyses tab will be automatically stored in the workspace bucket and synced between the bucket and your persistent disk.

PIC-SURE's new search interface: PIC-SURE has released an improved dynamic data exploration experience, allowing users to easily search and query at the variable value and genomic variant level. The streamlined search experience enables users to search variables and view associated information, such as decoded variable level information, details about the dataset, and study information - all without opening any data files. Updates to the interface include filtering search results by variable and study tags, a new genomic filtering model, adding variables to export without filtering, a simpler select and package data process, and visualizing single variable distributions.

Dedicated PIC-SURE images within Seven Bridges analysis workspaces: The Seven Bridges and PIC-SURE teams have collaborated to provide users with dedicated workspace images that contain all the pre-installed packages necessary to run the PIC-SURE example notebooks. PIC-SURE API users in Seven Bridges will not have to worry about changes to package dependencies and/or versions, and R users in particular will notice a significantly faster start-up time during environment set-up. The PIC-SURE images are available in both the JupyterLab and RStudio Seven Bridges environments. Users can find this feature by specifying the Environment setup of any Data Cruncher analysis.

New user support materials and documentation

Cure Sickle Cell Metadata Catalog integration: PIC-SURE has updated the Data Access Table to integrate information about sickle cell disease (SCD) studies from the Cure Sickle Cell Metadata Catalog (MDC). For each SCD study, the “Additional Information” column includes a link to that study’s page on the MDC. The Data Access Table also includes other new information, such as study design and study focus.

New BioData Catalyst Powered by PIC-SURE search interface: The documentation associated with PIC-SURE has been updated to reflect the recent release of the new search interface. This includes the BioData Catalyst Powered by PIC-SURE User Guide and the tutorial videos on the BioData Catalyst Powered by PIC-SURE YouTube playlist.

Updated documentation on new Terra Interface: The documentation associated with Terra has been updated to reflect the recent release of the new analysis interface. This includes the Terra Workspace Quickstart Guide and the tutorial videos on the Terra YouTube channel.

Data Releases

The table below highlights which studies were included in the Q3 2022 data releases. The data is now available for access across the entire ecosystem.

| Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
| --- | --- | --- | --- | --- |
| BostonBrazil_SCD | phs001599 | topmed-BostonBrazil_SCD_HMB-IRB-COL | Yes | |
| PCGC | phs001735.c1 | topmed-PCGC_CHD_HMB | No | Yes |
| PCGC | phs001735.c2 | topmed-PCGC_CHD_DS-CHD | No | Yes |
| National Sleep Research Resource (NSRR) | phs002715-c1 | NSRR-CFS_DS-HLBS-IRB-NPU | Yes | |
| FHS_phs000974_TOPMed_WGS_freeze.9b | phs000974 | TOPMed_FHS | No | Yes |

Planned Upcoming Data Releases

| Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
| --- | --- | --- | --- | --- |
| PCGC SRA | phs000571.v6.p2 | PCGC-CHD-GENES_HMB | Yes | |
| National Sleep Research Resource (NSRR) | phs002715-c1 | NSRR-CFS_DS-HLBS-IRB-NPU | No | No |

  • The NSRR dataset had to be ingested again to accommodate additional data provided by data owners.

For detailed platform release notes please consult the following resources:

  • Gen3 release notes
  • PIC-SURE release notes
  • Terra release notes
  • Seven Bridges release notes
  • Dockstore release notes