2022-04-04 BioData Catalyst Ecosystem Release Notes
Introduction
The 2022-04-04 release marks the ninth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., machine learning tools for chest CT imaging) along with documentation and tutorials (e.g., a new guide to sharing content) to help new users get started on the system. This release also includes enhanced support for synchronizing tools and workflows between Dockstore and GitHub. Please find more detail on the new features and user support materials in the sections below.
The 2022-04-04 data release includes the addition of COVID-19 datasets ACTIV4a and ACTIV4b. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.
Significant new features
Machine learning tools for chest CT imaging: Seven Bridges and Harvard Medical School have collaborated to release a Public Project of machine learning tools titled Automated Chest Imaging Platform (CIP) CT Phenotyping and Machine Learning Discovery in COPD. The Public Project includes a detailed guide that helps other researchers use the tools and notebooks on COPD datasets or modify the tools for their own lung CT data.
Storage-optimized instances on Seven Bridges: Users can now access i3 and i3en AWS instances for Interactive Analysis (RStudio, JupyterLab, SAS Studio) on Seven Bridges. These storage-optimized instances provide between 5 TB and 60 TB of storage for interactive environments, which enables researchers to harmonize larger datasets.
New CWL tools and workflows on Seven Bridges:
GATK RNAseq short variant discovery 4.2.0.0
GRIDSS toolkit
scVelo 0.2.4
Smoove toolkit
Sambamba tools 0.8.1
New user support materials and documentation
Share content through Public Projects: Seven Bridges has published a new guide in the knowledge center offering an alternative way to share new workflows, notebooks, and open access data with the BDCatalyst community. Public Projects provide a space for researchers to publish their analyses with open access sample data, detailed walkthroughs, and contact information for feedback and improvements. Both researchers developing new tools and researchers using preconfigured pipelines benefit from published Public Projects.
Dockstore synchronization with GitHub: Dockstore has simplified its tool and workflow registration process so that registered entries synchronize automatically with GitHub. Dockstore has released several example templates showing how to set up a GitHub repository with the additional file (.dockstore.yml) needed to start this process. Check out this overview of the process for an introduction, and visit the updated Getting Started tutorials for registering tools and workflows on Dockstore to learn more.
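As a rough illustration of the kind of file involved, a minimal .dockstore.yml placed at the root of a GitHub repository might look like the sketch below. The workflow name and file paths are hypothetical placeholders, and the field names reflect the version 1.2 file format as commonly documented; the Dockstore templates remain the authoritative reference.

```yaml
# Minimal sketch of a .dockstore.yml file (assumed version 1.2 format).
# The name and all paths below are hypothetical examples.
version: 1.2
workflows:
  - name: example-workflow            # hypothetical workflow name
    subclass: WDL                     # descriptor language, e.g. WDL or CWL
    primaryDescriptorPath: /workflows/example.wdl   # hypothetical path
    testParameterFiles:
      - /workflows/example.test.json                # hypothetical path
```

With a file like this in place, changes pushed to GitHub are intended to be picked up by Dockstore without re-registering the workflow by hand.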
Data Releases
The table below highlights the studies included in the Q1 2022 data releases. COVID-19 datasets ACTIV4a and ACTIV4b were released to production. Most of the ingestion work for the COVID19-C3PO dataset is complete, and the dataset will be released in early April. TOPMed Freeze 9 datasets were ingested as the data became available. Twenty datasets have been ingested and will also be released in early April as part of the fourth batch. The data is now available for access across the entire ecosystem.
Planned upcoming Data Releases
For detailed platform release notes please consult the following resources:
PIC-SURE release notes
Gen3 release notes