Bring Your Own Tool(s) to BioData Catalyst

Authors: Beth Sheets (UC Santa Cruz, Genomics Institute), Dave Roberson (Seven Bridges)

Contributors: Dan Vicente (Seven Bridges), Alison Leaf (Seven Bridges), Stephanie Gogarten (Fellow), Sheila Gaynor (Fellow), Jean Monlong (Fellow), Kenny Westermann (Fellow)

Reproducibility is one of the biggest challenges facing science. Several issues associated with reproducibility have been well summarized in the FAIR (Findable, Accessible, Interoperable, and Reusable) Guiding Principles. The BioData Catalyst ecosystem promotes FAIR and reproducible analyses by supporting Docker-based tools described in two workflow languages. The Common Workflow Language (CWL) is currently supported in Seven Bridges workspaces, while the Workflow Description Language (WDL) is currently supported in Terra workspaces.

A combination of software containers (like Docker) and workflow languages wraps your bioinformatics pipeline, making your analysis portable across local and cloud execution environments. This allows other researchers to reproduce your method(s) with exactly the same software, dependencies, and configurations. For example, BioData Catalyst researchers have been able to reuse CWL and WDL versions of a Genome-Wide Association pipeline developed by the TOPMed Data Coordinating Center in multiple cloud workspaces.
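To make the wrapping idea concrete, the following is a minimal sketch of what a workflow-language description around a container can look like. It assumes CWL v1.2 and builds the tool description as a Python dictionary, then prints it as JSON (CWL documents may be written in JSON as well as YAML). The image `ubuntu:22.04`, the command `wc -l`, and the names `make_cwl_tool`, `input_file`, and `line_count` are illustrative placeholders, not part of any BioData Catalyst pipeline.

```python
import json


def make_cwl_tool(docker_image, base_command):
    """Return a minimal CWL CommandLineTool description as a plain dict.

    All field values passed in are placeholders chosen for illustration.
    """
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        # The command that runs inside the container.
        "baseCommand": base_command,
        # The container image pins the software and its dependencies.
        "requirements": {
            "DockerRequirement": {"dockerPull": docker_image},
        },
        # One input file, passed as the first positional argument.
        "inputs": {
            "input_file": {
                "type": "File",
                "inputBinding": {"position": 1},
            },
        },
        # Capture the command's standard output as the tool's output file.
        "outputs": {
            "line_count": {"type": "stdout"},
        },
        "stdout": "line_count.txt",
    }


# Hypothetical example: count the lines of a file inside an Ubuntu container.
tool = make_cwl_tool("ubuntu:22.04", ["wc", "-l"])
print(json.dumps(tool, indent=2))
```

The point of the sketch is the division of labor: the Docker image fixes the software environment, while the descriptor states the command, its inputs, and its outputs so that a workflow engine can run the same step locally or in the cloud.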

There are hundreds of CWL and WDL pipelines already available for researchers to run on BioData Catalyst. Both CWL and WDL pipelines can be discovered in Dockstore’s open-access catalog and then executed in the workspace environments. In addition, the Seven Bridges platform hosts CWL workflows in its Public Apps Gallery, and the Terra platform hosts WDL workflows in the Broad Methods Repository. However, many researchers will want to work with pipelines that do not yet have CWL or WDL versions, or will need to modify existing CWL and WDL pipelines. This guide describes how to “Bring Your Own Tool” to the BioData Catalyst ecosystem.

Whether you are working with WDL or CWL tools, you will begin by creating a containerized version of your pipeline. There are multiple ways to create these tools, but we simplify the process by walking through two example paths. For researchers using the Terra workspace environment, we describe how to write your WDL tool locally and then configure and test it in the cloud workspace. For researchers performing analyses in the Seven Bridges workspace environment, we describe how to use the Seven Bridges platform web composer and web editor features to add a CWL wrapper to the Docker image. You may find it easiest to start by learning one language (for example, the one that works in your chosen workspace environment) and then expand to the other language if needed.