Creating, testing & scaling WDL workflows

In this section, you will learn how to use the Terra and Dockstore platforms to create WDL workflows for analysis and share them with the scientific community. Below, we have compiled community and BDC resources to help you get started learning WDL and creating your own workflows.

Helpful definitions of terms when working on Terra

Workspace: A dedicated space where you and your collaborators can access and organize the same data and tools and run analyses together. A workspace can include data, notebooks, and workflows, and can be public or controlled access.

Workflow: A chain of connected tools that accomplishes a full analysis. The tools are connected in a specific way to maximize computational efficiency and to meet a specific analysis goal. A workflow typically describes a full analysis (e.g., variant discovery, differential expression, or multiple variant association tests).

Workflow Description Language (WDL): A community-driven standard for describing data analysis pipelines that is easily portable across different computing environments. It is the language currently used to run batch processes in Terra, which uses Cromwell as its execution engine. Like other descriptor languages, it is paired with Docker containers and can execute pipelines written in any language (bash, R, Python, etc.).
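To make these definitions concrete, here is a minimal sketch of a WDL 1.0 file with one task and one workflow. The names (say_hello, hello, name, greeting) and the ubuntu:20.04 image are illustrative assumptions, not part of any BDC resource.

```wdl
version 1.0

# A task wraps one tool invocation. All names here are hypothetical.
task say_hello {
  input {
    String name
  }

  # The command section is ordinary shell, run inside the container.
  command <<<
    echo "Hello, ~{name}!"
  >>>

  output {
    # Capture whatever the command printed to standard output.
    String greeting = read_string(stdout())
  }

  runtime {
    # Docker image the task runs in (assumed public image).
    docker: "ubuntu:20.04"
  }
}

# A workflow chains task calls together into a full analysis.
workflow hello {
  input {
    String name
  }

  call say_hello { input: name = name }

  output {
    String greeting = say_hello.greeting
  }
}
```

The task pairs a shell command with a Docker image, which is what lets the same file run unchanged on a laptop or in the cloud.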

WDL Toolkit: All the tools you need to write and run WDLs

Learning Resources for writing WDL

Below are a few learning resource tutorials we have compiled from various sources:

Writing your workflow locally using Dockstore’s Command Line Interface

You can start developing your WDL workflow locally with Dockstore’s CLI and a small test dataset. This route allows you to debug syntax errors while avoiding cloud costs. Once your workflow is debugged, you can launch it in a cloud environment to test for permissions errors and scaling issues. The Dockstore CLI automatically installs the Cromwell execution engine for running WDL workflows locally.

Instructions:

Building and releasing your workflow

To move your workflow from local development to Terra, a typical approach is to make the workflow available in a GitHub repository and then build its Docker image. Quay.io integrates with Dockstore and GitHub by automatically building an image on each GitHub push. The Quay.io build can then be registered on Dockstore. You can follow the steps for linking your Dockstore account to external services like Quay.io in this document.

You can find more information about this process in the section Version Control, Publishing, and Validation of Workflows below.

Testing and using your workflow in the cloud with Terra

Now that you have a workflow ready for running in a cloud environment, you can port your workflow into Terra in two ways. First, if you are already using Dockstore and GitHub for version control, you can navigate to your Dockstore WDL workflow and use the "Launch with NHLBI BioData Catalyst" button. The article Importing a Dockstore workflow into Terra provides instructions for selecting a workflow in Dockstore and then importing it into Terra.

Figure 1. Dockstore’s “Launch with BioData Catalyst” button.

If you haven’t published your workflow to Dockstore, you can also upload a workflow directly into Terra using the Broad Methods Repository. The Broad Methods Repository can be found in the “Add workflows” section of your Terra workspace. Similar to Dockstore, this repository hosts many WDL workflows that have been created by the Terra community. These workflows are visible only after a user has signed into Terra.

Figure 2. In Terra workspaces, when you are in the "Workflows" tab you can “Find Additional Workflows” from Dockstore and the Broad Methods Repository.

Once your workflow is in Terra, you may want to check out some of the learning resources below for configuring, troubleshooting, and optimizing your workflow. Additional configuration and troubleshooting steps are likely needed to get your workflow running on larger datasets hosted in the cloud.

Optimizing workflows on Terra

Terra also has several tips for reducing costs and improving the efficiency of a workflow. These approaches include deleting intermediate files and returning only final outputs to limit storage costs. Virtual machines can also be configured with lower-cost settings, such as preemptible machines, which trade a lower price for the risk of interruption. Cost optimizations are described at the following links:
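As a rough illustration of these ideas, the sketch below requests a preemptible machine through the task's runtime section and exposes only the final result in the workflow-level output block. It assumes Cromwell's Google Cloud runtime attributes (cpu, memory, disks, preemptible). The task and workflow names, the resource sizes, and the retry count of 3 are hypothetical values chosen for the example.

```wdl
version 1.0

# Hypothetical task: counts the lines in a file.
task count_lines {
  input {
    File infile
    # Number of attempts on a preemptible VM before falling back
    # to a standard VM (illustrative default).
    Int preemptible_tries = 3
  }

  command <<<
    # count.txt is an intermediate file; it is not declared as an output.
    wc -l < ~{infile} > count.txt
  >>>

  output {
    Int n = read_int("count.txt")
  }

  runtime {
    docker: "ubuntu:20.04"
    cpu: 1
    memory: "2 GB"
    disks: "local-disk 10 HDD"
    preemptible: preemptible_tries
  }
}

workflow count_and_report {
  input {
    File infile
  }

  call count_lines { input: infile = infile }

  # Only the final result is listed as a workflow output.
  output {
    Int line_count = count_lines.n
  }
}
```

Keeping the resource requests small and the output block short is a design choice: both are easy to raise later if a larger dataset needs more capacity.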

Once your workflow is working as expected, we ask that you publish your work to share with the research community. You can find resources for how to publish your work on GitHub and Dockstore in the section below titled Version Control, Publishing, and Validation of Workflows.
