Creating, testing & scaling WDL workflows
This section shows how to use the Terra and Dockstore platforms to create WDL workflows for analysis and to share them with the scientific community.
Workspace: A dedicated space where you and your collaborators can access and organize the same data and tools and run analyses together. A workspace can include data, notebooks, and workflows, and can be public or controlled-access.
Workflow: Chains of connected tools to accomplish a full analysis. Tools are often connected in a specific way to enable maximum computational efficiency and are also constructed to accomplish a specific analysis goal. A workflow typically describes a full analysis (e.g. variant discovery, differential expression, or multiple variant association tests).
Workflow Description Language (WDL): A community-driven standard for describing data analysis pipelines that is easily portable across computing environments. It is the language currently used to run batch processes in Terra, which uses Cromwell as its executor. Like other descriptor languages, it is paired with Docker containers and can execute pipelines written in any language (bash, R, Python, etc.). Below, we have compiled community and BioData Catalyst resources to help users get started learning WDL to create their own tools and workflows.
Syntax:
Authoring:
offers a nice balance between usability and editing features.
Syntax highlighters: Plugins that enable syntax highlighting (i.e. coloring code elements based on their function) for supported text editors. Syntax highlighting has been developed for , , and .
Visualization: is a web-based tool that creates an interactive graphical representation of any workflow written in WDL; also includes WDL code generation functionality.
Execution engine: Cromwell is an execution engine co-developed with WDL; it can be used on multiple platforms through pluggable backends and offers sophisticated pipeline execution features. See the doc entry on for quickstart instructions.
Validation & inputs: is a Java command-line tool co-developed with WDL that performs utility functions, including syntax validation and generation of input JSON templates. See the doc entries on and for quickstart instructions.
Running tools:
Terra is a cloud-based analysis platform for running workflows written in WDL via Cromwell on Google Cloud; it is open to the public and offers sophisticated data and workflow management features. In this BYOT document, we walk through all of the steps to run a workflow in Terra.
is a lightweight command-line workflow submission system that runs WDLs via Cromwell on Google Cloud.
is a Bioconductor package to manage WDL workflows from within R, developed by Sean Davis. See docs .
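To make the WDL definition above more concrete, here is a minimal sketch of what a workflow can look like, assuming WDL version 1.0. The task name, workflow name, input name, and Docker image are all hypothetical choices made for illustration; they do not come from any of the resources listed here.

```wdl
version 1.0

# Hypothetical example task: greets a name using a plain bash command.
task say_hello {
  input {
    String name
  }

  # The command section is ordinary bash; ~{name} is filled in by the engine.
  command <<<
    echo "Hello, ~{name}!"
  >>>

  output {
    # Capture what the command wrote to standard output.
    String greeting = read_string(stdout())
  }

  runtime {
    # Assumption: any image that provides bash would work here.
    docker: "ubuntu:20.04"
  }
}

# A workflow chains one or more task calls together.
workflow hello_workflow {
  input {
    String name
  }

  call say_hello { input: name = name }

  output {
    String greeting = say_hello.greeting
  }
}
```

With this sketch, the matching inputs JSON would contain a single key, `hello_workflow.name`, set to the string you want to greet.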
Below are a few learning resource tutorials we have compiled from various sources:
You can start developing your WDL workflow locally with Dockstore’s CLI and a small test dataset. This route lets you debug syntax errors while avoiding cloud costs. Once your workflow is debugged, you can launch it in a cloud environment to test for permissions errors and scaling issues. The Dockstore CLI automatically installs the Cromwell execution engine for running WDL workflows locally.
Instructions:
You can find more information about this process in the section Version Control, Publishing, and Validation of Workflows below.
Figure 1. Dockstore’s “Launch with BioData Catalyst” button.
Figure 2. In Terra workspaces, when you are in the "Workflows" tab you can “Find Additional Workflows” from Dockstore and the Broad Methods Repository.
Once your workflow is in Terra, you may want to check out some of the learning resources below for configuring, troubleshooting, and optimizing your workflow. There are likely additional configuring and troubleshooting steps needed for getting your workflow up and running on larger datasets hosted in the cloud.
Terra also has several tips for reducing costs and improving the efficiency of a workflow. These approaches include deleting intermediate files and returning only the final outputs to limit storage costs. Virtual machines can also be configured to cost less, for example by using preemptible machines, which trade a lower price for the possibility of interruption. Cost optimizations are described at the following links:
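As an illustration of the preemptible option mentioned above, the sketch below shows how a task might request preemptible machines through its `runtime` section. It assumes WDL version 1.0 running on Cromwell with a Google Cloud backend, where `preemptible`, `memory`, and `disks` are recognized runtime attributes. The task name, input name, Docker image, and resource values are hypothetical and should be tuned to your own workload.

```wdl
version 1.0

# Hypothetical example task: counts the lines in an input file.
task count_lines {
  input {
    File input_file
  }

  command <<<
    wc -l < ~{input_file}
  >>>

  output {
    Int line_count = read_int(stdout())
  }

  runtime {
    # Assumption: any image that provides wc would work here.
    docker: "ubuntu:20.04"
    # Try a preemptible VM up to 3 times before falling back to a regular VM.
    preemptible: 3
    # Request only the resources the task needs; these values are illustrative.
    memory: "2 GB"
    disks: "local-disk 10 HDD"
  }
}
```

Preemptible machines tend to suit short tasks best, because an interrupted task has to start over from the beginning.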
Once your workflow is working as expected, we ask that you publish your work to share with the research community. You can find resources for how to publish your work on GitHub and Dockstore in the section below titled Version Control, Publishing, and Validation of Workflows.
Open WDL’s offers a comprehensive set of exercises for users that are just learning WDL.
from Dockstore is an introductory guide.
These Dockstore along with this accompanying provide more complex examples using common bioinformatics tools.
Once you are more familiar with writing workflows, we suggest you continue with from Dockstore.
locally
locally
This steps through creating a basic WDL workflow locally and pushing the tool to GitHub, triggering an automated build on Quay.io.
To move your workflow from local development to Terra, a typical approach is to put the workflow in a GitHub repository and then build its Docker image on Quay.io. Quay.io integrates with GitHub and Dockstore: it builds automatically on each GitHub push, and the resulting build can then be registered on Dockstore. You can follow the steps for linking your Dockstore account to external services like Quay.io in this .
Once your workflow is ready to run in a cloud environment, you can port it into Terra in two ways. First, if you are already using Dockstore and GitHub for version control, you can navigate to your WDL workflow in Dockstore and use the "Launch with NHLBI BioData Catalyst" button. This article provides instructions for selecting a workflow in Dockstore and then importing it into Terra.
Second, if you haven’t published your workflow to Dockstore, you can also bring it into Terra using the Broad Methods Repository, which can be found in the “Add workflows” section of your Terra workspace. Similar to Dockstore, this repository hosts many WDL workflows created by the Terra community. These workflows are visible only to users who have signed in to Terra.