Guidance on writing BDC into a research proposal
NHLBI BioData Catalyst (BDC) is a cloud-based ecosystem that seeks to empower researchers analyzing phenotypic and genotypic heart, lung, blood, and sleep data. Researchers on BDC have access to a number of controlled and open datasets and can also bring their own data to the ecosystem for analysis.
This document is a resource for researchers writing NHLBI BioData Catalyst into grant proposals.
The BDC ecosystem leverages two well-known cloud computing services, Google Cloud Platform (GCP) and Amazon Web Services (AWS), to perform computational analysis and store data. Users can scale workloads vertically by changing the virtual machine (VM) instance size and attached storage, and horizontally by specifying the number of parallel instances. Increasing compute power, storage, or parallelization increases cost, and an estimate of that cost is provided to the researcher.
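To make the scaling trade-off concrete, here is a minimal sketch of how compute cost grows with instance size and parallelism. It assumes a simple billing model (every instance is billed for every hour it runs). The function name and all rates are hypothetical placeholders, not BDC, GCP, or AWS prices; use the estimates provided by your platform for a real budget.

```python
def compute_cost_usd(hourly_rate_usd: float, instances: int, hours: float) -> float:
    """Estimate compute cost, assuming each instance is billed for each hour it runs.

    hourly_rate_usd: assumed price of one VM instance per hour (placeholder value)
    instances:       number of parallel instances (horizontal scaling)
    hours:           hours each instance runs
    """
    if hourly_rate_usd < 0 or hours < 0:
        raise ValueError("rate and hours must be non-negative")
    if instances < 1:
        raise ValueError("at least one instance is required")
    return hourly_rate_usd * instances * hours


# All rates below are made-up placeholders for illustration only.
baseline = compute_cost_usd(hourly_rate_usd=0.50, instances=1, hours=40)
larger_vm = compute_cost_usd(hourly_rate_usd=1.00, instances=1, hours=40)  # bigger instance size
parallel = compute_cost_usd(hourly_rate_usd=0.50, instances=8, hours=40)   # more instances

for label, cost in [("baseline", baseline), ("larger VM", larger_vm), ("8 parallel", parallel)]:
    print(f"{label}: ${cost:,.2f}")
```

In practice a larger or more parallel configuration may finish sooner, so the hours per instance would usually change as well; the sketch holds them fixed only to isolate the effect of each setting.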
The platforms within the BDC ecosystem come equipped with cloud workspaces containing workflows and analysis tools. Depending on the platform, workflows may be available in WDL (Workflow Description Language) or CWL (Common Workflow Language) and are accessible from Dockstore.org, the Seven Bridges public gallery, or the Broad Methods Repository. In total, these sources contribute over 2,000 workflows. Additionally, researchers may access standard analytical tools such as RStudio, JupyterLab, Jupyter Notebooks, and SAS Studio.
For information on the different costs you should budget for and how to estimate costs, see Incurring Cloud Costs.
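As a rough illustration of turning a data volume into yearly budget figures for storage and egress (data transferred out of the cloud or across providers), the sketch below multiplies terabytes by a per-gigabyte rate. The function names, the decimal TB-to-GB conversion, and every rate and volume are assumptions for illustration only, not actual GCP or AWS pricing; substitute current provider rates and record the date you gathered them.

```python
GB_PER_TB = 1000  # assumption: decimal units; check how your provider bills


def annual_storage_cost_usd(stored_tb: float, usd_per_gb_month: float) -> float:
    """Cost of keeping `stored_tb` terabytes in cloud storage for 12 months."""
    return stored_tb * GB_PER_TB * usd_per_gb_month * 12


def annual_egress_cost_usd(egress_tb: float, usd_per_gb: float) -> float:
    """Cost of moving `egress_tb` terabytes out of the cloud in one year."""
    return egress_tb * GB_PER_TB * usd_per_gb


# Placeholder inputs: 5 TB of derived and uploaded data, 1 TB of egress per year.
storage = annual_storage_cost_usd(stored_tb=5, usd_per_gb_month=0.02)
egress = annual_egress_cost_usd(egress_tb=1, usd_per_gb=0.10)

print(f"Storage per year: ${storage:,.2f}")
print(f"Egress per year:  ${egress:,.2f}")
print(f"Total per year:   ${storage + egress:,.2f}")
```

Datasets already hosted on NHLBI BioData Catalyst would be left out of `stored_tb`, since only derived results and uploaded data count toward the storage budget.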
The sample language below can be used when preparing a budget justification that includes NHLBI BioData Catalyst cloud costs in a proposal.
Note: In the following sample language, replace the items in [brackets] with your own details.
All users with appropriate access credentials will have access to data hosted on BioData Catalyst. Controlled and open datasets already hosted on NHLBI BioData Catalyst will not incur storage costs. Our data storage budget will fund the storage of any derived results data (e.g., temporary and secondary files generated as a result of analyses on hosted data) and/or [XX TB] of data we plan to upload using the Bring Your Own Data tool. Data storage estimates are based on pricing information gathered on [MM/DD/YYYY estimate was generated].
The BioData Catalyst ecosystem features several platforms with secure workspaces where researchers can run workflow analyses of genomic and phenotypic data. Our estimated analysis costs include [insert time amount] of analyst time, as well as an overall compute estimate generated with the help of the BioData Catalyst documentation.
BioData Catalyst, and all files generated by it, are hosted on the Google Cloud Platform and Amazon Web Services. We anticipate that, during the course of this project, some data will be subject to egress charges as a result of transferring data across cloud providers or downloading data to local compute infrastructure. We currently anticipate that [insert data estimate] will be subject to egress charges each year. Our estimated egress costs are based on pricing information gathered on [MM/DD/YYYY estimate was generated].
After consulting with members of the BioData Catalyst support team, we anticipate needing to purchase support time for additional training not covered under the standard provisions. The support team estimated that we will require [insert estimated funding amount] to purchase this additional time.
To request a Letter of Support from the BioData Catalyst Coordinating Center, email bdc3@renci.org with the following information:
Researcher Name
Role in BioData Catalyst, if any
For example: Fellow, graduate student working with a Fellow, and so on
Project title
Brief project description
Brief description of how you plan to use BioData Catalyst in your project
What resources you might need from the Consortium, if any
For example: training, ingestion of new data, and so on
What resources you might add to the Consortium, if any
For example: new workflows or tools