Writing BDC into a Grant Proposal

Guidance on writing BDC into a research proposal and the various costs you should budget for.

Writing BDC into your proposal’s budget

NHLBI BioData Catalyst (BDC) is a cloud-based ecosystem that seeks to empower researchers analyzing phenotypic and genotypic heart, lung, blood, and sleep data. Researchers on BDC have access to a number of controlled and open datasets and can bring their own data to the ecosystem for analysis.

This document is a resource for researchers writing NHLBI BioData Catalyst into grant proposals.

The BDC ecosystem uses two well-known cloud computing services, Google Cloud Platform (GCP) and Amazon Web Services (AWS), to perform computational analysis and store data. Users may scale workloads up or down by changing the virtual machine (VM) instance size and attached storage, and scale workloads horizontally by specifying a number of parallel instances. Increasing compute power, storage, or parallelization increases cost; the platforms estimate this cost for the researcher.

The platforms within the BDC ecosystem come equipped with cloud workspaces containing workflows and analysis tools. Depending on the platform, workflows may be available in WDL (Workflow Description Language) or CWL (Common Workflow Language) and are accessible from Dockstore.org, the Seven Bridges public gallery, or the Broad Methods Repository. In total, these sources contribute over 2,000 workflows. Additionally, researchers may access standard analytical tools such as RStudio, JupyterLab, Jupyter Notebooks, and SAS Studio.

Incurring Cloud Costs

Platforms within NHLBI BioData Catalyst use a combination of Google Cloud Platform (GCP) and Amazon Web Services (AWS) for storing and analyzing data in the ecosystem. Researchers on BioData Catalyst begin to incur fees when they use the ecosystem in one of the following ways:

  1. Data Storage: When a researcher uploads their own data or stores derived results in a cloud environment, they begin to incur data storage costs on the platform where their workspace is located.

  2. Computing / Analysis: When a researcher runs a task on a platform, they incur charges based on their usage.

  3. Egress Charges: When a researcher transfers data out of cloud storage, they incur egress (data transfer) charges.

  4. Platform Support: Standard support is provided free of charge, but when a project requires a significant amount of support, researchers may need to purchase time from ecosystem platforms.

For more information on each of these categories, see below. You may also use the following links to view platform-specific guidance for BioData Catalyst Powered by Terra or BioData Catalyst Powered by Seven Bridges.

Data Storage

In general, storage charges are billed on all files in a workspace that belong to that project. This includes all files a researcher uploads to BioData Catalyst and any results files generated by their workflows and analysis. This does NOT include controlled dataset files hosted by BioData Catalyst.

Storage costs vary based on the amount of data a researcher stores, the type of disk or service used to store it, and the cloud provider selected (AWS or GCP). For the most up-to-date information on storage rates, see these articles on Amazon S3 storage and Google Cloud Storage.
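As a rough illustration of how a storage line item might be worked out, the sketch below multiplies the amount of data stored by a monthly rate and a duration. The function name and the rate shown are hypothetical placeholders, not actual AWS or GCP prices; substitute the current rate published by your provider.

```python
def storage_cost(tb_stored: float, usd_per_gb_month: float, months: int) -> float:
    """Estimate the cost of keeping tb_stored terabytes for a number of months."""
    # Assumption: 1 TB = 1000 GB; check which unit your provider bills in.
    gb_stored = tb_stored * 1000
    return gb_stored * usd_per_gb_month * months


# Hypothetical inputs: 5 TB kept for 12 months at a placeholder rate.
estimate = storage_cost(tb_stored=5, usd_per_gb_month=0.02, months=12)
print(f"Estimated storage cost: ${estimate:,.2f}")
```

If the amount of stored data grows over the project, the same calculation can be repeated per year with a different tb_stored value.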

Computing / Analysis

Compute costs vary and depend on a range of factors including:

  • The platform and cloud infrastructure provider where an analysis is performed

  • Workspace & cloud instance settings

  • Length of time to workflow completion
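The factors above can be combined into a simple back-of-the-envelope estimate: an hourly instance rate multiplied by run time, the number of parallel instances, and the number of runs. This is only a sketch; the function name and every number below are hypothetical, and the cost estimators offered by each platform should be treated as the authoritative source.

```python
def compute_cost(usd_per_instance_hour: float, hours_per_run: float,
                 parallel_instances: int, runs: int) -> float:
    """Estimate compute cost as total instance-hours times the hourly rate."""
    instance_hours = hours_per_run * parallel_instances * runs
    return instance_hours * usd_per_instance_hour


# Hypothetical workload: 20 runs, each using 10 instances for 4 hours,
# at a placeholder rate of $0.50 per instance-hour.
estimate = compute_cost(usd_per_instance_hour=0.50, hours_per_run=4,
                        parallel_instances=10, runs=20)
print(f"Estimated compute cost: ${estimate:,.2f}")
```

Adding parallel instances may shorten each run, so hours_per_run and parallel_instances should be estimated together rather than independently.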

Egress Charges (Data Transfer)

By default, any data uploaded to or generated in a workspace is stored with a single cloud provider. If a researcher opts to move these files, they will be charged egress fees, otherwise known as data transfer fees. These fees apply if they:

  • Transfer files to another cloud provider, OR

  • Download files to a local machine

Fees for data egress vary based on service providers and what actions a researcher takes.
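To show how an egress estimate might be assembled, the sketch below sums charges over the two transfer types listed above, each with its own per-gigabyte rate. The destination labels, volumes, and rates are hypothetical placeholders rather than published provider prices.

```python
def egress_cost(transfers: dict) -> float:
    """Sum egress charges over (gigabytes, usd_per_gb) pairs keyed by destination."""
    return sum(gb * rate for gb, rate in transfers.values())


# Hypothetical yearly transfers: (gigabytes moved, placeholder rate in USD per GB).
yearly_transfers = {
    "other_cloud_provider": (500, 0.10),
    "local_download": (200, 0.12),
}
yearly_egress = egress_cost(yearly_transfers)
print(f"Estimated yearly egress cost: ${yearly_egress:,.2f}")
```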

Platform Support

BioData Catalyst provides general support for researchers on all ecosystem platforms free of charge. If a researcher anticipates needing a large amount of support for specialized activities, such as organizing a large training workshop, they can reach out to the BioData Catalyst Coordinating Center (bdc3@renci.org) and/or the platform liaisons to discuss these needs as they develop their proposal.

Estimating your cloud costs

Each NHLBI BioData Catalyst platform offers tools and tutorials to help you estimate your cloud costs. For information on these tools and how to run them, please see the below articles.

For more help on estimating your anticipated cloud costs, please contact the NHLBI BioData Catalyst help desk.
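Once per-category estimates are in hand, they can be rolled up into yearly and project totals for the budget. The following is a minimal sketch, assuming one estimate per cost category per project year; the class name, field names, and dollar amounts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AnnualCloudBudget:
    """One project year of estimated cloud costs, in USD."""
    storage_usd: float
    compute_usd: float
    egress_usd: float
    support_usd: float = 0.0  # standard support is free; set only if purchasing extra time

    def total(self) -> float:
        return self.storage_usd + self.compute_usd + self.egress_usd + self.support_usd


def project_total(yearly_budgets: list) -> float:
    """Sum the yearly totals across the whole project period."""
    return sum(budget.total() for budget in yearly_budgets)


# Hypothetical two-year project; year 2 stores more data and buys extra support.
budgets = [
    AnnualCloudBudget(storage_usd=1200, compute_usd=400, egress_usd=70),
    AnnualCloudBudget(storage_usd=2400, compute_usd=400, egress_usd=70, support_usd=5000),
]
for year, budget in enumerate(budgets, start=1):
    print(f"Year {year}: ${budget.total():,.2f}")
print(f"Project total: ${project_total(budgets):,.2f}")
```

Keeping the categories separate makes it easier to reuse each figure in the corresponding part of the budget justification.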

Writing a Budget Justification

The sample language below can be used as a resource when preparing a budget justification that includes NHLBI BioData Catalyst cloud costs in a proposal.

Note: In the following sample language, items in [brackets] mark where you should insert your own details.

Sample Language: BioData Catalyst Data Storage

All users with appropriate access credentials will have access to data hosted on BioData Catalyst. Controlled and open datasets already hosted on NHLBI BioData Catalyst will not incur storage costs. Our data storage budget will fund the storage of any derived results data (e.g., temporary and secondary files generated as a result of analyses on hosted data) and/or [XX TB] of data we plan to upload using the Bring Your Own Data tool. Data storage estimates were generated using rates pulled on [MM/DD/YYYY estimate was generated].

Sample Language: BioData Catalyst Analysis Costs

The BioData Catalyst ecosystem features several platforms with secure workspaces where researchers can run workflow analyses of genomic and phenotypic data. Our estimated analysis costs include [insert time amount] of analyst time, as well as an overall compute estimate generated with the help of BioData Catalyst documentation.

Sample Language: BioData Catalyst Egress Fees

BioData Catalyst, and all files generated on it, are hosted on the Google Cloud Platform and Amazon Web Services. We anticipate that during the course of this project some data will be subject to egress charges as a result of transferring across cloud providers or downloading data to local compute infrastructure. We currently estimate that [insert data estimate] will be subject to egress charges each year. Our estimated egress costs are based on pricing information gathered on [MM/DD/YYYY estimate was generated].

Sample Language: Platform liaison support

After consulting with members of the BioData Catalyst support team, we anticipate needing to purchase support time for additional training not covered under the standard provisions. The support team estimated that we will require [insert estimated funding amount] to purchase this additional time.

Letters of Support

To request a Letter of Support from the BioData Catalyst Coordinating Center, email bdc3@renci.org with the following information:

  • Researcher Name

  • Role in BioData Catalyst, if any

    • For example: Fellow, graduate student working with a Fellow, and so on

  • Project title

  • Brief project description

  • Brief description of how you plan to use BioData Catalyst in your project

  • What resources you might need from the Consortium, if any

    • For example: training, ingestion of new data, and so on

  • What resources you might add to the Consortium, if any

    • For example: New workflows or tools
