Instructions for Data Submission to BDC


Last updated 1 month ago


Introduction

The five steps and their subparts outlined below provide instructions for submitting data and making it available through BDC. These instructions will be updated as new information and processes are made available.

Tip: To reduce the time to ingest and release data, you may work on more than one action at a time.

  • If you have already prepared your data (see Step 3), you may complete Step 1 (Intent to Submit) and then simultaneously work on dbGaP Study Registration (see Step 2) and begin the Data Submission process (see Step 4).

  • If you have not yet prepared your data, you may complete Step 1 (Intent to Submit) and then simultaneously work on dbGaP Study Registration (see Step 2) and Data Preparation (see Step 3).

Step 1: Intent to Submit

This step has two data submitter action items, and the first is different for NHLBI intramural investigators than for extramural investigators.

Data Submitter Action Item 1 (NHLBI intramural investigators): Email NHLBIDIRBDCSubmission@mail.nih.gov for submission information.

Data Submitter Action Item 1 (extramural investigators): Complete the email template below with information specific to your study and send it to bdcatalystdatasharing@nih.gov.

Email template:

To: bdcatalystdatasharing@nih.gov
Subject: BioData Catalyst Data Submission [Grant Number / Award Number]

Study Name:
Institution Name:
PI Name:
Grant Number/Award Number/ZIA Number:
Expected date for data upload/submission:
Does this submission include genomics data?
Does this submission include biospecimen?
Does this submission include imaging data?
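If you prefer to draft the Step 1 email programmatically, the Python sketch below (standard library only) assembles the template fields into a message and refuses to proceed while any field is blank. It is a drafting aid, not an official BDC tool: the function name `build_intent_email` and any example values you pass in are hypothetical, and nothing is sent.

```python
from email.message import EmailMessage

# Field labels copied from the Step 1 email template.
TEMPLATE_FIELDS = [
    "Study Name",
    "Institution Name",
    "PI Name",
    "Grant Number/Award Number/ZIA Number",
    "Expected date for data upload/submission",
    "Does this submission include genomics data?",
    "Does this submission include biospecimen?",
    "Does this submission include imaging data?",
]


def build_intent_email(grant_number: str, answers: dict[str, str]) -> EmailMessage:
    """Assemble (but do not send) the extramural intent-to-submit email."""
    missing = [field for field in TEMPLATE_FIELDS if not answers.get(field, "").strip()]
    if missing:
        raise ValueError(f"template fields left blank: {missing}")

    msg = EmailMessage()
    msg["To"] = "bdcatalystdatasharing@nih.gov"
    msg["Subject"] = f"BioData Catalyst Data Submission [{grant_number}]"
    # One "label: answer" line per template field, in template order.
    msg.set_content("\n".join(f"{field}: {answers[field]}" for field in TEMPLATE_FIELDS))
    return msg
```

You could then print the message or copy its body into your mail client; the subject line mirrors the template's `[Grant Number / Award Number]` placeholder.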

After sending the email, you will receive an automated response with the following documents to use in Step 2:

  • Institutional Certification Form

  • Data Submission Information Sheet

  • Guidance document for registration of data in dbGaP

You will also receive a response from the Genomic Program Administrator (GPA) confirming receipt of your email.

Step 1 Related Links

  • Data Submission Information Sheet

  • DUL Statements for Institutional Certification

Step 2: Study Registration in dbGaP

All research data shared with BDC must be registered through dbGaP, although the process may differ for controlled-access and non-controlled-access data; in those cases, the BDC Data Management Core (DMC) will contact you with specific guidance. Study registration has two parts but only one action for data submitters.

  • The GPA will share the accession number and the consent group information with the DMC, which uses them to create the data submission infrastructure for your study.

  • You will receive an automated email from dbGaP to complete Study Submission (see screenshots of the dbGaP email below).

  • Once you finish your study configuration, dbGaP will curate your submission and may contact you with questions. When curation is complete, you will receive an email from dbGaP asking you to approve and complete your study registration.

  • Note: While waiting for dbGaP curation, please proceed with data submission to BDC (Steps 3 and 4 below) to reduce the time to ingest and release the data.

Step 3: Data Preparation

Data preparation can happen before, during, or after study registration, and it must be completed before you submit data to BDC. This step has one action item for all data submitters and a second action item for submitters of omics and phenotypic data types. The data package must be accompanied by supplemental documentation, including:

  • Protocols

  • BDC-compliant data dictionaries, provided as a separate file for each data file (not as a tab within the data file)*. See the Data Dictionary Requirement; the downloadable file is available in the BDC De-identification Readme.

  • Survey Instruments

  • Data/Metadata model, if applicable

  • Datasets Readme*

    • Specify the data file name and the variable names for “subject ID” and “age”.

    • Dataset organization: if the datasets are organized in multiple subfolders, include a Readme file that describes how the subfolders relate, i.e., whether they are independent (e.g., multiple phases or visits), main studies with ancillary studies, or overlapping (e.g., /raw data and /harmonized data, where /harmonized data is a subset of /raw data).

  • Additional supplemental documentation needed to reproduce study results

* Supported documentation types for data dictionaries and models are .csv, tab-delimited, XML, JSON, and other machine-readable formats. PDF and SAS file formats are not machine-readable and are discouraged. File names should not include spaces or special characters.
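Before packaging, a quick local check can catch file names and formats that conflict with the footnote above. The Python sketch below is a hypothetical helper, not a BDC tool, and it rests on assumptions that are not BDC rules: "special characters" is read as anything other than letters, digits, underscores, hyphens, and periods; tab-delimited files are assumed to use .tsv or .txt; the SAS extensions listed are assumed; and data dictionaries are assumed to follow a made-up `<data file>_DD` naming convention. Adjust these to whatever the DMC confirms.

```python
import re
from pathlib import Path

# Assumption: "special characters" = anything other than letters, digits,
# underscore, hyphen, or period. Confirm the exact rule with the DMC.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")

# Machine-readable formats named in the instructions; using .tsv/.txt for
# tab-delimited files is an assumption.
MACHINE_READABLE = {".csv", ".tsv", ".txt", ".xml", ".json"}

# PDF and SAS formats are discouraged; the SAS extensions listed are assumptions.
DISCOURAGED = {".pdf", ".sas7bdat", ".sas", ".xpt"}


def check_documentation_file(path: Path) -> list[str]:
    """Return a list of problems for one documentation file (empty list = no issues found)."""
    problems = []
    if not SAFE_NAME.match(path.name):
        problems.append(f"{path.name}: name contains spaces or special characters")
    suffix = path.suffix.lower()
    if suffix in DISCOURAGED:
        problems.append(f"{path.name}: {suffix} is not machine-readable (discouraged)")
    elif suffix not in MACHINE_READABLE:
        problems.append(f"{path.name}: unrecognized format {suffix or '(none)'}")
    return problems


def missing_dictionaries(data_files: list[Path], dictionary_files: list[Path]) -> list[str]:
    """Return names of data files that have no separate data dictionary file.

    Hypothetical convention: the dictionary for "visit1.csv" is "visit1_DD.<ext>".
    """
    dictionary_stems = {p.stem for p in dictionary_files}
    return [p.name for p in data_files if f"{p.stem}_DD" not in dictionary_stems]
```

For example, `check_documentation_file(Path("my dictionary.pdf"))` reports two problems: the space in the name and the discouraged PDF format.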

Step 4: Data Submission to BioData Catalyst (BDC)

Use the following information in the BDC contact form (https://biodatacatalyst.nhlbi.nih.gov/contact) to request creation of a cloud bucket for your study:

  • Email: your institutional email address used for NIH eRA Commons

  • Subject: Data Submission

  • Type: Data Submission (select it in the dropdown menu)

  • Message: include (1) your dbGaP PHS accession number and (2) a request for read/write permission to the assigned cloud bucket

    • In the rare case that your institution cannot access any cloud services hosted by Google or Amazon, request assistance with direct data upload from your data package location (e.g., SFTP transfer).
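As a small drafting aid for the contact-form message, the Python sketch below checks that the accession you paste looks like a dbGaP study accession before building the two-part message body. The accepted pattern (`phs` plus six digits, optionally followed by `.v<n>.p<n>`) is an assumption about the accession format, and `contact_form_message` is a hypothetical helper; the form itself is still filled out in the browser.

```python
import re

# Assumed dbGaP study accession shape: "phs" + six digits, optionally followed
# by a version and participant-set suffix such as ".v1.p1".
PHS_PATTERN = re.compile(r"^phs\d{6}(\.v\d+\.p\d+)?$")


def contact_form_message(phs_accession: str) -> str:
    """Draft the Message field for the bucket-creation request (hypothetical helper)."""
    accession = phs_accession.strip()
    if not PHS_PATTERN.match(accession):
        raise ValueError(f"not a dbGaP phs accession: {accession!r}")
    # Item 1: the accession number; item 2: the read/write access request.
    return (
        f"1) dbGaP PHS accession number: {accession}\n"
        "2) Please grant read/write permission to the cloud bucket assigned to this study."
    )
```

Validating locally simply catches copy-paste mistakes (e.g., a missing `phs` prefix) before the request reaches the DMC.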

Data upload may not begin until your data are prepared (see Step 3: Data Preparation) and you have received an invitation from dbGaP to complete your study submission and configuration (see Step 2).

Follow the links and instructions in the bucket-access email to activate the Amazon Web Services (AWS) S3 web interface.

After selecting the bucket for a consent group, use the “Upload” button to upload the data files.

If you use Google Cloud Platform (GCP), see the screenshot below (“Upload” highlighted).
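Because each consent group's datasets go to their own bucket (e.g., xxxx-c1, as described in the dbGaP 2b file), it can help to plan the uploads before opening the web interface. The Python sketch below assumes a hypothetical local layout with one subfolder per consent group code (such as `c1`, `c2`) and builds bucket names as `<prefix>-<code>`; the prefix is a placeholder, so use the bucket names actually assigned to your study. It only lists what goes where and performs no upload.

```python
from pathlib import Path


def plan_uploads(package_root: Path, bucket_prefix: str) -> list[tuple[Path, str, str]]:
    """Map each file under <package_root>/<consent group>/ to (local path, bucket, object key).

    Hypothetical layout: every immediate subfolder of package_root is named for a
    consent group code (e.g. "c1"), and its bucket is "<bucket_prefix>-<code>".
    """
    plan = []
    for group_dir in sorted(p for p in package_root.iterdir() if p.is_dir()):
        bucket = f"{bucket_prefix}-{group_dir.name}"
        for f in sorted(group_dir.rglob("*")):
            if f.is_file():
                # Keep the path relative to the consent-group folder as the object key.
                key = f.relative_to(group_dir).as_posix()
                plan.append((f, bucket, key))
    return plan
```

Reviewing the resulting list per bucket makes it easier to confirm that no file ends up in the wrong consent group before you click “Upload”.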

Step 5: Next Steps

Once your data package has been uploaded successfully, the data go through quality checks before ingestion and release. If issues are found, the DMC will contact you and help resolve them. This step has three data submitter action items.

Step 1 Action Items and Results

Data Submitter Action Item 2: Complete the Institutional Certification and the Data Submission Information Sheet (received in the automated response to Action Item 1) and email them to nhlbigeneticdata@nhlbi.nih.gov.

Results: Extramural submitters will also receive a response from the BioData Catalyst Data Management Core (DMC) (nhlbi.dmc.concierge@rti.org), which can provide further assistance and answer questions related to data submission.

Step 2 Action Items and Results

The GPA completes the first part of dbGaP study registration, which generates your study accession number. To do so, the GPA enters information from your Institutional Certification and Data Submission Information Sheet into the dbGaP Submission System. If needed, the GPA may contact you for additional information or clarification, or to request a data sharing plan and data use agreement.

Data Submitter Action Item 1: After receiving the automated email from dbGaP, complete the dbGaP submission process using the guidance in the dbGaP Study Configuration Process for Submission of Data to BDC (see a screenshot of the dbGaP Study Submission portal below). Study Config is a web form that collects a description of the study data, methods, and findings; inclusion/exclusion criteria; study history; references; attributions; and terms that are indexed so users can find your study through dbGaP Advanced Search.

Note: Gather all information before you start the web form, because the current form has no “save” button for partial entries. Click here to download the example files for dbGaP submission.

Step 3 Action Items

Data Submitter Action Item 1: Prepare supplemental documentation to accompany the data submission (“data package”) according to the Instructions for Preparing Clinical Research Study Datasets for Submission to the NHLBI, including the items listed under Step 3 above and:

  • A description of the de-identification methods (see the example file here)*

Data Submitter Action Item 2: For omics and phenotypic data types only, prepare the data files according to the dbGaP Study Submission Guidance.

Step 4 Action Items

Data submission has three action items for data submitters and can proceed in parallel with Data Submitter Action Item 1 from Step 2. The process begins with the BDC contact form: https://biodatacatalyst.nhlbi.nih.gov/contact.

Data Submitter Action Item 1: Request bucket creation by filling out the BDC contact form with the information listed under Step 4 above.

Data Submitter Action Item 2: Access the cloud bucket created for your study. You will receive a secure email from the NHLBI Information Technology Applications Center (ITAC) team with the URL for activating access, along with a user ID and password (see screenshot below). If you have any questions or issues accessing the buckets, contact nhlbi.dmc.concierge@rti.org.

Data Submitter Action Item 3: Upload the datasets to the cloud buckets created for your study. Once you have access, upload the datasets for each consent group to the corresponding bucket (e.g., xxxx-c1), as described in the dbGaP 2b file.

Step 5 Action Items

Data Submitter Action Item 1: If the DMC contacts you about QC issues with the uploaded data, respond to their inquiries to resolve the issues.

Data Submitter Action Item 2: If requested by the DMC, resubmit the data package after all issues are resolved.

After the data clear the quality checks, ingestion and release can take as little as 4-6 weeks. Once the data are released, the DMC will notify you that your study is available to authorized users in BDC (see the study inventory).

Data Submitter Action Item 3: You are encouraged to log in and view your study data in BDC.

Need Assistance?

Contact the BioData Catalyst Data Management Core (DMC) via https://biodatacatalyst.nhlbi.nih.gov/contact and select “Data Submission” in the Type field.

Contacts and Resources

  • NHLBI intramural submission information (Step 1): NHLBIDIRBDCSubmission@mail.nih.gov

  • Extramural intent to submit (Step 1): bdcatalystdatasharing@nih.gov

  • Completed Institutional Certification and Data Submission Information Sheet (Step 1): nhlbigeneticdata@nhlbi.nih.gov

  • BDC Data Management Core (DMC), including bucket-access questions: nhlbi.dmc.concierge@rti.org

  • BDC contact form (bucket requests and assistance): https://biodatacatalyst.nhlbi.nih.gov/contact

  • Referenced documents: Data Submission Information Sheet; DUL Statements for Institutional Certification; dbGaP Study Configuration Process for Submission of Data to BDC; Instructions for Preparing Clinical Research Study Datasets for Submission to the NHLBI; dbGaP Study Submission Guidance