Data Submission FAQs

Why do I need to submit to dbGaP when I don’t have genetic data?

Answer: Historically, the BDC registration and ingestion mechanisms were developed to support omics data for TOPMed. As BDC evolved to include clinical and other non-genomic data types, dbGaP registration remained the mechanism for registering studies and authorizing controlled data access. In collaboration with dbGaP, BDC supports a registration process that continues to use the existing data access management and request mechanisms, with the data submitted to BDC rather than to dbGaP.

Do I need to upload my data to both dbGaP and BDC?

Answer: As stated in the Instructions for Data Submission to BDC, study “data files” are uploaded to BDC only (step 4), and “study level metadata and subject consent files” are uploaded to dbGaP (step 2).

Where is the dbGaP submission link?

Answer: The Submission Portal (SP) link is https://submit.ncbi.nlm.nih.gov/dbgap/.

During dbGaP submission, “Sample Attributes” (6a/6b) is a required field, but I don’t have samples.

Answer: If your study doesn’t have samples, please upload a blank placeholder (dummy) file for this field.

Will the ‘record IDs’ from datasets be masked/transformed by the dbGaP team before publishing?

Answer: Each subject should be submitted with a single, unique, de-identified subject ID. dbGaP will assume the ‘record IDs’ are de-identified and will not mask or transform them before publishing. If the study dataset was collected from a cohort that has an existing dbGaP study (parent study or sub-studies), dbGaP will check that the IDs are the same, or will need a linkage file to link the IDs.
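
Before submitting, it may help to confirm that no subject ID appears more than once. The following is only a minimal sketch of such a check; the function name `find_duplicate_ids` and the sample IDs are hypothetical, and this is not a dbGaP or BDC tool.

```python
from collections import Counter


def find_duplicate_ids(subject_ids):
    """Return the sorted list of IDs that occur more than once."""
    counts = Counter(subject_ids)
    return sorted(sid for sid, n in counts.items() if n > 1)


# Hypothetical example IDs; real IDs would come from your subject file.
ids = ["S001", "S002", "S003", "S002"]
print(find_duplicate_ids(ids))  # ['S002']
```

An empty list means every ID in the input is unique.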

Is the data we uploaded to BioData Catalyst available now for researchers to request? We have a manuscript using the data which was recently accepted for publication, and we wanted to include a link in our manuscript to the dataset.

Answer: Uploaded data must first go through the BDC ingestion process, which can take as little as 4-6 weeks before the data are released.

What are the data types that are currently acceptable to BDC?

Answer: BioData Catalyst can ingest data of all types and sizes, including, but not limited to, genomic, proteomic, clinical/phenotypic, and imaging data.

How does BDC want dates represented in the data files? Can we date shift or should they be converted to days from an index date?

Answer: BDC will accept any de-identification method for dates, as long as it has been documented so that others can understand and reuse the data. Please reference the Instructions for Preparing Clinical Research Study Datasets for Submission to the NHLBI.
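
The two approaches named in the question can be sketched as follows. This is an illustrative example only, not a BDC requirement: the function names, the dates, and the 45-day offset are all assumptions chosen for the demonstration.

```python
from datetime import date, timedelta


def days_from_index(event: date, index: date) -> int:
    """Express a date as the number of days since an index date."""
    return (event - index).days


def shift_date(event: date, offset_days: int) -> date:
    """Shift a date by a fixed offset (negative values move it earlier)."""
    return event + timedelta(days=offset_days)


# Hypothetical dates for one subject.
enrollment = date(2020, 3, 15)
visit = date(2020, 4, 14)

# Option 1: days from an index date (here, enrollment).
print(days_from_index(visit, enrollment))  # 30

# Option 2: date shifting with an assumed per-subject offset of -45 days.
offset = -45
print(shift_date(visit, offset))  # 2020-02-29
```

Applying the same offset to every date for a given subject keeps the intervals between that subject's events unchanged. Whichever method is used, it should be described in the documentation submitted with the data.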

How do we start the data upload? (steps)

Answer: Please reference the Instructions for Data Submission to BDC document for the steps.

What data types does BioData Catalyst accept? Is it a suitable repository for my proposed multiple data types (including imaging) to include in my DMS Plan?

Answer: BioData Catalyst (BDC) can accept most types of human data, including imaging data. For non-human data, many NIH-supported generalist repositories are available as alternatives. You should work with your program officials to determine the best repository for your data.

I’m preparing a data sharing plan for an upcoming NHLBI proposal submission. Does data submission and storage have any associated costs or fees that I need to budget for?

Answer: NHLBI’s BioData Catalyst is envisioned as the central repository for NHLBI-supported studies, so your DMS plan should reference BDC as the repository for your data. NHLBI covers the cost of storing data submitted to BDC for broad research reuse (“hosted data”). If you plan to use BDC to analyze or prepare your data for sharing, you will incur computational costs, which vary depending on the size of your data and the scale of your compute. More information on cloud costs is available at: https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/tutorials-videos-and-modules/seven-bridges-tutorials/estimating-and-managing-your-cloud-costs.
