Data Submission FAQs

Why do I need to submit to dbGaP when I don’t have genetic data?

  • Answer: Historically, the BDC registration and ingestion mechanisms were developed to support omics data for TOPMed. As BDC evolved to include clinical and other non-genomic data types, dbGaP registration continued to serve as the mechanism for registering studies and authorizing controlled data access. In collaboration with dbGaP, BDC established a registration process that allows us to keep using the existing data access management and request mechanisms, while the data themselves are submitted to BDC rather than to dbGaP.

Do I need to upload my data to both dbGaP and BDC?

  • Answer: No. As stated in the Instructions for Data Submission to BDC, study “data files” are uploaded to BDC only (step 4), while “study level metadata and subject consent files” are uploaded to dbGaP (step 2).

Where is the dbGaP submission link?

During dbGaP submission, “Sample Attributes” (6a/6b) is a required field, but I don’t have samples.

  • Answer: If your study doesn’t have samples, please upload a dummy blank file.

Will the ‘record IDs’ from datasets be masked/transformed by the dbGaP team before publishing?

  • Answer: Each subject should be submitted with a single, unique, de-identified subject ID. dbGaP will assume the ‘record IDs’ are de-identified and will not mask or transform them before publishing. If the study dataset was collected from a cohort that already has an existing dbGaP study (parent study or sub-studies), dbGaP will verify during curation that the IDs are the same, or will need a linkage file to link the IDs.

Is the data we uploaded to BioData Catalyst available now for researchers to request? We have a manuscript using the data which was recently accepted for publication, and we wanted to include a link in our manuscript to the dataset.

  • Answer: The uploaded data must first go through the BDC ingestion process, which takes at least 4-6 weeks before the data are released.

What are the data types that are currently acceptable to BDC?

  • Answer: BioData Catalyst can ingest data of all types and sizes, including, but not limited to, genomic, proteomic, clinical/phenotypic, and imaging data.

How does BDC want the dates represented in the datafiles? Can we date shift or should they be converted to days from an index date?

How do we start the data upload? (steps)

What data types does BioData Catalyst accept? Is it a suitable repository for my proposed multiple data types (including imaging) to include in my DMS Plan?

  • Answer: BioData Catalyst (BDC) can accept most types of human data, including imaging data. For non-human data, many NIH-supported generalist repositories are available as alternatives. You should work with your program officials to determine the best repository for your data.

I’m preparing a data sharing plan for an upcoming NHLBI proposal submission. Does data submission and storage have any associated costs or fees that I need to budget for?
