# Data Submission FAQs

**Why do I need to submit to dbGaP when I don’t have genetic data?**

* **Answer:** Historically, the BDC registration and ingestion mechanisms were developed to support omics data for TOPMed. As BDC evolved to include clinical and other non-genomic data types, dbGaP registration remained the mechanism for registering studies and authorizing controlled data access. In collaboration with dbGaP, BDC established a registration process that lets us keep using the existing data access management and request mechanisms, while the data themselves are submitted to BDC rather than to dbGaP.

**Do I need to upload my data to both dbGaP and BDC?**

* **Answer:** As stated in the [Instructions for Data Submission to BDC](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/data-management/data-submission-instructions), study “data files” are uploaded to BDC only (step 4), while “study level metadata and subject consent files” are uploaded to dbGaP (step 2).

**Where is the dbGaP submission link?**

* **Answer:** The dbGaP Submission Portal (SP) link is <https://submit.ncbi.nlm.nih.gov/dbgap/>.

**During dbGaP submission, “Sample Attributes” (6a/6b) is a required field, but I don’t have samples. What should I do?**

* **Answer:** If your study doesn’t have samples, please use a blank dummy file for this field.

**Will the ‘record IDs’ from datasets be masked or transformed by the dbGaP team before publishing?**

* **Answer:** Each subject should be submitted with a single, unique, de-identified subject ID. dbGaP assumes the ‘record IDs’ are already de-identified and will not mask or transform them before publishing. If the study dataset is collected from a cohort that has an existing dbGaP study (a parent study or sub-studies), dbGaP will confirm during curation that the IDs are the same, or will need a linkage file to link the IDs.
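
As a minimal pre-submission sketch (not a dbGaP or BDC tool), the Python below flags rows with a missing subject ID and IDs that appear more than once in a tab-delimited, one-row-per-subject file such as a subject consent file. The column name `SUBJECT_ID` and the function name `find_id_problems` are assumptions for illustration; adjust them to match your files. The check cannot tell whether an ID is actually de-identified.

```python
import csv
import io
from collections import Counter


def find_id_problems(tsv_text, id_column="SUBJECT_ID"):
    """Return (missing_rows, duplicate_ids) for a tab-delimited, one-row-per-subject file.

    id_column is an assumed header name; change it to match your file.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ids = []
    missing_rows = []
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        value = (row.get(id_column) or "").strip()
        if not value:
            missing_rows.append(line_no)
        else:
            ids.append(value)
    counts = Counter(ids)
    duplicate_ids = sorted(i for i, n in counts.items() if n > 1)
    return missing_rows, duplicate_ids


# Hypothetical example: S001 appears twice and the last row has no ID.
example = "SUBJECT_ID\tCONSENT\nS001\t1\nS002\t1\nS001\t2\n\t1\n"
print(find_id_problems(example))  # ([5], ['S001'])
```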

**Is the data we uploaded to BioData Catalyst available now for researchers to request? We have a manuscript using the data that was recently accepted for publication, and we want to include a link to the dataset in our manuscript.**

* **Answer:** The uploaded data will go through the BDC ingestion process, which takes a minimum of 4-6 weeks before the data are released.

**What data types does BDC currently accept?**

* **Answer:** BioData Catalyst can ingest data of all types and sizes, including, but not limited to, genomic, proteomic, clinical/phenotypic, and imaging data.

**How does BDC want dates represented in the data files? Can we date-shift them, or should they be converted to days from an index date?**

* **Answer:** BDC will accept any de-identification method for dates, as long as the method is documented so that others can understand and reuse the data. Please refer to the [Instructions for Preparing Clinical Research Study Datasets for Submission to the NHLBI](https://www.nhlbi.nih.gov/grants-and-training/policies-and-guidelines/guidelines-for-preparing-clinical-study-data-sets-for-submission-to-the-nhlbi-data-repository).
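
Both approaches mentioned in the question can be sketched in a few lines of Python. This is an illustrative sketch, not a BDC or NHLBI requirement: the hypothetical `days_from_index` converts a date to days relative to a subject's index date, and the hypothetical `make_shifter` applies one random offset per subject so that intervals between that subject's events are preserved. The ±365-day default range is an assumption; whichever method you use, document it alongside the data.

```python
import secrets
from datetime import date, timedelta


def days_from_index(event_date, index_date):
    """Days from the subject's index date to the event (negative = before the index date)."""
    return (event_date - index_date).days


def make_shifter(max_shift_days=365):
    """Return a shift(subject_id, d) function that applies one random offset per subject.

    max_shift_days is a hypothetical range; choose and document your own.
    """
    offsets = {}  # subject_id -> offset in days; keep this mapping private

    def shift(subject_id, d):
        if subject_id not in offsets:
            # Uniform integer in [-max_shift_days, max_shift_days] from a CSPRNG.
            offsets[subject_id] = secrets.randbelow(2 * max_shift_days + 1) - max_shift_days
        return d + timedelta(days=offsets[subject_id])

    return shift


# Usage: an event on 2021-03-15 with an index date of 2021-01-01.
print(days_from_index(date(2021, 3, 15), date(2021, 1, 1)))  # 73

shift = make_shifter()
first = shift("S001", date(2020, 1, 1))
second = shift("S001", date(2020, 1, 10))
print((second - first).days)  # 9: the interval between events is preserved
```

Using a single offset per subject (rather than per date) is what keeps within-subject timelines intact; the offsets themselves must not be shared with the data.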

**What are the steps to start the data upload?**

* **Answer:** Please refer to the [Instructions for Data Submission to BDC](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/data-management/data-submission-instructions) for step-by-step information.

**What data types does BioData Catalyst accept? Is it a suitable repository for the multiple data types (including imaging) proposed in my DMS Plan?**

* **Answer:** BioData Catalyst (BDC) can accept most types of human data, including imaging data. For non-human data, many NIH-supported generalist repositories are available as alternatives. You should work with your program officials to determine the best repository for your data.

**I’m preparing a data sharing plan for an upcoming NHLBI proposal submission. Does data submission and storage have any associated costs or fees that I need to budget for?**

* **Answer:** [NHLBI’s BioData Catalyst](https://biodatacatalyst.nhlbi.nih.gov/) is envisioned as the central repository for NHLBI-supported studies, so we recommend referencing BDC as the repository for your data in your DMS plan. NHLBI covers the cost of storing data submitted to BDC for broad research reuse (“hosted data”). If you plan to use BDC to analyze or prepare your data for sharing, you will incur computational costs, which vary with the size of your data and the scale of your compute. More information on cloud costs is available at [https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/tutorials-videos-and-modules/seven-bridges-tutorials/estimating-and-managing-your-cloud-costs](https://bdcatalyst.gitbook.io/biodata-catalyst-documentation/tutorials-videos-and-modules/seven-bridges-tutorials/estimating-and-managing-your-cloud-costs).


