LogoLogo
  • NHLBI BioData Catalyst® (BDC) Documentation
  • Community
    • Who We Are
    • BDC Glossary
    • Citation and Acknowledgement
    • Strategic Planning
    • Request for Comments
      • NHLBI BioData Catalyst Ecosystem Security Statement
      • NHLBI DICOM Medical Image De-Identification Baseline Protocol
    • BDC Video Content Guidance
    • Contributing User Resources to BDC
  • Written Documentation
    • Getting Started
    • Data Access
      • Data Interoperability
      • Understanding Access
      • Submitting a dbGaP Data Access Request
      • Checking Access
    • Explore Available Data
      • Dug Semantic Search
        • Search and Results
      • PIC-SURE User Guide
        • Getting Started
          • Requirements and Login
          • Available Data and Managing Data Access
            • TOPMed and TOPMed related datasets
            • BioLINCC Datasets
            • CONNECTS Dataset
        • Data Organization in PIC-SURE
        • PIC-SURE Features and General Layout
        • PIC-SURE Open Access vs. PIC-SURE Authorized Access
          • PIC-SURE Open Access
          • PIC-SURE Authorized Access
        • Data Analysis Using the PIC-SURE API
        • Additional Resources
        • PIC-SURE API Documentation
        • Appendix 1: BioData Catalyst Identifiers - dbGaP, TOPMed, and PIC-SURE
        • Appendix 2: Table of Harmonized Variables
      • Discovering Data Using Gen3
        • Dictionary
        • Exploration
        • Query
        • Workspace
        • Profile
        • PFB Files
        • Current Projects
    • Analyze Data
      • Transferring Files Between Seven Bridges and Terra
      • Seven Bridges
        • Knowledge Center
        • Getting Started Guide
        • Comprehensive Analysis Tips
        • Troubleshooting Tasks
        • GWAS with GENESIS workflows
        • Annotation Explorer
      • Terra
        • Account Setup
          • Billing
          • Managing Costs
        • Workspace Setup
          • Data Storage & Management
          • Collaboration
          • Security
        • Bring Data into a Workspace
          • Bring in Data from Gen3
          • From Terra’s Data Library
          • Use Your Own Data with Terra
        • Run Analyses
          • Batch Processing with Workflows
          • Interactive Analysis
          • Genome-Wide Association Studies
        • Troubleshooting & Support
      • Dockstore
        • Launch workflows with BioData Catalyst
        • Discover our catalog
        • Intro to Docker, WDL, CWL
        • Dockstore Forum
        • Contribute to the community
    • Community Tools & Integration
      • Bring Your Own Tool(s)
        • BYOT Glossary
        • Working with Docker
        • Creating, testing & scaling WDL workflows
        • Creating, testing & scaling CWL workflows
        • Version Control, Publishing & Validation of Workflows
        • Advanced Topics
      • Import a Dockstore App With Seven Bridges
    • Writing BDC into a Grant Proposal
    • Incurring Cloud Costs
    • Release Notes
      • 2025-04-15 BDC Release Notes
      • 2025-01-15 NHLBI BioData Catalyst Ecosystem Release Notes
      • 2024-10-21 NHLBI BioData Catalyst Ecosystem Release Notes
      • 2024-07-02 NHLBI BioData Catalyst Ecosystem Release Notes
      • 2024-04-01 NHLBI BioData Catalyst Ecosystem Release Notes
      • 2024-01-08 NHLBI BioData Catalyst Ecosystem Release Notes
      • 2023-10-04 NHLBI BioData Catalyst Ecosystem Release Notes
      • 2023-07-11 NHLBI BioData Catalyst Ecosystem Release Notes
      • 2023-04-04 BioData Catalyst Ecosystem Release Notes
      • 2023-01-09 BioData Catalyst Ecosystem Release Notes
      • 2022-10-03 BioData Catalyst Ecosystem Release Notes
      • 2022-07-11 BioData Catalyst Ecosystem Release Notes
      • 2022-04-04 BioData Catalyst Ecosystem Release Notes
      • 2022-01-24 BioData Catalyst Ecosystem Release Notes
      • 2021-10-04 BioData Catalyst Ecosystem Release Notes
      • 2021-07-09 BioData Catalyst Ecosystem Release Notes
      • 2021-04-02 BioData Catalyst Ecosystem Release Notes
      • 2021-01-15 BioData Catalyst Ecosystem Release Notes
      • 2020-10-23 BioData Catalyst Ecosystem Release Notes
      • 2020-08-24 BioData Catalyst Ecosystem Release Notes
      • 2020-04-02 BioData Catalyst Ecosystem Release Notes
    • Data Versioning Release Notes
    • NIH RECOVER Release Notes
  • Tutorials: Videos & Modules
    • Seven Bridges Tutorials
      • Genetic Association Testing using GENESIS Workflows
      • Estimating and Managing Your Cloud Costs
    • Terra Tutorials
      • Getting Started with Gen3 Data on Terra Tutorial
      • Genome Wide Association Study with 1000 Genomes Data Tutorial
      • Genome Wide Association Study with TOPMed Data Tutorial
      • TOPMed Aligner, or, How to Import Data From Gen3 into Terra and Run a Workflow on It
  • Data Management
    • Data Management Strategy
    • Instructions for Data Submission to BDC
      • De-identification Readme
      • Data Dictionary Requirement
    • dbGaP Study Configuration Process for Submission of Data to BDC
Powered by GitBook
On this page
  • Requirements
  • Data Access Request Process
  • Sample Research Use Statement
  • Title
  • Research Use Statement
  • Non-technical summary
  • Cloud-Use Statement
  • Cloud Provider Information

Was this helpful?

Export as PDF
  1. Written Documentation
  2. Data Access

Submitting a dbGaP Data Access Request

PreviousUnderstanding AccessNextChecking Access

Last updated 2 years ago

Was this helpful?

Requirements

  • An NIH eRA Commons ID (or appropriate NIH Login) is required for submitting a Data Access Request (DAR). If you do not have an eRA Commons account, you must request one through your institution’s Office of Sponsored Research or equivalent. For more information, refer to .

  • To submit a DAR, users must have PI status through their institution. Non-PI users must have a PI they work with that can submit a DAR and add them as a downloader.

Data Access Request Process

Step 1: Go to to log in to dbGaP.

Step 2: Navigate to My Projects.

Step 3: Select Datasets.

You can search by Primary disease type, or if you know the dataset you are interested in, you can use Study lookup.

We want to request HCT for SCD, so we will use the accession number phs002385. As you type the accession number, the numbers will start to auto-populate.

Select the study to add it to the Data Access Request. You can request up to 200 studies that you are interested in accessing.

The user can add additional datasets as necessary needed to answer the research question.

Sample Research Use Statement

Title

Long-term survival and late death after hematopoietic cell transplant for sickle cell disease

Research Use Statement

Our project is limited to requested dataset. We have no plans to combine with other datasets.

In 2018, the National Heart Lung and Blood Institute (NHLBI) began work on BioData Catalyst, a shared virtual space where scientists can access NHLBI data and work with the digital objects needed for biomedical research (www.nhlbi.nih.gov/science/biodata-catalyst). This is a cloud-based platform that allows for tools, applications and workflows. It provides secure workspaces to share, store, cross-link and analyze large sets of data generated from biomedical research. Biodata Catalyst addresses the NHLBI Strategic Vision objective of leveraging emerging opportunities in data science to facilitate research in heart, lung, blood and sleep disorders. It offers specialized search functions, controlled access to data and analytic tools via programming interfaces and its interoperability will allow exchange of information with other components of the Data Commons. BioData Catalyst may be accessed by the biomedical researchers and the public at large. The first available datasets in BioData Catalyst include data from NHLBI’s Trans-Omics for Precision Medicine (TOPMed) Program and the Cure Sickle Cell Initiative. Rigor in designing and performing scientific research and the ability to reproduce biomedical research are two of the cornerstones of science advancement. In order to test reproducibility of biomedical data available in BioData Catalyst we accessed NHLBI data from the Cure Sickle Cell Initiative to test and validate the findings of a publication that utilized those data. That report, focused on the effect of donor type and transplant conditioning regime intensity on hematopoietic cell transplant outcomes for sickle cell disease. Hematopoietic cell transplant is potentially curative, yet this treatment is associated with risks for mortality from the treatment procedure. Published reports suggest the life expectancy of adults with sickle cell disease in the United States is shortened by at least two decades compared to the general population. Thus, a fundamental question that is often asked is whether hematopoietic cell transplant over time would offer a survival advantage compared to treatment with disease-modifying agents. In the report1 that examined for factors associated with survival after transplantation, young patients (aged =12 years) and patients who received their graft from an HLA-matched sibling had the highest survival. For those without an HLA-matched sibling the data did not favor one alternative donor type over another.1 The purpose of the current analyses is two-fold: 1) test and validate a publication that utilized data in the public domain and 2) the utility of these data to conduct an independent study. The aim of the later study was to estimate the conditional survival rates after hematopoietic cell transplantation stratified by time survived since transplantation and to compare all-cause mortality risks to those of an age, sex, and race-matched general population in the United States.

Non-technical summary

Investigators in the Cure Sickle Cell Initiative Data Consortium request access to data to examine rigor and reproducibility of data submitted - using a previous publication as reference. Additionally, we will calculate survival after hematopoietic cell transplant by time survived since transplant. For example, we will calculate the 5- and 10-year likelihood of being alive 2 and 5 years after transplantation.

Cloud-Use Statement

The NHLBI-supported BioData Catalyst (www.nhlbiBioDataCatalyst.org) is a cloud-based infrastructure where heart, lung, blood, and sleep (HLBS) researchers can go to find, search, access, share, cross-link, and compute on large scale datasets. It will provide tools, applications, and workflows to enable those capabilities in secure workspaces. The BioData Catalyst will employ Amazon Web Services and Google Cloud Platform for data storage and compute. BioData Catalyst comprises the Data Commons Framework Services (DCFS) hosted and operated by the University of Chicago. DCFS will provide the gold master data reference as well as authorization/authentication and indexing services. The DCFS will also enable security interoperability with the secure workspaces. Workspaces will be provided by FireCloud, hosted and operated by the Broad Institute, Fair4Cures, hosted and operated by Seven Bridges Genomics and PIC-SURE operated by Harvard Medical School. For the NHLBI BioData Catalyst, the NHLBI Designated Authorizing Official has recognized the Authority to Operate (ATO) issued to the Broad Institute, University of Chicago and Seven Bridges Genomics as presenting acceptable risk, and therefore the NCI ATO serves as an Interim Authority to Test (IATT) when used by designated TOPMed investigators and collaborators. Additionally, the NHLBI Designated Authorizing Official has recognized the Authority to Operate (ATO) for Harvard Medical School.

Cloud Provider Information

Cloud Provider:

NHLBI BioData Catalyst, Private, The NHLBI-supported BioData Catalyst (https://biodatacatalyst.nhlbi.nih.gov/) is a cloud-based infrastructure where heart, lung, blood, and sleep (HLBS) researchers can go to find, search, access, share, cross-link, and compute on large scale datasets. It will provide tools, applications, and workflows to enable those capabilities in secure workspaces.

The NHLBI BioData Catalyst will employ Amazon Web Services and Google Cloud Platform for data storage and compute. The NHLBI BioData Catalyst comprises the Data Commons Framework Services (DCFS) hosted and operated by the University of Chicago. DCFS will provide the gold master data reference as well as authorization/authentication and indexing services. The DCFS will also enable security interoperability with the secure workspaces. Workspaces will be provided by FireCloud, hosted and operated by the Broad Institute, Fair4Cures, hosted and operated by Seven Bridges Genomics and PIC-SURE operated by Harvard Medical School.

For the NHLBI BioData Catalyst, the NHLBI Designated Authorizing Official has recognized the Authority to Operate (ATO) issued to the Broad Institute, University of Chicago and Seven Bridges Genomics as presenting acceptable risk, and therefore the NCI ATO serves as an Interim Authority to Test (IATT) when used by designated TOPMed investigators and collaborators. Additionally, the NHLBI Designated Authorizing Official has recognized the Authority to Operate (ATO) for Harvard Medical School.

Google Cloud Platform, Commercial

Google Cloud Platform is a public cloud platform that provides solutions and services such as virtual machines, database instances, storage, and more. We will use the Google Compute, a service that provides resizable compute capacity, to allocate Machine Types in which we will develop the methods and infrastructure necessary to build the NHLBI BioData Catalyst. Google Cloud offers several storage options that work in conjunction with Compute Engine: Google Cloud Storage, and Google Compute Engine Persistent Disks. We expect to use each of these as the provide different capabilities including persistent storage and direct and networked storage for attaching to running machine instances. We will use networking technologies based on Google’s Andromeda architecture, which can create networking elements at any level with software. This software-defined networking allows Cloud Platform's services to implement networking features that fit their exact needs, such as secure firewalls for virtual machines in Google Compute Engine. We will use Google Cloud Identity & Access Management to control user access to these compute resources.

Amazon Web Services (AWS), Commercial Amazon Web services (AWS) is a public cloud platform that provides solutions and services such as virtual machines, database instances, storage, and more. We will use the Amazon Elastic Compute Cloud (Amazon EC2), a web service that provides resizable compute capacity, to allocate Amazon Machine Instances (AMIs) in which we will develop the methods and infrastructure necessary to build the NHLBI BioData Catalyst. AWS offers several storage options that work in conjunction with EC2: Amazon Simple Storage Service (Amazon S3), Amazon Elastic Block Store (EBS) and Amazon Elastic File System (Amazon EFS). We expect to use each of these as the provide different capabilities including persistent storage and direct and networked storage for attaching to running AMI(s). We will use the Amazon Virtual Private Cloud (VPC) to provide security and robust networking functionality to these compute resources and Amazon Identity and Access Management (IAM) to control user access to these compute resources. AWS offers extensive security and has written a white paper with guidelines for working with controlled access data sets in AWS which we will follow. (see ).

https://d0.awsstatic.com/whitepapers/compliance/AWS_dBGaP_Genomics_on_AWS_Best_Practices.pdf
Understanding eRA Commons Accounts
https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login