Appendix 1: BioData Catalyst Identifiers - dbGaP, TOPMed, and PIC-SURE

Table of BioData Catalyst dbGaP/TOPMed Identifiers

Patient ID

This is the HPDS Patient num, the internal identifier used by PIC-SURE HPDS.

Topmed / Parent Study Accession with Subject ID

  • These are the identifiers used by each team in the consortium to link data.

  • Values must follow this pattern: <STUDY_ACCESSION_NUMBER>.<VERSION>_<SUBJECT_ID> Eg: phs000007.v30_XXXXXXX
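The pattern above can be checked and split into its parts with a small helper. This is a minimal sketch, not part of PIC-SURE: the function and class names are hypothetical, and the regular expression assumes that accessions look like `phs` followed by digits and versions like `v` followed by digits, as in the example.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape, inferred from the example "phs000007.v30_XXXXXXX":
#   "phs" + digits, then ".v" + digits, then "_" + the subject ID.
ACCESSION_WITH_SUBJECT = re.compile(
    r"^(?P<accession>phs\d+)\.(?P<version>v\d+)_(?P<subject_id>.+)$"
)


class StudySubjectId(NamedTuple):
    """The three parts of <STUDY_ACCESSION_NUMBER>.<VERSION>_<SUBJECT_ID>."""
    accession: str
    version: str
    subject_id: str


def parse_study_subject_id(value: str) -> Optional[StudySubjectId]:
    """Return the parsed parts, or None if the value does not fit the pattern."""
    match = ACCESSION_WITH_SUBJECT.match(value)
    if match is None:
        return None
    return StudySubjectId(**match.groupdict())
```

A value without a version, such as `phs000007_XXXXXXX`, does not fit the pattern and yields `None`.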

DBGAP_SUBJECT_ID

  • This is a generated ID that is unique to each patient within a study.

  • Controlled by dbGaP.

  • It is not unique across unrelated studies. However, patients can be linked across studies; see SOURCE_SUBJECT_ID.

  • A patient is assigned the same dbGaP subject ID across related studies. For dbGaP to assign the same dbGaP subject ID, the submission must include the two variables SUBJECT_SOURCE and SOURCE_SUBJECT_ID.

  • This identifier is used in all the phenotypic data files and is what we sequence to an HPDS Patient Num (Patient ID). All sequenced identifiers are recorded in a PatientMapping file, which is stored in S3. These mappings allow HPDS data to be correlated back to the raw data sets.
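The idea of sequencing identifiers to patient numbers and keeping a reversible mapping can be illustrated as follows. This is a sketch under stated assumptions: the actual layout of the PatientMapping file and the numbering scheme used by HPDS are not described here, and the function names are hypothetical. Because DBGAP_SUBJECT_ID is not unique across unrelated studies, the sketch keys each entry by a (study accession, DBGAP_SUBJECT_ID) pair.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical key: (study accession, DBGAP_SUBJECT_ID).
SubjectKey = Tuple[str, str]


def sequence_subject_ids(keys: Iterable[SubjectKey], start: int = 1) -> Dict[SubjectKey, int]:
    """Assign consecutive patient numbers to subject keys in first-seen order."""
    mapping: Dict[SubjectKey, int] = {}
    next_num = start
    for key in keys:
        if key not in mapping:  # a repeated key keeps its existing number
            mapping[key] = next_num
            next_num += 1
    return mapping


def invert_mapping(mapping: Dict[SubjectKey, int]) -> Dict[int, SubjectKey]:
    """Build the reverse lookup: patient number back to the source identifier."""
    return {num: key for key, num in mapping.items()}
```

The inverted mapping is what makes it possible to trace a patient number back to the identifier used in the raw data sets.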

SUBJECT_ID

  • This is a generated ID that is unique to each patient within a study.

  • Controlled by the submitter of a study.

  • For FHS, phs000007 uses shareid in place of SUBJECT_ID, while phs000974 uses SUBJECT_ID. The values in these two columns are the same, however.

SHARE_ID

  • For FHS phs000007, this is used instead of SUBJECT_ID. It is not used for FHS phs000974.

SOURCE_SUBJECT_ID

  • This is used internally by dbGaP in conjunction with SUBJECT_SOURCE to allow submitters to associate subjects across studies.

SAMPLE_ID

  • De-identified sample identifier.

  • These are the IDs that link to the molecular data in dbGaP (VCFs, etc.).

Table of PIC-SURE Identifiers

\_Topmed Study Accession with Subject ID\

Generated identifier for TOPMed studies. Each identifier is a concatenation of the study accession (with version) and the “SUBJECT_ID” value from the study’s subject multi file.

<STUDY_ACCESSION_NUMBER>.<VERSION>_<SUBJECT_ID>

Eg: phs000974.v3_XXXXXXX

\_Parent Study Accession with Subject ID\

Generated identifier for parent studies. In most studies this follows the same pattern as the TOPMed Study Accession with Subject ID.

However, Framingham’s parent study phs000007 does not contain a SUBJECT_ID column; the SHAREID column is used in its place.

Eg: phs000007.v3_XXXXXXX
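The construction described above, including the Framingham exception, can be sketched as a small function. This is an illustration only, not the code PIC-SURE uses: the function name is hypothetical, and it assumes each subject row is available as a mapping from column name to value, with the fallback column named `SHAREID`.

```python
from typing import Mapping


def build_accession_subject_id(accession: str, version: str, row: Mapping[str, str]) -> str:
    """Build <STUDY_ACCESSION_NUMBER>.<VERSION>_<SUBJECT_ID> for one subject row.

    Uses SUBJECT_ID when present; otherwise falls back to SHAREID, as
    described for Framingham's parent study phs000007.
    """
    subject = row.get("SUBJECT_ID") or row.get("SHAREID")
    if not subject:
        raise ValueError("row has neither SUBJECT_ID nor SHAREID")
    return f"{accession}.{version}_{subject}"
```

For example, `build_accession_subject_id("phs000974", "v3", {"SUBJECT_ID": "XXXXXXX"})` produces an identifier of the form shown for TOPMed studies, while a row that carries only `SHAREID` produces the parent-study form.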

\_VCF Sample Id\

This variable is stored in the sample multi file in each dbGaP study.

This is the TOPMed DNA sample identifier. This is used to give each sample/sequence a unique identifier across TOPMed studies.

Eg: NWD123456

Patient ID (not a concept path but exists in data exports)

This is PIC-SURE’s internal identifier. It is commonly referred to as the HPDS Patient num.

This identifier is generated and assigned to subjects when they are loaded. It is not meant for data correlation between different data sources.


Last updated 2 years ago
