# Appendix 1: BDC Identifiers - dbGaP, TOPMed, and PIC-SURE

## Table of BDC dbGaP/TOPMed Identifiers

| Patient ID                                      | This is the HPDS Patient num. This is PIC-SURE HPDS’s internal Identifier.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| ----------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Topmed / Parent Study Accession with Subject ID | <ul><li>These are the identifiers used by each team in the consortium to link data.</li><li>Values must follow this mask:<br>\<STUDY\_ACCESSION\_NUMBER>.\<VERSION>\_\<SUBJECT\_ID><br>E.g., phs000007.v30\_XXXXXXX</li></ul> |
| DBGAP\_SUBJECT\_ID | <ul><li>A generated ID that is unique to each patient within a study.</li><li>Controlled by dbGaP.</li><li>It is not unique across unrelated studies. However, patients can be linked across studies; see SOURCE\_SUBJECT\_ID.</li><li>A patient is assigned the same dbGaP subject ID across related studies. For dbGaP to assign the same ID, include the two variables SUBJECT\_SOURCE and SOURCE\_SUBJECT\_ID.</li><li>This identifier is used in all the phenotypic data files and is what we sequence to an HPDS Patient Num (Patient ID). All sequenced identifiers are recorded in a PatientMapping file stored in S3. These mappings allow HPDS data to be correlated back to the raw data sets.</li></ul> |
| SUBJECT\_ID | <ul><li>A generated ID that is unique to each patient within a study.</li><li>Controlled by the submitter of the study.</li><li>For FHS, phs000007 uses shareid in place of SUBJECT\_ID, while phs000974 uses SUBJECT\_ID. The values in these two columns are the same, however.</li></ul> |
| SHARE\_ID | <ul><li>For FHS phs000007, this was used instead of SUBJECT\_ID; it was not used for FHS phs000974.</li></ul> |
| SOURCE\_SUBJECT\_ID | <ul><li>Used internally by dbGaP in conjunction with SUBJECT\_SOURCE to allow submitters to associate subjects across studies.</li></ul> |
| SAMPLE\_ID | <ul><li>De-identified sample identifier.</li><li>These are the IDs that link to the molecular data in dbGaP (VCFs, etc.).</li></ul> |

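The `<STUDY_ACCESSION_NUMBER>.<VERSION>_<SUBJECT_ID>` mask from the table above can be checked and split with a small helper. The following is a minimal sketch, not part of PIC-SURE: the function and class names are hypothetical, and the pattern assumes (based only on the examples in this appendix) that the accession is `phs` followed by digits and the version is `v` followed by digits.

```python
import re
from typing import NamedTuple

# Assumption: accession looks like "phs" + digits and version like "v" + digits,
# as in the examples "phs000007.v30_XXXXXXX" and "phs000974.v3_XXXXXXX".
ID_PATTERN = re.compile(
    r"^(?P<accession>phs\d+)\.(?P<version>v\d+)_(?P<subject_id>.+)$"
)


class StudySubjectId(NamedTuple):
    """The three parts of an accession-with-subject-ID value (hypothetical type)."""

    accession: str
    version: str
    subject_id: str


def parse_study_subject_id(value: str) -> StudySubjectId:
    """Split a value into its parts, or raise ValueError if it does not fit the mask."""
    match = ID_PATTERN.match(value)
    if match is None:
        raise ValueError(f"value does not follow the mask: {value!r}")
    return StudySubjectId(**match.groupdict())


def format_study_subject_id(accession: str, version: str, subject_id: str) -> str:
    """Concatenate the parts back into <ACCESSION>.<VERSION>_<SUBJECT_ID>."""
    return f"{accession}.{version}_{subject_id}"
```

For example, `parse_study_subject_id("phs000007.v30_XXXXXXX")` yields the parts `phs000007`, `v30`, and `XXXXXXX`. Everything after the first underscore is treated as the subject ID, so subject IDs that themselves contain underscores are kept whole.
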
## Table of PIC-SURE Identifiers

| \\\_Topmed Study Accession with Subject ID\\               | <p>Generated identifier for TOPMed Studies. These identifiers are a concatenation using the accession name and “SUBJECT\_ID” from a study’s subject multi file.</p><p> </p><p>\<STUDY\_ACCESSION\_NUMBER>.\<VERSION>\_\<SUBJECT\_ID></p><p>Eg: phs000974.v3\_XXXXXXX</p>                                                        |
| ---------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| \\\_Parent Study Accession with Subject ID\\ | <p>Generated identifier for parent studies. In most studies this follows the same pattern as the TOPMed Study Accession with Subject ID.</p><p> </p><p>However, Framingham’s parent study phs000007 does not contain a SUBJECT\_ID column; the SHAREID column is used in its place.</p><p> </p><p>E.g., phs000007.v3\_XXXXXXX</p> |
| \\\_VCF Sample Id\\ | <p>This variable is stored in the sample multi file in each dbGaP study.</p><p> </p><p>This is the TOPMed DNA sample identifier, which gives each sample/sequence a unique identifier across TOPMed studies.</p><p> </p><p>E.g., NWD123456</p> |
| Patient ID (not a concept path but exists in data exports) | <p>This is PIC-SURE’s internal identifier. It is commonly referred to as the HPDS Patient Num.</p><p> </p><p>This identifier is generated and assigned to subjects when they are loaded. It is not meant for data correlation between different data sources.</p> |

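The concatenation described above, including the Framingham case where SHAREID stands in for a missing SUBJECT\_ID column, can be illustrated as follows. This is a rough sketch under stated assumptions rather than the actual PIC-SURE loader logic: the function names are hypothetical, a row from the subject multi file is assumed to be available as a column-to-value mapping, and column names are matched case-insensitively because this appendix spells the column both `shareid` and `SHAREID`.

```python
from typing import Mapping

# Columns tried in order. SHAREID is the fallback described for phs000007;
# treating it as a general fallback is an assumption made for this sketch.
SUBJECT_COLUMNS = ("SUBJECT_ID", "SHAREID")


def pick_subject_value(row: Mapping[str, str]) -> str:
    """Return the SUBJECT_ID value, or the SHAREID value when SUBJECT_ID is absent."""
    normalized = {key.upper(): value for key, value in row.items()}
    for column in SUBJECT_COLUMNS:
        value = normalized.get(column)
        if value:
            return value
    raise KeyError("row has neither a SUBJECT_ID nor a SHAREID value")


def build_accession_subject_id(
    accession: str, version: str, row: Mapping[str, str]
) -> str:
    """Build <ACCESSION>.<VERSION>_<SUBJECT_ID> from one subject-file row."""
    return f"{accession}.{version}_{pick_subject_value(row)}"
```

With made-up placeholder values, `build_accession_subject_id("phs000974", "v3", {"SUBJECT_ID": "A1"})` produces `phs000974.v3_A1`, and a row that carries only `shareid` resolves through the fallback column.
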