# Data Organization in PIC-SURE

PIC-SURE integrates clinical and genomic datasets across BDC, including TOPMed and TOPMed-related studies, COVID-19 studies, and BioLINCC studies. **Each variable is organized as a concept path that contains information about the study, variable group, and variable.** Though the specifics of a concept path depend on the type of study, the overall information included is the same.

For more information about additional dbGaP, TOPMed, and PIC-SURE concept paths, refer to Appendix 1.

Table of Data Fields in PIC-SURE

| Field | Studies in dbGaP format | Studies not in dbGaP format |
| --- | --- | --- |
| General organization   | <p>Data organized using the format implemented by the <a href="https://www.ncbi.nlm.nih.gov/gap/">database of Genotypes and Phenotypes (dbGaP)</a>. Find more information on the dbGaP data structure <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2031016/">here</a>.</p><p> </p><p>Generally, a given study will have several tables, and those tables have several variables.</p> | <p>Data do not follow dbGaP format; there are no phv or pht accessions.</p><p> </p><p>Data are organized in groups of like variables, when available. For example, variables like <em>Age, Gender,</em> and <em>Race</em> could be part of the <em>Demographics</em> variable group.</p> |
| Concept path structure | \phs\pht\phv\variable name\\                                                                                                                                                                                                                                                                                                                                                                  | \phs\variable name                                                                                                                                                                                                                                                                       |
| Variable ID            | phv corresponding to the variable accession number                                                                                                                                                                                                                                                                                                                                            | Equivalent to variable name                                                                                                                                                                                                                                                              |
| Variable name          | Encoded variable name that was used by the original submitters of the data                                                                                                                                                                                                                                                                                                                    | Encoded variable name that was used by the original submitters of the data                                                                                                                                                                                                               |
| Variable description   | Description of the variable                                                                                                                                                                                                                                                                                                                                                                   | Description of the variable, as available                                                                                                                                                                                                                                                |
| Dataset ID             | pht corresponding to the trait table accession number                                                                                                                                                                                                                                                                                                                                         | Equivalent to dataset name                                                                                                                                                                                                                                                               |
| Dataset name           | Name of the trait table                                                                                                                                                                                                                                                                                                                                                                       | Name of a group of like variables, as available                                                                                                                                                                                                                                          |
| Dataset description    | Description of the trait table                                                                                                                                                                                                                                                                                                                                                                | Description of a group of like variables, as available                                                                                                                                                                                                                                   |
| Study ID               | phs corresponding to the study accession number                                                                                                                                                                                                                                                                                                                                               | phs corresponding to the study accession number                                                                                                                                                                                                                                          |
| Study description      | Description of the study from dbGaP                                                                                                                                                                                                                                                                                                                                                           | Description of the study from dbGaP                                                                                                                                                                                                                                                      |
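
The concept path structures in the table can be illustrated with a small parser. This is a minimal sketch, not part of the PIC-SURE API: the `ConceptPath` type and `parse_concept_path` function are hypothetical names, the accession values in the usage note are made-up placeholders, and the code assumes a path has either four segments (dbGaP format) or two segments (non-dbGaP format), as shown in the table.

```python
from typing import NamedTuple, Optional


class ConceptPath(NamedTuple):
    """Hypothetical container for the parts of a concept path."""
    study_id: str               # phs accession
    dataset_id: Optional[str]   # pht accession; absent in two-segment paths
    variable_id: str            # phv accession, or the variable name
    variable_name: str


def parse_concept_path(path: str) -> ConceptPath:
    """Split a backslash-delimited concept path into its parts."""
    # Leading and trailing backslashes produce empty segments; drop them.
    parts = [segment for segment in path.split("\\") if segment]

    if len(parts) == 4:
        # Assumed dbGaP-format layout: \phs\pht\phv\variable name\
        study, dataset, variable, name = parts
        return ConceptPath(study, dataset, variable, name)

    if len(parts) == 2:
        # Assumed non-dbGaP layout: \phs\variable name
        # Per the table, the variable ID is equivalent to the variable name.
        study, name = parts
        return ConceptPath(study, None, name, name)

    raise ValueError(f"Unexpected number of segments in concept path: {path!r}")
```

For example, `parse_concept_path("\\phs000000\\pht000000\\phv00000000\\AGE\\")` yields a `ConceptPath` whose `study_id` is `phs000000` and whose `variable_name` is `AGE`, while `parse_concept_path("\\phs000000\\AGE")` yields one with no `dataset_id`.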

Note that there are two data types in PIC-SURE: categorical and continuous. Categorical variables are variables whose values fall into discrete categories. For example, “Have you ever had asthma?” with values “Yes” and “No” is a categorical variable. Continuous variables are variables that take a numeric range of values. For example, “Age” with a value range from 10 to 90 is a continuous variable. The internal PIC-SURE data load process determines the type of each variable based on the data.
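
To make the distinction concrete, here is one way a data-driven type check could work. This is only an illustrative sketch: the actual rules used by the PIC-SURE data load process are not described here, and the `infer_variable_type` function and its "all values numeric" rule are assumptions made for this example.

```python
from typing import Iterable


def infer_variable_type(values: Iterable[str]) -> str:
    """Return "continuous" if every non-empty value parses as a number,
    otherwise "categorical". Assumed rule, not PIC-SURE's actual logic."""
    saw_value = False
    for raw in values:
        text = raw.strip()
        if not text:
            continue  # treat empty strings as missing values and skip them
        saw_value = True
        try:
            float(text)
        except ValueError:
            # A single non-numeric value makes the variable categorical.
            return "categorical"
    # With no usable values there is nothing numeric to go on, so this
    # sketch arbitrarily falls back to categorical.
    return "continuous" if saw_value else "categorical"
```

Under this rule, `infer_variable_type(["Yes", "No", "Yes"])` returns `"categorical"` and `infer_variable_type(["10", "42.5", "90"])` returns `"continuous"`.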
