Exploration
An explanation for the Exploration page on BDC-Gen3
Using Exploration
The Exploration page, located in the upper right-hand section of the toolbar, allows users to search through data and create cohorts. The Exploration portal contains a dynamic summary statistics display, as well as search facets leveraging the DCC Harmonized Variables.
Data Accessibility
Users can navigate through data on the Exploration page by selecting any of the three Data Access categories.
Data with Access: A user can view all of the summary data and associated study information for studies the user has access to, including but not limited to Project ID, file types, and clinical variables.
Data without Access: Locks next to the project ID signify that the user does not have subject-level access. Users can still search through the available studies, but can only view summary statistics. Users can request access to data by visiting the
All Data: Users can view all of the data available in the BDC-Gen3 platform, including studies with and without access. As a result, studies not available to a user will be locked as demonstrated below.
By default, all users visiting the Exploration page will be assigned to Data with Access.
Under the "Data" tab, users can leverage the to create custom cohorts. When facets are selected or updated to cover a desired range of values, the display updates to reflect the newly applied filter. If no facets have been selected, all of the data accessible to the user is displayed. At this time, a user can filter based on three categories of clinical information:
Project: Any specifically defined piece of work that is undertaken or attempted to meet a single investigative question or requirement.
Subject: The collection of all data related to a specific subject in the context of a specific experiment.
Harmonized Variables: A selection of different clinical properties from multiple nodes, defined by the Consortium.
NOTE: The facet filters are based on the DCC Harmonized Variables, which are a selected subset of clinical data that have been transformed for compatibility across the dbGaP studies. TOPMed studies that do not contain harmonized clinical data at this time will be filtered out when a facet is chosen, unless the "no data" option is also selected for certain facets.
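The effect of the "no data" option described in the note can be illustrated with a minimal sketch. The record layout, the field names (`submitter_id`, `project`, `bmi`), and the `filter_by_range` helper are all hypothetical and exist only for illustration; they are not the BDC-Gen3 schema or API.

```python
# Hypothetical subject records; field names are illustrative only.
subjects = [
    {"submitter_id": "S1", "project": "study-a", "bmi": 22.5},
    {"submitter_id": "S2", "project": "study-a", "bmi": 31.0},
    {"submitter_id": "S3", "project": "study-b", "bmi": None},  # no harmonized value
]


def filter_by_range(records, field, low, high, include_no_data=False):
    """Keep records whose value lies in [low, high].

    Records with no value for the field are dropped unless
    include_no_data is True, mirroring the "no data" option.
    """
    kept = []
    for record in records:
        value = record.get(field)
        if value is None:
            if include_no_data:
                kept.append(record)
        elif low <= value <= high:
            kept.append(record)
    return kept


without_missing = filter_by_range(subjects, "bmi", 20, 25)
with_missing = filter_by_range(subjects, "bmi", 20, 25, include_no_data=True)

print([r["submitter_id"] for r in without_missing])  # ['S1']
print([r["submitter_id"] for r in with_missing])     # ['S1', 'S3']
```

In this sketch, the subject from the study without harmonized data (S3) disappears as soon as a range is applied, and reappears only when missing values are explicitly included.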
Exporting Data from the Data Tab
After a cohort has been selected, the user has four different options for exporting the data.
The options for export are as follows:
Export All to Terra: Initiate an export of all clinical data and file GUIDs for the selected cohort to Terra. At this time, the maximum number of subjects that can be exported to Terra is 120,000.
Export All to Seven Bridges: Initiate an export of all clinical data and file GUIDs for the selected cohort to Seven Bridges.
NOTE: PFB exports can take up to 60 minutes, but often complete in less than 10 minutes.
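Before starting an export to Terra, it may help to compare the cohort size shown in the summary statistics against the subject limit quoted above. The following is only a sketch: the constant and function names are hypothetical, and the limit value is taken from this guide and may change.

```python
# Limit quoted in this guide for exports to Terra; treat it as subject to change.
TERRA_MAX_SUBJECTS = 120_000


def within_terra_limit(subject_count: int) -> bool:
    """Return True if a cohort of this size fits under the quoted Terra limit."""
    if subject_count < 0:
        raise ValueError("subject_count must not be negative")
    return subject_count <= TERRA_MAX_SUBJECTS


print(within_terra_limit(95_000))   # True
print(within_terra_limit(150_000))  # False
```

If the check fails, narrowing the cohort with additional facets before exporting is one possible approach.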
The Files tab displays study files from the facets chosen on the left-side panel (Project ID, Data Type, Data Format, Callset, and Bucket Path). Each time a facet selection is made, the data summary and displays will update to reflect the applied filters.
Locating Unharmonized Clinical Data
The Files tab also contains files that are either case-independent or project-level. This is important for files that are part of the Unharmonized Clinical Data category under the Data Type field. Unharmonized clinical files are made available in two distinct data formats:
TAR: Contains a complete directory of phenotypic datasets as XML and TXT files, which are direct downloads of unharmonized clinical data from dbGaP for a project at the study consent level.
AVRO: These files contain the same unharmonized clinical data from dbGaP as the TAR files, but in the form of a file.
NOTE: The unharmonized clinical data sets contain all data from the dbGaP study, but they are not cross-compatible across all studies within BDC.
Exporting/Downloading Data from the Files Tab
Once the user has selected a cohort, there are five options for accessing the files:
Download Manifest: Download the file manifest and use it to download the listed data files using the .
Export to Workspace: The files can be exported to a Gen3 workspace.
File Information Page
A user can open the File Information Page by clicking on any of the available GUID links in the Files tab. The page displays details such as data format, size, object_id, the time of the last update, and the md5sum. The page also contains a button to download the file via the browser (see below). For files that are 5 GB or larger, we suggest using the .
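The md5sum and size shown on the File Information Page can be used to sanity-check a download. The sketch below uses only the Python standard library; the function names are hypothetical, and treating 5 GB as 5 × 1024³ bytes is an assumption, since this guide does not say which unit is meant.

```python
import hashlib

# Assumption: "5 GB" is interpreted as binary gigabytes.
FIVE_GB = 5 * 1024**3


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected_md5(path, expected_md5):
    """Compare a local file against the md5sum copied from the page."""
    return md5_of_file(path) == expected_md5.strip().lower()


def is_large_file(size_in_bytes):
    """Return True for files at or above the 5 GB threshold mentioned above."""
    return size_in_bytes >= FIVE_GB
```

For example, `matches_expected_md5("downloaded_file.tar", "<md5sum from the page>")` would return True only if the local copy is byte-identical to the file described on the page. Reading in chunks keeps memory use flat even for very large files.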
Free text search for Submitter IDs and File Names
Both the Data and Files tabs contain a text-based search function that displays a list of suggestions below the search bar as the user types.
In the Data tab, Submitter IDs can be searched under the Subject tab.
In the Files tab, File Names can be searched under the File tab.
Click one or more suggestions in the list beneath the search bar to create a cohort and export or download the data. Click a selection again to remove it from the cohort.
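The suggest-then-toggle behavior described above can be sketched as follows. This is an illustration only: the `suggest` and `toggle` helpers are hypothetical, and the case-insensitive substring match is an assumption, since this guide does not state how the portal matches typed text.

```python
def suggest(query, candidates, limit=10):
    """Return up to `limit` candidates containing the query (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [c for c in candidates if needle in c.lower()][:limit]


def toggle(selection, item):
    """Return a new selection with `item` added if absent, removed if present."""
    updated = set(selection)
    if item in updated:
        updated.remove(item)
    else:
        updated.add(item)
    return updated


# Hypothetical Submitter IDs, for illustration only.
submitter_ids = ["SUBJ-0001", "SUBJ-0002", "SUBJ-0010", "OTHER-0001"]

matches = suggest("subj-000", submitter_ids)
print(matches)  # ['SUBJ-0001', 'SUBJ-0002']

cohort = set()
cohort = toggle(cohort, "SUBJ-0001")  # first click selects
cohort = toggle(cohort, "SUBJ-0002")
cohort = toggle(cohort, "SUBJ-0001")  # second click removes
print(sorted(cohort))  # ['SUBJ-0002']
```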