Exploration

An explanation for the Exploration page on BioData Catalyst Powered by Gen3


Last updated 4 years ago


Using Exploration

The Exploration page, located in the upper right-hand section of the toolbar, allows users to search through data and create cohorts. It contains a dynamic summary statistics display, as well as search facets that leverage the DCC Harmonized Variables.

Data Accessibility

Users can navigate through data on the Exploration page by selecting any of the three Data Access categories.

  • Data with Access: A user can view all of the summary data and associated study information for studies the user has access to, including but not limited to Project ID, file types, and clinical variables.

  • Data without Access:

    • Projects will also be hidden if the selected cohort contains fewer than 50 subjects (indicated by "50 ↓" and the message "You may only view summary information for this project"); in this case, both grayed-out boxes and locks appear. An additional lock means users have no access.

  • All Data: Users can view all of the data available in the BioData Catalyst Gen3 platform, including studies with and without access. As a result, studies not available to a user will be locked as demonstrated below.

By default, all users visiting the Exploration page will be assigned to Data with Access.
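
The fewer-than-50 rule described above can be pictured as a simple threshold check. The following is a minimal sketch only: the function name, the input dictionary, and the use of `None` as a stand-in for a hidden entry are all hypothetical, and nothing here reflects how the Gen3 portal actually implements the rule.

```python
from typing import Dict, Optional

# Threshold taken from the text above: summaries are hidden below 50 subjects.
MIN_SUBJECTS = 50


def mask_small_counts(counts: Dict[str, int],
                      threshold: int = MIN_SUBJECTS) -> Dict[str, Optional[int]]:
    """Return a copy of `counts` with values below `threshold` replaced by None.

    `counts` maps a (hypothetical) project ID to the number of subjects in the
    currently selected cohort. None stands in for a hidden, grayed-out entry.
    """
    return {project: (n if n >= threshold else None)
            for project, n in counts.items()}


if __name__ == "__main__":
    # Made-up project IDs and counts, for illustration only.
    cohort_counts = {"project-A": 1200, "project-B": 49, "project-C": 50}
    print(mask_small_counts(cohort_counts))
```

A count of exactly 50 stays visible in this sketch, since the text says only cohorts with fewer than 50 subjects are hidden.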

The Data Tab

  • Project: Any specifically defined piece of work that is undertaken or attempted to meet a single investigative question or requirement.

  • Subject: The collection of all data related to a specific subject in the context of a specific experiment.

  • Harmonized Variables: A selection of different clinical properties from multiple nodes, defined by the Consortium.

NOTE: The facet filters are based on the DCC Harmonized Variables, which are a selected subset of clinical data that have been transformed for compatibility across the dbGaP studies. TOPMed studies that do not contain harmonized clinical data at this time will be filtered out when a facet is chosen, unless the no data option is also selected for certain facets.
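
The interaction between a facet selection and the no data option can be sketched as follows. This is an illustrative model, not the portal's implementation: the record layout, the field names, and the `include_no_data` flag are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional, Set

# Each subject is a plain dict. A missing or None value means the study has no
# harmonized data for that variable. All field names here are hypothetical.
Subject = Dict[str, Optional[str]]


def filter_by_facet(subjects: Iterable[Subject],
                    facet: str,
                    selected: Set[str],
                    include_no_data: bool = False) -> List[Subject]:
    """Keep subjects whose value for `facet` is one of `selected`.

    Subjects with no value for `facet` are dropped unless `include_no_data`
    is True, which mirrors selecting the no data option alongside the facet.
    """
    kept: List[Subject] = []
    for subject in subjects:
        value = subject.get(facet)
        if value is None:
            if include_no_data:
                kept.append(subject)
        elif value in selected:
            kept.append(subject)
    return kept


if __name__ == "__main__":
    demo = [
        {"id": "s1", "sex": "female"},
        {"id": "s2", "sex": "male"},
        {"id": "s3", "sex": None},  # study without harmonized data
    ]
    print(filter_by_facet(demo, "sex", {"female"}))
    print(filter_by_facet(demo, "sex", {"female"}, include_no_data=True))
```

In the demo, subject `s3` is returned only by the second call, where the no data option is switched on.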

Exporting Data from the Data Tab

After a cohort has been selected, the user has four different options for exporting the data.

Export

The options for export are as follows:

  • Export to Workspaces: Export a manifest to the user's workspace and make the case-associated data files available in the workspace under the /pd/data directory.

NOTE: PFB export times can take up to 60 minutes, but often will complete in less than 10 minutes.

The Files Tab

The Files tab displays study files from the facets chosen on the left-side panel (Project ID, Data Type, Data Format, Callset, and Bucket Path). Each time a facet selection is made, the data summary and displays will update to reflect the applied filters.

Locating Unharmonized Clinical Data

The Files tab also contains files that are either case-independent or project-level. This is important for files that are part of the Unharmonized Clinical Data category under the Data Type field. Unharmonized clinical files are made available in two distinct data formats:

  • TAR: These files contain a complete directory of phenotypic datasets as XML and TXT files, which are direct downloads of unharmonized clinical data from dbGaP for a project at the study consent level.

    • XML: These files contain either dictionary or variable reports of the phenotypic datasets that are in the TXT files. These supporting files contain study-level, not subject-level, information.

    • TXT: These files contain subject-level phenotypic datasets.

NOTE: The unharmonized clinical data sets contain all data from the dbGaP study, but they are not cross-compatible across all studies within BioData Catalyst.
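
After downloading one of the TAR files described above, it can help to see which members are XML reports and which are TXT datasets before extracting anything. The sketch below uses only Python's standard `tarfile` module. The member names in the demo archive are invented, and the real directory layout inside a dbGaP TAR file may differ.

```python
import io
import os
import tarfile
from typing import Dict, List


def group_members_by_extension(archive: tarfile.TarFile) -> Dict[str, List[str]]:
    """Group regular-file member names by lowercase extension ('xml', 'txt', ...)."""
    groups: Dict[str, List[str]] = {}
    for member in archive.getmembers():
        if not member.isfile():
            continue  # skip directories, links, and other special entries
        extension = os.path.splitext(member.name)[1].lstrip(".").lower()
        groups.setdefault(extension, []).append(member.name)
    return groups


def build_demo_archive() -> bytes:
    """Create a tiny in-memory TAR with made-up file names, for demonstration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in ("study/data_dict.xml",
                     "study/var_report.xml",
                     "study/phenotypes.txt"):
            payload = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    with tarfile.open(fileobj=io.BytesIO(build_demo_archive()), mode="r") as tar:
        for extension, names in group_members_by_extension(tar).items():
            print(extension, names)
```

For a real download, `tarfile.open("path/to/file.tar")` would replace the in-memory demo archive.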

Exporting/Downloading Data from the Files Tab

Once the user has selected a cohort, there are five options for accessing the files:

  • Export to Workspace: The files can be exported to a Gen3 workspace.

  • GUID Download File Page: Aside from the five button options, users can download files by first clicking the link(s) under the GUIDs column and then clicking the Download button on the file information page (see the File Information Page section).

File Information Page

Free text search for Submitter IDs and File Names

Both the Data and File tabs contain a text-based search function that displays a list of suggestions below the search bar as the user types.

In the Data tab, Submitter IDs can be searched under the Subject tab.

In the File tab, File Names can be searched under the File tab.

Click one or more suggestions in the list beneath the search bar to create a cohort and export or download the data. Click a selection again to remove it from the cohort.

Locks next to the project ID signify that users do not have subject-level access; they can still search through the available studies, but can view only summary statistics. Users can request access to data by visiting the dbGaP homepage.

Under the "Data" tab, users can leverage the DCC harmonized variables to create custom cohorts. When facets are selected and/or updated to cover a desired range of values, the display will reflect the information relevant to the newly applied filter. If no facets have been selected, all of the data accessible to the user will be displayed. At this time, a user can filter based on three categories of clinical information: Project, Subject, and Harmonized Variables.

Export All to Terra: Initiate a PFB export of all clinical data and file GUIDs for the selected cohort to BioData Catalyst powered by Terra. At this time, the maximum number of subjects that can be exported to Terra is 120,000.

Export All to Seven Bridges: Initiate a PFB export of all clinical data and file GUIDs for the selected cohort to BioData Catalyst powered by Seven Bridges.

Export to PFB: Initiate a PFB export of all clinical data and file GUIDs for the selected cohort to your local storage.

AVRO: These files contain the same unharmonized clinical data from dbGaP as the TAR files, but in the form of a PFB file.

Download Manifest: Download the file manifest and use it to download the listed data files using the gen3-client.

Export All PFB: Initiate a PFB export of the selected files.

Export All to Terra: Initiate a PFB export of the selected files to BioData Catalyst powered by Terra.

Export All to Seven Bridges: Initiate a PFB export of the selected files to BioData Catalyst powered by Seven Bridges.
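
Before handing a downloaded manifest to a download tool, it can be useful to check which GUIDs it lists. The sketch below assumes the manifest is a JSON array of objects that carry the GUID under an `object_id` key; that layout, the helper name, and the placeholder GUIDs are assumptions for illustration and should be checked against an actual manifest.

```python
import json
from typing import List


def guids_from_manifest(manifest_text: str, key: str = "object_id") -> List[str]:
    """Return the unique GUIDs in a manifest, in first-seen order.

    Assumes `manifest_text` is a JSON array of objects and that each object
    stores its GUID under `key`. Entries without that key are skipped.
    """
    entries = json.loads(manifest_text)
    seen = set()
    guids: List[str] = []
    for entry in entries:
        guid = entry.get(key)
        if guid and guid not in seen:
            seen.add(guid)
            guids.append(guid)
    return guids


if __name__ == "__main__":
    # Placeholder GUIDs, not real identifiers.
    demo_manifest = json.dumps([
        {"object_id": "guid-0001"},
        {"object_id": "guid-0002"},
        {"object_id": "guid-0001"},  # duplicate entry
    ])
    print(guids_from_manifest(demo_manifest))
```

Deduplicating while preserving order keeps the list aligned with the manifest and avoids requesting the same file twice.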

A user can visit the File Information Page by clicking any of the available GUID links on the Files tab page. The page displays details such as the data format, size, object_id, last update time, and md5sum. The page also contains a button to download the file via the browser. For files that are 5 GB or larger, we suggest using the gen3-client.
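
The md5sum and size shown on the File Information Page can be used to check a download and to decide how to fetch a file. The sketch below uses only the Python standard library. The helper names are hypothetical, and treating 5 GB as 5 × 10^9 bytes is an assumption, since the page does not say whether decimal or binary units are meant.

```python
import hashlib
import os
import tempfile

# 5 GB expressed in decimal bytes (an assumption; see the note above).
LARGE_FILE_BYTES = 5 * 1000 ** 3


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the md5sum listed for it."""
    return md5_of_file(path) == expected_md5.strip().lower()


def suggested_download_method(size_bytes: int) -> str:
    """Suggest the gen3-client for files of 5 GB or more, else the browser."""
    return "gen3-client" if size_bytes >= LARGE_FILE_BYTES else "browser"


if __name__ == "__main__":
    # Write a small temporary file so the demo is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
        demo_path = tmp.name
    try:
        print(matches_md5(demo_path, hashlib.md5(b"hello").hexdigest()))
        print(suggested_download_method(os.path.getsize(demo_path)))
    finally:
        os.remove(demo_path)
```

Reading in chunks keeps memory use flat, which matters for the multi-gigabyte files this check is most useful for.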

Data Access panel on the Exploration page.
The view on the list of Projects when "Data without Access" is selected.
Example: The variable of Ethnicity is hidden once the number of subjects falls below 50.
A lock, a grayed-out box, and "50" signify that the number of subjects falls below 50 and that users have no access.
Exploration page with Data Access displaying the Data with Access.
Four options offered for data export.
The Files Tab page.
Five button options offered for file download or export.
Download files by clicking on the link located under the GUID column.
An example file information page with the Download button.
Free text search of Submitter IDs in Subject on the Data Tab.
Free text search of File Names on the File Tab.
Select multiple suggestions to create an exportable cohort.