You can import data into your workspace either by linking directly to external files you have access to or by connecting to one of the platforms that Terra integrates with.
For BioData Catalyst researchers, one of the most relevant of these platforms is Gen3. However, this section also provides resources that explain how to import data from other public datasets integrated into Terra’s data library, as well as how to bring in your own data.
Read on in this section for more information on:
This page describes how researchers may bring their own data files and metadata into Terra. Some researchers may choose to bring their own data to Terra in addition to, or instead of, BioData Catalyst data from Gen3. For example, a researcher might bring additional (e.g., longitudinal) phenotypic data to enhance the harmonized metadata available from Gen3, run joint variant calling that includes additional researcher-provided genomic data, or work exclusively with researcher-provided data.
Researchers typically bring two types of data to Terra: data files (e.g., genomic data, including CRAM and VCF files) and metadata (e.g., tables of clinical, phenotypic, or other data, typically about the subjects in their study). These are described separately below.
There are two ways a researcher's data files may be made available in Terra: by uploading the data to the researcher's workspace bucket, or by enabling Terra to access the data in a researcher-managed Google bucket, which requires setting up a proxy group.
Article: Uploading to a workspace Google bucket
Article: Understanding and setting up a proxy group
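As a rough illustration of the upload route, the sketch below only assembles a `gsutil cp` command for copying a local file into a bucket; it does not run it. The helper name `build_upload_command`, the bucket name `my-workspace-bucket`, and the `uploads/crams` prefix are hypothetical placeholders, not values taken from Terra. Substitute the bucket shown for your own workspace.

```python
import shlex
from typing import List


def build_upload_command(local_path: str, bucket_name: str, dest_prefix: str = "") -> List[str]:
    """Return the argument list for a `gsutil cp` upload into a bucket.

    bucket_name and dest_prefix are placeholders supplied by the caller;
    nothing here looks up a real Terra workspace.
    """
    if not bucket_name or "/" in bucket_name:
        raise ValueError("bucket_name must be a bare bucket name without slashes")
    prefix = dest_prefix.strip("/")
    # A trailing slash tells gsutil to treat the destination as a folder.
    if prefix:
        destination = f"gs://{bucket_name}/{prefix}/"
    else:
        destination = f"gs://{bucket_name}/"
    return ["gsutil", "cp", local_path, destination]


# Hypothetical file and bucket names, for illustration only.
command = build_upload_command("sample1.cram", "my-workspace-bucket", "uploads/crams")
print(shlex.join(command))
```

Printing the command rather than executing it makes it easy to review the destination path before any data is copied.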
The ways in which a researcher may import metadata to the Terra Data tables are described in the articles and tutorials below:
Article: Managing data with tables
Article: How to import metadata to a workspace data table
Video: Introduction to Terra data tables
Video: Making and uploading data tables to Terra
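For orientation, here is a minimal sketch of preparing a tab-separated load file for a data table. It assumes the convention that the first column header takes the form `entity:<table name>_id`; the articles above remain the authoritative description of the format. The helper `build_load_file`, the `subject` table, and all column names and values are hypothetical.

```python
import csv
import io
from typing import Dict, List


def build_load_file(table_name: str, rows: List[Dict[str, object]], id_key: str) -> str:
    """Render rows as TSV text whose first column is `entity:<table_name>_id`.

    The remaining columns follow the key order of the first row; keys
    missing from later rows are written as empty cells.
    """
    if not rows:
        raise ValueError("at least one row is required")
    other_columns = [key for key in rows[0] if key != id_key]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([f"entity:{table_name}_id"] + other_columns)
    for row in rows:
        writer.writerow([row[id_key]] + [row.get(col, "") for col in other_columns])
    return buffer.getvalue()


# Hypothetical subjects and file locations, for illustration only.
subjects = [
    {"id": "subject1", "age_at_enrollment": 54, "cram": "gs://my-workspace-bucket/uploads/crams/subject1.cram"},
    {"id": "subject2", "age_at_enrollment": 61, "cram": "gs://my-workspace-bucket/uploads/crams/subject2.cram"},
]
tsv_text = build_load_file("subject", subjects, id_key="id")
print(tsv_text)
```

Storing the `gs://` location of each data file in a metadata column is one way to tie the two kinds of researcher-provided data together in a single table.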
Terra’s Dataset Library includes a number of integrated datasets, many of which have individualized Data Explorer interfaces, useful for generating and exporting custom cohorts. If you click into a dataset and have the proper permissions, you'll be able to explore the data. If you don't have the necessary permission, you'll be taken to a page that tells you whom to contact for access.
The resources linked below provide guided instructions for creating custom cohorts from the data library, importing them to your workspace, and using a Jupyter notebook to interact with the data:
Article: Accessing and analysing custom cohorts with Data Explorer
Video: Notebooks Quickstart walkthrough
Workspace: Notebooks Quickstart workspace
Gen3 provides data for many projects and supports search across the full set of subjects to identify the best available cohorts for research analysis. Searches are based on harmonized phenotypic variables and may be performed both within and across projects.
When a desired cohort has been identified in Gen3, the cohort may be handed off to Terra for analysis. Optionally, this dataset may be enhanced with additional metadata from dbGaP, or extended to include additional researcher-provided subject data.
Here we provide essential information for all researchers using BioData Catalyst data from Gen3, including how to access and select Gen3 subject data and hand it off to Terra, as well as a description of the GA4GH Data Repository Service (DRS) protocol and data identifiers used by Gen3 and Terra.
The resources for this section (a video, five articles, and two workspaces) contain the information you’ll need to access your desired data.
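To make the DRS identifiers mentioned above more concrete, the sketch below converts a hostname-based DRS URI of the form `drs://<hostname>/<id>` into the HTTPS object endpoint defined by the GA4GH DRS specification. The function name `drs_uri_to_https` and the host `drs.example.org` are hypothetical. Compact-identifier DRS URIs, which need an extra resolution step, are deliberately rejected rather than handled; as a simplification, the sketch treats any colon in the authority part as a sign of a compact identifier.

```python
from urllib.parse import urlsplit


def drs_uri_to_https(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to its HTTPS object endpoint.

    drs://<hostname>/<id>  ->  https://<hostname>/ga4gh/drs/v1/objects/<id>
    """
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs":
        raise ValueError(f"not a DRS URI: {drs_uri!r}")
    if ":" in parts.netloc:
        # Simplifying assumption: a colon marks a compact identifier,
        # which this sketch does not resolve.
        raise ValueError("compact-identifier DRS URIs are not handled by this sketch")
    object_id = parts.path.lstrip("/")
    if not parts.netloc or not object_id:
        raise ValueError(f"DRS URI needs both a hostname and an object id: {drs_uri!r}")
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{object_id}"


# Hypothetical host and object id, for illustration only.
print(drs_uri_to_https("drs://drs.example.org/314159"))
```

In practice, Terra resolves DRS URIs on your behalf, so a conversion like this is mainly useful for understanding what the identifier refers to.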