Use Your Own Data with Terra

This page describes how researchers can bring their own data files and metadata into Terra. Some researchers may choose to bring their own data to Terra in addition to, or instead of, using BDC data from Gen3. For example, a researcher might bring additional (e.g., longitudinal) phenotypic data to enhance the harmonized metadata available from Gen3, bring their own genomic data for joint variant calling, or work exclusively with researcher-provided data.

Researchers typically bring two types of data to Terra: data files (e.g., genomic data such as CRAM and VCF files) and metadata (e.g., tables of clinical, phenotypic, or other data, usually describing the subjects in their study). Each is described separately below.
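Metadata tables are commonly loaded into Terra from a tab-separated file whose first column header has the form entity:<table>_id (for example, entity:participant_id), which names both the table and its ID column. The sketch below builds such a file from a list of subject records using only the Python standard library. The helper name, table name, column names, and sample values are hypothetical placeholders, and the header convention should be confirmed against Terra's data table documentation before you rely on it.

```python
import csv
import io


def to_terra_load_tsv(table, rows, id_field):
    """Build a Terra-style load TSV string (hypothetical helper).

    Assumes the convention that the first header cell is
    'entity:<table>_id'; the remaining columns are copied as-is.
    """
    if not rows:
        raise ValueError("no rows to write")
    # ID column first, then the other keys of the first row in their original order.
    other_cols = [c for c in rows[0] if c != id_field]
    header = [f"entity:{table}_id"] + other_cols
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Missing values become empty cells.
        writer.writerow([row[id_field]] + [row.get(c, "") for c in other_cols])
    return buf.getvalue()


# Hypothetical subject records with two phenotypic columns.
subjects = [
    {"subject_id": "S001", "sex": "F", "bmi": "24.1"},
    {"subject_id": "S002", "sex": "M", "bmi": "31.7"},
]
tsv_text = to_terra_load_tsv("participant", subjects, "subject_id")
print(tsv_text)
```

Writing the returned string to a file such as participant.tsv produces something you can then upload as a data table; building the text in memory first keeps the helper easy to check before anything touches the workspace.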

Data files can be made available in Terra in two ways: by uploading them to your workspace bucket, or by enabling Terra to access them in a researcher-managed Google bucket, which requires setting up a proxy group.

Article: Uploading to a workspace Google bucket
Article: Understanding and setting up a proxy group
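Files in a Google bucket are referenced by gs:// URIs of the form gs://<bucket>/<object path>. The sketch below, with hypothetical helper names, splits such a URI and flags references that point to a bucket other than the workspace bucket. Those are the files that may need the researcher-managed bucket to grant access to your proxy group (publicly readable buckets are an exception). It assumes you know your workspace bucket name; in Terra's notebook environment this is typically exposed as the WORKSPACE_BUCKET environment variable, but treat that as an assumption to verify.

```python
import os

GS_PREFIX = "gs://"


def split_gs_uri(uri):
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    if not uri.startswith(GS_PREFIX):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, obj = uri[len(GS_PREFIX):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, obj


def is_outside_workspace(uri, workspace_bucket):
    """True if `uri` is in a bucket other than the workspace bucket (hypothetical helper).

    `workspace_bucket` may be given with or without the 'gs://' prefix.
    """
    ws = workspace_bucket
    if ws.startswith(GS_PREFIX):
        ws = ws[len(GS_PREFIX):]
    ws = ws.rstrip("/")
    bucket, _ = split_gs_uri(uri)
    return bucket != ws


def workspace_bucket_from_env():
    # Assumption: Terra notebook environments export WORKSPACE_BUCKET; returns None if unset.
    return os.environ.get("WORKSPACE_BUCKET")


ws = "gs://fc-example-workspace"  # hypothetical workspace bucket
print(is_outside_workspace("gs://fc-example-workspace/crams/S001.cram", ws))  # False
print(is_outside_workspace("gs://my-lab-bucket/vcfs/cohort.vcf.gz", ws))      # True
```

Running a check like this over the file columns of a data table before launching an analysis is a cheap way to spot references that will fail until bucket permissions are in place.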

The articles and videos below describe how to import metadata into Terra data tables:

Article: Managing data with tables
Article: How to import metadata to a workspace data table
Video: Introduction to Terra data tables
Video: Making and uploading data tables to Terra

Bring in Data from Gen3

BioData Catalyst Powered by Gen3 provides data for many projects and supports search across its large set of subjects to identify the best available cohorts for research analysis. Searches are based on harmonized phenotypic variables and can be performed both within and across projects.

When a desired cohort has been identified in Gen3, it can be "handed off" to Terra for analysis. Optionally, this dataset can be enhanced with additional metadata from dbGaP or extended with additional researcher-provided subject data.

Here we provide essential information for all researchers using BDC data from Gen3, including how to access and select Gen3 subject data and hand it off to Terra, and a description of the GA4GH Data Repository Service (DRS) protocol and the data identifiers used by Gen3 and Terra.

The resources below contain the information you'll need to access your desired data:

Video: Data Analysis with Gen3, Terra and Dockstore
Article: Discovering Data Using Gen3
Article: Understanding and using Gen3 data in Terra
Article: Data Access with the GA4GH Data Repository Service (DRS)
Article: Linking Terra to External Servers
Article: Understanding and setting up a proxy group
Workspace: BioDataCatalyst Gen3 data on Terra tutorial
Workspace: TOPMed Aligner workspace
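Gen3 and Terra refer to files through GA4GH DRS identifiers. Under the DRS v1 specification, a hostname-based DRS URI of the form drs://<hostname>/<id> corresponds to the HTTPS endpoint https://<hostname>/ga4gh/drs/v1/objects/<id>, which returns the object's metadata and its access methods. The sketch below (the function name is hypothetical) performs only that mapping. It does not authenticate or download anything. It rejects compact-identifier URIs of the form drs://<namespace>:<accession>, which must first be resolved through a registry as the specification describes, but it cannot tell a compact identifier carrying a provider prefix apart from a hostname-based URI. Check which form your Gen3 identifiers use before relying on it.

```python
DRS_PREFIX = "drs://"


def drs_to_https(drs_uri):
    """Map a hostname-based DRS URI to its DRS v1 object endpoint.

    drs://<hostname>/<id>  ->  https://<hostname>/ga4gh/drs/v1/objects/<id>

    Compact-identifier URIs (drs://<namespace>:<accession>) need a resolver and
    are rejected. Hosts with an explicit port are also rejected by this simple
    colon check.
    """
    if not drs_uri.startswith(DRS_PREFIX):
        raise ValueError(f"not a DRS URI: {drs_uri!r}")
    host, sep, object_id = drs_uri[len(DRS_PREFIX):].partition("/")
    if not sep or not object_id or ":" in host:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    # Assumes the ID is already URL-safe; it is inserted without extra encoding.
    return f"https://{host}/ga4gh/drs/v1/objects/{object_id}"


print(drs_to_https("drs://drs.example.org/314159"))
# https://drs.example.org/ga4gh/drs/v1/objects/314159
```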

From Terra’s Data Library

Terra's Dataset Library includes a number of integrated datasets, many of which have their own Data Explorer interfaces for generating and exporting custom cohorts. If you click into a dataset and have the required permissions, you can explore the data. If you don't, you'll be taken to a page that tells you whom to contact for access.

The resources below provide guided instructions for creating custom cohorts from the Data Library, importing them into your workspace, and using a Jupyter notebook to interact with the data:

Article: Accessing and analysing custom cohorts with Data Explorer
Video: Notebooks Quickstart walkthrough
Workspace: Notebooks Quickstart workspace
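Once a cohort or other table is in your workspace, a notebook can treat it as ordinary tabular data. The sketch below parses table text in TSV form into rows keyed by entity ID and then filters them. It assumes the text uses the same entity:<table>_id first-column header as Terra load files, for instance a table downloaded from the workspace; the helper name and the sample data are hypothetical.

```python
import csv
import io


def read_terra_table_tsv(text):
    """Parse a Terra-style table TSV into (table_name, {entity_id: row_dict}).

    Assumes the first header cell is 'entity:<table>_id' (hypothetical helper).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    first = header[0]
    if not (first.startswith("entity:") and first.endswith("_id")):
        raise ValueError(f"unexpected first column: {first!r}")
    table = first[len("entity:"):-len("_id")]
    rows = {}
    for values in reader:
        if not values:
            continue  # skip blank lines
        rows[values[0]] = dict(zip(header[1:], values[1:]))
    return table, rows


# Hypothetical sample; in a notebook this text would come from a downloaded table file.
sample = "entity:participant_id\tsex\tage\nP1\tF\t54\nP2\tM\t61\n"
table, rows = read_terra_table_tsv(sample)
over_60 = [pid for pid, r in rows.items() if int(r["age"]) > 60]
print(table, over_60)  # participant ['P2']
```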

Bring Data into a Workspace

You can import data into your workspace by either linking directly to external files you have access to, or by interfacing with a number of platforms with which Terra has integrated access.

For BDC researchers, one of the most relevant of these platforms is Gen3. This section also points to resources on importing data from other public datasets integrated into Terra's Data Library, as well as on bringing in your own data.

Read on in this section for more information on:

  • Bringing in data from Gen3
  • Bringing in data from Terra's Data Library
  • Using your own data with Terra
  • Understanding the Terra ecosystem and how your files live in it