The 2024-01-08 release marks the 16th release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., enabling Azure and searching data without logging in) along with documentation and tutorials (e.g., data dictionary field documentation) to help new users get started on the system. Please find more detail on the new features and user support materials in the sections below.
The 2024-01-08 data releases include the addition of research on multisystem inflammatory syndrome in children linked to COVID-19, bone marrow transplant and pulmonary hypertension in sickle cell disease, atherosclerosis, and psoriasis. Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.
Azure available on BDC Powered by Seven Bridges (BDC-SB): Velsera expanded its existing multi-cloud offerings by enabling Microsoft Azure (southcentralus) on BDC-SB. Users can select this compute and storage environment when creating a project, which allows them to avoid egress charges when computing on data stored in Azure. This is of particular interest to users who want to connect their own Azure cloud buckets to BDC-SB.
SAS upgrade in BDC-SB: SAS on BDC-SB has been upgraded from SAS Viya 3.5 to SAS Studio 9.4. SAS 9.4 offers improved functionality over SAS Viya 3.5, including more complete data management solutions and additional programming languages.
Open PIC-SURE without login: Open PIC-SURE is now publicly available on BDC Powered by PIC-SURE (BDC-PIC-SURE), meaning no eRA Commons credentials are required to access the site. Researchers can use this site to search terms of interest, apply filters at the variable-value level, retrieve obfuscated aggregate counts, and view single-variable distributions for their selected cohort. This lets researchers discover and interact with data available on BDC without logging in, lowering the barrier to data exploration. Check out Open PIC-SURE here.
Data Hierarchies in BDC-PIC-SURE: Researchers can now view the data hierarchy associated with variables in BDC-PIC-SURE by clicking the “Data Tree” icon in the “Actions” column of the search results. This helps researchers better understand how variables are related and provides additional context for these variables. Note that this feature is currently in beta and is only available for some studies. Feedback and input on this feature are welcome!
BDC-PIC-SURE Data Dictionary fields documentation: New documentation describes the data dictionary fields returned by the PIC-SURE API, giving a detailed account of what each field represents, including relationships between fields. This documentation can be found in the BDC-PIC-SURE GitBook here.
The table below highlights which studies were included in the 2024-01-08 data release. The release features research on long-term outcomes of multisystem inflammatory syndrome in children linked to COVID-19 (COVID19-MUSIC_GRU), bone marrow transplant for severe sickle cell disease (BioLINCC-BMT_CTN_HMB), and ApoA-1, atherosclerosis, and psoriasis (DIR-ApoA-1_Atherosclerosis_in_Psoriasis_GRU). Additionally, updated metadata is provided for the ongoing study of sildenafil therapy for pulmonary hypertension in sickle cell disease (walk-PHaSST). These data include clinical files and are now available for access across the entire ecosystem.
BDC-Gen3 release notes BDC-Terra release notes BDC-Seven Bridges release notes BDC-PIC-SURE release notes
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Long-TerM OUtcomes after the Multisystem Inflammatory Syndrome In Children (MUSIC) | phs002770.v1.p1.c1 | COVID19-MUSIC_GRU | Yes | Yes |
Unrelated Donor Reduced Intensity Bone Marrow Transplant for Children with Severe Sickle Cell Disease (BMT CTN-0601-BioLINCC) | phs003470.v1.p1.c1 | BioLINCC-BMT_CTN_HMB | Yes | Yes |
ApoA-1 and Atherosclerosis in Psoriasis (DIR) | phs003231.v1.p1.c1 | DIR-ApoA-1_Atherosclerosis_in_Psoriasis_GRU | Yes | Yes |
Treatment of Pulmonary Hypertension and Sickle Cell Disease With Sildenafil Therapy (walk-PHaSST) | phs002383.v1.p1.c1 | BioLINCC-Walk_PHaSST_DS-SCD-IRB-PUB-COL-NPU-MDS-RD | No | No |
Genetic Epidemiology of COPD Study (COPDGene) | phs002910.v1.p1.c1 | COVID19-C4R_COPDGene_HMB | Yes | Yes |
Genetic Epidemiology of COPD Study (COPDGene) | phs002910.v1.p1.c2 | COVID19-C4R_COPDGene_DS-CS | Yes | Yes |
The Mediators of Atherosclerosis in South Asians Living in America (MASALA) | phs002980.v1.p1.c1 | COVID19-C4R_MASALA_HMB-IRB-COL | Yes | Yes |
Prevent Pulmonary Fibrosis (PrePF) | phs002975.v1.p1.c1 | COVID19-C4R_PrePF_HMB | Yes | Yes |
A Multi-site Observational Study of Post-Acute Sequelae of SARS-CoV-2 Infection in Adults (RECOVER) | phs003463.v1.p1.c1 | RECOVER-Adult | Yes | Yes |
Hispanic Community Health Study (HCHS) | phs003457.v1.p1.c1 | NSRR-HCHS | Yes | Yes |
Hispanic Community Health Study (HCHS) | phs003457.v1.p1.c2 | NSRR-HCHS | Yes | Yes |
NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica (CRA) | phs000988.v5.p1.c1 | topmed-CRA_DS-ASTHMA-IRB-MDS-RD | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Genes-Environments and Admixture in Latino Asthmatics (GALA II) | phs000920.v5.p2.c2 | topmed-GALAII_DS-LD-IRB-COL | No | Yes |
NHLBI TOPMed: HyperGEN - Genetics of Left Ventricular (LV) Hypertrophy | phs001293.v3.p1.c1 | topmed-HyperGEN_GRU-IRB | No | Yes |
NHLBI TOPMed: HyperGEN - Genetics of Left Ventricular (LV) Hypertrophy | phs001293.v3.p1.c2 | HyperGEN_DS-CVD-IRB-RD | No | Yes |
NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism (WGS of VTE) | phs001402.v3.p1.c1 | Mayo_VTE_GRU | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Massachusetts General Hospital (MGH) Atrial Fibrillation Study | phs001062.v5.p2.c2 | MGH_AF_DS-AF-IRB-RD | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Massachusetts General Hospital (MGH) Atrial Fibrillation Study | phs001062.v5.p2.c1 | MGH_AF_HMB-IRB | No | Yes |
NHLBI TOPMed: African American Sarcoidosis Genetics Resource | phs001207.v3.p1.c1 | Sarcoidosis_DS-SAR-IRB | No | Yes |
NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v3.p1.c1 | WHI_HMB-IRB | No | Yes |
NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v3.p1.c2 | WHI_HMB-IRB-NPU | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC) | phs001211.v4.p2.c2 | ARIC_DS-CVD-IRB | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC) | phs001211.v4.p2.c1 | ARIC_HMB-IRB | No | Yes |
NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish | phs000956.v5.p1.c2 | Amish_HMB-IRB-MDS | No | Yes |
NHLBI TOPMed: Australian Familial Atrial Fibrillation Study | phs001435.v2.p1.c1 | AustralianFamilialAF_HMB-NPU-MDS | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Early-onset Atrial Fibrillation in the CATHeterization GENetics (CATHGEN) Cohort | phs001600.v3.p2.c1 | CATHGEN_DS-CVD-IRB | No | Yes |
NHLBI TOPMed: Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) | phs001472.v2.p1.c1 | ECLIPSE_DS-COPD-MDS-RD | No | Yes |
NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy (GENOA) | phs001345.v3.p1.c1 | GENOA_DS-ASC-RF-NPU | No | Yes |
NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity (GenSalt) | phs001217.v3.p1.c1 | GenSalt_DS-HCR-IRB | No | Yes |
NHLBI TOPMed: GOLDN Epigenetic Determinants of Lipid Response to Dietary Fat and Fenofibrate | phs001359.v3.p1.c1 | GOLDN_DS-CVD-IRB | No | Yes |
NHLBI TOPMed: Defining the time-dependent genetic and transcriptomic responses to cardiac injury among patients with arrhythmias | phs001434.v2.p1.c1 | miRhythm_GRU | No | Yes |
NHLBI TOPMed: Partners HealthCare Biobank | phs001024.v5.p1.c1 | PARTNERS_HMB | No | Yes |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c3 | pharmHU_DS-SCD | No | Yes |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c2 | pharmHU_DS-SCD-RD | No | Yes |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c1 | pharmHU_HMB | No | Yes |
NHLBI TOPMed: REDS-III Brazil Sickle Cell Disease Cohort (REDS-BSCDC) | phs001468.v3.p1.c1 | REDS-III_Brazil_SCD_GRU-IRB-PUB-NPU | No | Yes |
NHLBI TOPMed: San Antonio Family Heart Study (SAFHS) | phs001215.v4.p2.c1 | SAFHS_DS-DHD-IRB-PUB-MDS-RD | No | Yes |
NHLBI TOPMed: Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE) | phs001467.v2.p1.c1 | SAPPHIRE_asthma_DS-ASTHMA-IRB-COL | No | Yes |
NHLBI TOPMed: Novel Risk Factors for the Development of Atrial Fibrillation in Women | phs001040.v5.p1.c1 | WGHS_HMB | No | Yes |
The 2023-10-04 release marks the 15th release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., the ability to view cohort variables prior to access, and the ability to export selected data into an analysis workspace). Please find more detail on the new features in the section below.
The 2023-10-04 data releases include the addition of TOPMed studies spanning early-onset COPD, heart studies from various geographies, diabetes heart studies, and more. CRAMs and unharmonized clinical files were updated for six TOPMed studies already in BDC. BioLINCC Multi-Ethnic Study of Atherosclerosis studies were also added. Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.
BDC Powered by PIC-SURE (BDC-PIC-SURE): Open Access Variable Distributions Tool: Researchers can now view the variable distributions for their selected cohort with BDC-PIC-SURE Open Access to further their data discovery and exploration prior to access. Once variable filters have been applied, the Variable Distributions Tool displays bar charts for categorical variables and histograms for continuous variables. Note that the visualizations are obfuscated to protect participant-level data.
BDC Powered by Seven Bridges (BDC-Seven Bridges): Data Export from the BDC-PIC-SURE UI Public Project: This public project lets users run a CWL tool that exports selected data from BDC-PIC-SURE into a BDC-Seven Bridges project, using a query built in the BDC-PIC-SURE UI together with the BDC-PIC-SURE API. This project is a continuation of our original BDC-PIC-SURE API Public Project. Together, these public projects give both experienced and novice users the ability to build cohorts on BDC-PIC-SURE and bring the resulting data frames over to BDC-Seven Bridges for analysis.
BDC Powered by Terra (BDC-Terra) workspace data security: When users import data from NIH data repositories such as BDC, they are only allowed to import into existing BDC-Terra workspaces that have an authorization domain and/or protected data setting. Import of these datasets into unprotected workspaces will not succeed. This ensures that the data access is appropriately logged by BDC-Terra.
The table below highlights which studies were included in the 2023-10-04 data release. This release includes a significant representation from the NHLBI TOPMed program with studies spanning areas such as early-onset COPD, heart studies from various geographies, diabetes heart studies, and more. Notably, CRAMs and unharmonized clinical files have been updated for 6 TOPMed studies that were already a part of BDC. Additionally, new studies pertaining to the BioLINCC Multi-Ethnic Study of Atherosclerosis have been introduced. The data is now available for access across the entire ecosystem.
BDC-Gen3 release notes BDC-Terra release notes BDC-Seven Bridges release notes BDC-PIC-SURE release notes
The 2024-04-01 release marks the 17th release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., SRA import via DRS and the ability to save dataset IDs). Please find more details on the new features below.
The 2024-04-01 data releases include the addition of research on heart failure and COVID-19 plus version updates to ongoing genetic and genomic studies including COPD and atrial fibrillation. Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.
BDC Powered by Seven Bridges (BDC-Seven Bridges) SRA Import via DRS: The Sequence Read Archive (SRA) has been accessible via the SRA Toolkit, which requires users to download a copy of the toolkit to their local environment and then use it to download SRA data into their project on BDC-Seven Bridges. NCBI now stores SRA data in cloud buckets on Amazon and Google, which lets users avoid egress charges and simplifies access to the data through BDC-Seven Bridges’ new SRA to DRS Converter workflow.
BDC Powered by PIC-SURE Save Dataset ID: Users can now save the dataset ID after applying filters and building a cohort, allowing them to view and access their saved cohorts at a later time. Saved dataset IDs can be viewed and managed on the Authorized PIC-SURE Dataset Management page.
The table below highlights which studies were included in the 2024-04-01 data release.
The latest release incorporates studies from the Heart Failure Network (HFN), the National Sleep Research Resource (NSRR), the Observational Study of Post-Acute Sequelae of SARS-CoV-2 Infection (RECOVER Adult), and the Collaborative Cohort of Cohorts for COVID-19 Research (C4R). The release also includes version updates to ongoing genetic and genomic studies, including NHLBI TOPMed projects such as the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) and studies of atrial fibrillation in the CATHGEN cohort, among others.
The data will be available for access across the entire ecosystem by 2024-04-05.
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
BDC-Gen3 release notes BDC-Terra release notes BDC-Seven Bridges release notes BDC-PIC-SURE release notes
The 2024-10-21 release marks the 19th release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., supporting seqr genomics analysis, and exporting selected cohort data in PFB format). Please find more detail on the new features in the sections below.
The 2024-10-21 data releases include the addition of studies on asthma and sickle cell disease, plus new imaging from cardiovascular and atherosclerosis studies. Updates are highlighted for COPD, atrial fibrillation, and childhood asthma studies, and new additions include liver disease, myocardial genomics, and exRNA studies. The release also introduces the RECOVER-Pediatric project and the REDS-IV-P Epidemiology of COVID-19 study. Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.
BDC Powered by Terra (BDC-Terra) now supports seqr genomics analysis: seqr provides rich gene and variant-level annotations and powerful filtration tools to perform variant searches within a family or across projects. To get started, check out the video tutorials, including a video describing how to load your data in seqr.
Export selected cohort data in Portable Format for Biomedical Data (PFB): BDC Powered by PIC-SURE (BDC-PIC-SURE) now allows researchers to export selected participant-level data in PFB file format. When using the Select and Package Data tool in Authorized PIC-SURE, simply choose “Package Data as PFB” to export in this file format.
The table below highlights which studies were included in the 2024-10-21 data release.
The latest release features NHLBI TOPMed projects such as the Severe Asthma Research Program (SARP) and Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU). Additionally, it includes new imaging XML schemas from the Cardiovascular Health Study (CHS) and the Multi-Ethnic Study of Atherosclerosis. Updates are also highlighted in the Boston Early-Onset COPD Study, Cleveland Clinic Atrial Fibrillation Study, and the Childhood Asthma Management Program (CAMP). New additions include the Human Liver Cohort and studies on myocardial genomics and exRNA profiles. The release also introduces the RECOVER-Pediatric project and the REDS-IV-P Epidemiology of COVID-19 study.
The data is now available for access across the entire ecosystem.
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
BDC Powered by Gen3 release notes BDC Powered by Terra release notes BDC Powered by Seven Bridges release notes BDC Powered by PIC-SURE release notes
The 2024-07-02 release marks the 18th release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., an expanded workflow cost estimator, cascading authorization from parent to child studies, and DOIs at the dataset level). Please find more detail on the new features and user support materials in the sections below.
The 2024-07-02 data releases include the addition of research on atrial fibrillation, asthma, sickle cell disease, atherosclerosis, and more. Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.
Fixed Interoperability on BioData Catalyst Powered By Seven Bridges (BDC-Seven Bridges): BDC-Seven Bridges has completed updates to its interoperability functionality. The initial release of the project-based data download restriction functionality inadvertently interfered with DRS data interoperability between BDC-Seven Bridges and other ecosystems such as CAVATICA. This unintentionally re-siloed data on those systems, running counter to the overarching NIH data ecosystem goal of making data available to users across NIH institute and system boundaries.
Workflow Cost Estimator Expansion: A feature that enables users to estimate analysis costs before running has been expanded to three new workflows on BDC-Seven Bridges: 1) Cyrius, a tool to genotype CYP2D6 from WGS BAM or CRAM files, 2) kallisto quant, a tool to quantify RNA-seq data, and 3) BEDTools Coverage, a tool that computes both the depth and breadth of coverage of features in file B on the features in file A, useful for comparing WGS files. Users can filter tools based on the interactive cost estimator. See here for documentation.
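To make "depth and breadth of coverage" concrete, the Python sketch below computes, for each interval in A, how many B features overlap it, how many of its bases are covered at least once (breadth), and the mean number of B features covering each base (depth). It is a simplified illustration that assumes BED-style 0-based, half-open coordinates; it is not the BEDTools implementation and does not reproduce its exact output columns.

```python
from typing import List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def coverage(a: List[Interval], b: List[Interval]):
    """For each A interval, summarize how the B features cover it.

    Returns tuples of (chrom, start, end, n_overlaps, bases_covered,
    length, breadth, mean_depth).
    """
    results = []
    for chrom, start, end in a:
        # Clip every B feature that overlaps this A interval to its bounds.
        clipped = sorted(
            (max(s, start), min(e, end))
            for c, s, e in b
            if c == chrom and s < end and e > start
        )
        # Breadth: merge the clipped pieces so shared bases are counted once.
        covered = 0
        cur_s = cur_e = None
        for s, e in clipped:
            if cur_e is None or s > cur_e:
                if cur_e is not None:
                    covered += cur_e - cur_s
                cur_s, cur_e = s, e
            else:
                cur_e = max(cur_e, e)
        if cur_e is not None:
            covered += cur_e - cur_s
        # Depth: covered bases counted with multiplicity, divided by interval length.
        stacked = sum(e - s for s, e in clipped)
        length = end - start
        breadth = covered / length if length else 0.0
        mean_depth = stacked / length if length else 0.0
        results.append((chrom, start, end, len(clipped), covered, length, breadth, mean_depth))
    return results


if __name__ == "__main__":
    a = [("chr1", 0, 100)]
    b = [("chr1", 10, 30), ("chr1", 20, 50), ("chr2", 0, 100)]
    for row in coverage(a, b):
        print(row)
```

In the example, two chr1 features overlap the A interval and share bases 20 to 30, so breadth counts those bases once (40 of 100 bases, 0.4) while depth counts them twice (50 covered bases over 100, 0.5).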
Support for cascading authorization from dbGaP parent to child studies: Gen3 has updated the authorization process in BDC so that a researcher with access to a dbGaP parent study automatically gains access to relevant child studies. Previously, the BDC authorization process expected dbGaP to explicitly grant access to a parent study and to each of its associated substudies individually. Because dbGaP did not provide explicit access for child studies, users could not access these child studies without manually requesting additional authorization. With support for cascading authorization from parent to child study, a researcher with access to a dbGaP parent study also gains access to relevant child studies in BDC, eliminating the need for any manual authorization process.
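The cascading rule can be pictured as an expansion over a parent-to-child study map. The Python sketch below is a hypothetical illustration of that idea, not Gen3's actual authorization code; the study identifiers and the `CHILDREN` mapping are made up for the example.

```python
from typing import Dict, Iterable, Set


def effective_access(granted: Iterable[str], children: Dict[str, Iterable[str]]) -> Set[str]:
    """Expand directly granted studies to include their child studies.

    `children` maps a parent study ID to its child study IDs (hypothetical data).
    Access flows only downward: a grant on a child never implies the parent.
    """
    result: Set[str] = set()
    stack = list(granted)
    while stack:
        study = stack.pop()
        if study in result:
            continue  # already expanded; also guards against cycles in the map
        result.add(study)
        stack.extend(children.get(study, ()))
    return result


# Hypothetical parent/child relationships, for illustration only.
CHILDREN = {
    "PARENT_A": ["CHILD_A1", "CHILD_A2"],
    "PARENT_B": ["CHILD_B1"],
}

if __name__ == "__main__":
    print(sorted(effective_access(["PARENT_A"], CHILDREN)))
```

The one-directional expansion reflects the description above: access to a parent study is what triggers access to its child studies, not the other way around.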
Implementation of DOIs at the dataset level: A digital object identifier (DOI) is a persistent identifier, or handle, used to uniquely identify objects and standardized by the International Organization for Standardization (ISO). In BDC, DOIs have been created and made available at the dataset level to give each dataset a persistent identifier in a standard format. The DOIs are available via the Gen3 Discovery page as well as the API, and DataCite was used as the registration service. Going forward, a DOI will be minted for every BDC dataset as part of the data ingestion process. For users, dataset-level DOIs promote research reproducibility and make data more FAIR (Findable, Accessible, Interoperable, and Reusable).
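A DOI has a fixed shape: a prefix made of `10.` plus a registrant code, a slash, and a suffix chosen by the registrant, and it can be resolved through the `https://doi.org/` proxy. The Python sketch below splits a DOI into those parts and builds its resolver URL. The example DOI is hypothetical (not a real BDC dataset DOI), and the validation is deliberately loose rather than a complete implementation of the DOI syntax rules.

```python
import re
from urllib.parse import quote

# Prefix: "10." followed by a numeric registrant code (optionally with dot-separated
# subdivisions), then "/" and a non-empty suffix. A loose check, not a full validator.
DOI_PATTERN = re.compile(r"^(10\.\d+(?:\.\d+)*)/(\S+)$")


def parse_doi(doi: str):
    """Split a DOI into (prefix, suffix), raising ValueError if it does not look like one."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a DOI: {doi!r}")
    return match.group(1), match.group(2)


def resolver_url(doi: str) -> str:
    """Build the doi.org URL, percent-encoding suffix characters that are unsafe in URLs."""
    prefix, suffix = parse_doi(doi)
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


if __name__ == "__main__":
    example = "10.1234/bdc.example-dataset"  # hypothetical DOI
    print(parse_doi(example))
    print(resolver_url(example))
```

Percent-encoding the suffix matters because DOI suffixes may legally contain characters such as `#` or `?` that would otherwise be interpreted as part of the URL structure.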
View Stigmatizing Variables in PIC-SURE Open Access: Researchers can now view all variables, including stigmatizing variables, that are relevant to their search. Though these variables are not filterable in Open Access to protect participant data, this allows researchers to better understand what information is present in BDC. For more information about stigmatizing variables, please visit the publicly available GitHub repository.
The table below highlights which studies were included in the 2024-07-02 data release.
The latest release includes studies from NHLBI TOPMed projects such as Partners HealthCare Biobank, Novel Risk Factors for the Development of Atrial Fibrillation in Women, and the Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE). New versions of studies like Walk-PHaSST Sickle Cell Disease, the Malmo Preventive Project, and the Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study are also featured. Additionally, the release includes updates to studies like Outcome Modifying Genes in Sickle Cell Disease (OMG) and the Vanderbilt University BioVU Atrial Fibrillation Genetics Study. The Collaborative Cohort of Cohorts for COVID-19 Research (C4R) and NIH RECOVER projects are also part of this release, including studies from the Hispanic Community Health Study/Study of Latinos and the Multi-Ethnic Study of Atherosclerosis.
The data is now available for access across the entire ecosystem.
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
BDC-Gen3 release notes BDC-Terra release notes BDC-Seven Bridges release notes BDC-PIC-SURE release notes
The 2023-07-11 release marks the fourteenth release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features, e.g., Faceted Search in BDC Powered by Seven Bridges (BDC-Seven Bridges), along with documentation to help new users get started on the ecosystem, e.g., updated WDL documentation in BDC Powered by Terra (BDC-Terra). This release also includes enhanced support for discovering what datasets are available via BDC Powered by Gen3. Please find more detail on the new features and user support materials in the sections below.
The 2023-07-11 data releases include the addition of various research projects related to COVID-19, lung development, platelet transfusion refractoriness, sickle cell anemia, asthma, pregnancy outcomes, and family health studies. Please refer to the Data Releases section below for information on upcoming data releases. A list of currently available data can be viewed on the Data page of the BDC website.
Faceted Search in BDC-Seven Bridges: Version 1 of Faceted Search has been deployed for all users on BDC-Seven Bridges. This feature enables users to query or filter any BDC ingested data in a faceted way to find files and form groups of files by searching characteristics such as authorization status, study accession number, type of data, etc. With the release of v1 Faceted Search, users can now more easily find data that is relevant to their research. Faceted Search is currently available for 10 datasets and will be expanded to all hosted datasets in the following quarter. The Faceted Search feature can be found under the Data drop-down menu.
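Faceted search can be thought of as two operations over file metadata: counting how many files share each value of a facet, and keeping only the files that match every selected facet value. The Python sketch below illustrates those two operations on a few made-up records. The field names (`access`, `study`, `data_type`) and their values are hypothetical stand-ins for the characteristics mentioned above and do not reflect the BDC-Seven Bridges data model or API.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical file metadata records; fields mirror the kinds of facets described above.
FILES: List[Dict[str, str]] = [
    {"name": "s1.cram", "access": "authorized", "study": "STUDY_1", "data_type": "CRAM"},
    {"name": "s1.vcf", "access": "authorized", "study": "STUDY_1", "data_type": "VCF"},
    {"name": "s2.cram", "access": "unauthorized", "study": "STUDY_2", "data_type": "CRAM"},
]


def facet_counts(files: Iterable[Dict[str, str]], facet: str) -> Counter:
    """Count how many files carry each value of one facet."""
    return Counter(f.get(facet) for f in files)


def apply_facets(files: Iterable[Dict[str, str]], selected: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Keep files whose value for every selected facet is one of the chosen values.

    Values within a facet are OR-ed together; different facets are AND-ed.
    """
    return [f for f in files if all(f.get(k) in values for k, values in selected.items())]


if __name__ == "__main__":
    print(facet_counts(FILES, "data_type"))
    hits = apply_facets(FILES, {"data_type": {"CRAM"}, "access": {"authorized"}})
    print([f["name"] for f in hits])
```

OR-ing values within a facet and AND-ing across facets is a common convention for faceted interfaces; whether BDC-Seven Bridges combines selections exactly this way is an assumption.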
BDC-Gen3 Metadata Updated to Include Data from the dbGaP FHIR Server: BDC-Gen3’s Discovery Page (and the underlying BDC-Gen3 Source of Truth Metadata API) allows unauthenticated users to discover what datasets are available in BDC. Fast Healthcare Interoperability Resources (FHIR) is a Health Level Seven International (HL7) specification for healthcare interoperability. Last quarter, BDC-Gen3 worked to consume the new metadata from the dbGaP FHIR Server as part of the officially defined data ingestion process. This quarter, BDC-Gen3’s Data Ingestion Pipeline has been updated to load FHIR metadata with every new data release. The loaded metadata is available to all clients and users through BDC-Gen3’s Metadata API and is viewable on BDC-Gen3’s Discovery Page.
New and Improved Genomic Filtering on BDC Powered by PIC-SURE (BDC-PIC-SURE): The Genomic Filtering modal on BDC-PIC-SURE has been updated to more accurately represent the relatedness between the various filtering fields. This includes the revamped “Variant consequence calculated” field, which includes different levels of severity and their associated consequences. Additionally, the “Selected Genomic Filters” section now more explicitly summarizes the filter criteria being applied.
Edit Queries Built in BDC-PIC-SURE Using the API: Researchers who created a cohort in BDC-PIC-SURE’s user interface can now edit that query’s parameters with Python or R code via the BDC-PIC-SURE API. This gives researchers more flexibility to refine or change their cohort after export without returning to the user interface.
Updated WDL documentation in BDC-Terra: Based on user feedback, Terra documentation has been expanded and updated to include: a new wdl-docs GitHub repository with a section dedicated to resources created by the WDL community; a new wdl-docs website that hosts the documentation from this repository; updates to all existing WDL syntax documentation to match the WDL 1.0 spec; 17 new articles; 11 cookbook-style documents that teach users about specific use cases and provide example workflows; and 6 best-practices documents that help users understand some of the grayer areas of coding in WDL. The documents are now available in the new wdl-docs GitHub repository.
New Code in “0_Export_from_UI” BDC-PIC-SURE API Examples: The example code has been updated with new examples showing how to use the BDC-PIC-SURE API to edit the query parameters of a cohort built in the BDC-PIC-SURE user interface. These examples are available in Python and R, for both Jupyter and RStudio.
The table below highlights which studies were included in the 2023-07-11 data release. The Q2 data release included various research projects related to COVID-19, lung development, platelet transfusion refractoriness, sickle cell anemia, asthma, pregnancy outcomes, and family health. These include two studies from the Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) initiative (ACTIV4a and ACTIV4c). There is a study on lung development (LungMAP) and another addressing platelet transfusion refractoriness in patients with severe thrombocytopenia using Eculizumab (DIR-Eculizumab). Other studies cover the use of hydroxyurea in children with sickle cell anemia (BABYHUG), the genetic epidemiology of asthma in Costa Rica (CRA), nulliparous pregnancy outcomes (nuMoM2b), the Multicenter Study of Hydroxyurea (MSH), and the Cleveland Family Study (CFS). The data is now available for access across the entire ecosystem.
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
BDC-Gen3 release notes BDC-Terra release notes BDC-Seven Bridges release notes BDC-PIC-SURE release notes BDC-Dockstore release notes
The 2022-04-04 release marks the ninth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., machine learning tools for chest CT imaging) along with documentation and tutorials (e.g., a new guide to sharing content) to help new users get started on the system. This release also includes enhanced support for synchronizing tools and workflows between Dockstore and GitHub. Please find more detail on the new features and user support materials in the sections below.
The 2022-04-04 data release includes the addition of COVID-19 datasets ACTIV4a and ACTIV4b. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.
Machine learning tools for chest CT imaging: Seven Bridges and Harvard Medical School have collaborated to release a Public Project of machine learning tools titled: Automated Chest Imaging Platform (CIP) CT Phenotyping and Machine Learning Discovery in COPD. The Public Project includes a detailed guide for other researchers to use the tools and notebooks on COPD datasets or modify the tools for their own lung CT data.
Storage-optimized instances on Seven Bridges: Users can now access AWS i3 and i3en instances for Interactive Analysis (RStudio, JupyterLab, SAS Studio) on Seven Bridges. These storage-optimized instances provide between 5 TB and 60 TB of storage for interactive environments, enabling researchers to harmonize larger datasets.
New CWL tools and workflows on Seven Bridges:
short variant discovery 4.2.0.0
toolkit
0.2.4
toolkit
tools 0.8.1
The table below highlights which studies were included in the Q1 2022 data releases. COVID-19 datasets ACTIV4a and ACTIV4b were released to production. Most of the ingestion work for the COVID19-C3PO dataset has been completed, and the dataset will be released in early April. TOPMed Freeze 9 datasets were ingested as the data became available; twenty of these datasets were ingested and will also be released as part of the fourth batch in early April. The data is now available for access across the entire ecosystem.
PIC-SURE release notes
Gen3 release notes
The 2023-01-09 release marks the twelfth release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features (e.g., Azure volumes now available on both main analysis platforms) along with documentation and tutorials (e.g., information on how variable tags are generated) to help new users get started on the system. This release also includes enhanced support for moving data seamlessly across platforms. Please find more detail on the new features and user support materials in the sections below.
The 2023-01-09 data releases include the addition of the Pediatric Cardiac Genomics Consortium (PCGC). Please refer to the Data Releases section below for more information as well as the Data page on the BDC website.
Azure volumes are now available on BDC Powered by Seven Bridges: Users can now link a Microsoft Azure bucket to their Seven Bridges workspaces. After logging in, go to Data > Volumes and select “Microsoft Azure” to be led through a bucket-linking wizard.
DRS Manifest Export: To further improve interoperability and let users move their data seamlessly across platforms, a DRS export option is now available on the Seven Bridges platforms. With this functionality, users can export links to platform files (DRS URIs), along with their metadata, into a manifest file, which can then be used to import the files and metadata on other platforms.
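DRS URIs follow the GA4GH Data Repository Service (DRS) specification. For the hostname-based form, `drs://<hostname>/<object_id>`, the spec maps the URI to the HTTPS endpoint `https://<hostname>/ga4gh/drs/v1/objects/<object_id>`. The Python sketch below applies that mapping to every URI in a CSV manifest. The `drs_uri` column name, the CSV layout, and the example host are assumptions, since the exact format of the exported manifest is not described here; the sketch also assumes hostname-based URIs only, because compact identifier-based URIs (such as `drs://<prefix>:<accession>`) must first be resolved through an identifier registry, which it does not attempt.

```python
import csv
import io
from typing import List
from urllib.parse import quote

DRS_SCHEME = "drs://"


def drs_uri_to_object_url(uri: str) -> str:
    """Map a hostname-based DRS URI to its DRS v1 object endpoint."""
    if not uri.startswith(DRS_SCHEME):
        raise ValueError(f"not a DRS URI: {uri!r}")
    host, sep, object_id = uri[len(DRS_SCHEME):].partition("/")
    if not host or not sep or not object_id:
        # Compact identifier-based URIs need a registry lookup; out of scope here.
        raise ValueError(f"not a hostname-based DRS URI: {uri!r}")
    return f"https://{host}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


def manifest_object_urls(manifest_text: str, column: str = "drs_uri") -> List[str]:
    """Read a CSV manifest (hypothetical layout) and map every DRS URI in `column`."""
    reader = csv.DictReader(io.StringIO(manifest_text))
    return [drs_uri_to_object_url(row[column]) for row in reader]


if __name__ == "__main__":
    manifest = "file_name,drs_uri\nsample.cram,drs://drs.example.org/abc123\n"
    print(manifest_object_urls(manifest))
```

A receiving platform would then send a GET request to each object URL to retrieve the DRS object metadata, including the access methods for the underlying file.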
OmicCircos R Shiny app now available on BDC-Seven Bridges: The OmicCircos app is an R Shiny application built around the OmicCircos R package for more effective generation of high-quality circular plots for visualizing genomic data. Common use cases include mutation patterns, copy number variations (CNVs), expression patterns, and methylation patterns. Such variations can be displayed as scatterplot, line, or text-label figures.
Introduction to SAS Public Project on BDC-Seven Bridges: Seven Bridges released a Public Project to train users on how to use SAS. The public project contains three notebooks that walk a user through: 1) loading and cleaning data in SAS using ICD9 codes, 2) pulling the CDC’s Social Vulnerability Index data via API and running a regression, and 3) loading hosted 1000 Genomes data into SAS and visualizing mutation information. A user can copy the public project to their own workspace and modify the tutorial notebooks to suit their needs.
New CWL Tools/Workflows on BDC-Seven Bridges:
BEDTools 2.30.0 toolkit:
BEDTools Coverage - returns the depth and breadth of coverage of features from B on the intervals in A
BEDTools Genomecov - computes histograms of feature coverage for a given genome
BEDTools GetFasta - extracts sequences from a FASTA file for each of the intervals defined in a BED/GFF/VCF file
BEDTools Intersect - screens for overlaps between two sets of genomic features
BEDTools Merge - combines overlapping or “book-ended” features in an interval file into a single feature
BEDTools Sort - sorts a feature file by chromosome and other criteria
FlowSOM 2.4.0, which implements an algorithm used to distinguish cell populations in both flow and mass cytometry data in an unsupervised way.
cytofkit2 0.99.80, which is designed to analyze mass cytometry data from FCS files. It includes preprocessing, cell subset detection, cell subset visualization and interpretation, and inference of subset progression.
flowAI 1.24.0, which performs quality control on FCS data acquired using flow cytometry instruments. By evaluating three properties (flow rate, signal acquisition, and dynamic range), it enables the detection and removal of anomalies.
CNVkit 0.9.9, a toolkit for inferring and visualizing copy number from high-throughput DNA sequencing data.
SBG Single-Cell RNA Deep Learning - Training is a single-cell classifier pipeline for human data. It relies on a transfer learning approach, which uses pre-trained gene embeddings as the starting point for building a model adjusted to the given single-cell datasets.
SBG Single-Cell RNA Deep Learning - Predict is a single-cell classifier pipeline for human data. This tool uses the deep learning model generated by the SBG Single-Cell RNA Deep Learning - Training workflow to classify the input dataset.
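To make the BEDTools Merge description concrete, the following is a simplified sketch of the underlying idea, not the BEDTools implementation: intervals are sorted, and any interval that overlaps or is book-ended with the previous one on the same chromosome is folded into it. It assumes BED-style zero-based, half-open coordinates, so [10, 20) and [20, 30) count as book-ended. The function name `merge_intervals` is hypothetical.

```python
def merge_intervals(intervals: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Merge overlapping or book-ended half-open intervals (chrom, start, end).

    In half-open BED coordinates, [10, 20) and [20, 30) touch ("book-ended")
    and are merged into [10, 30).
    """
    merged: list[tuple[str, int, int]] = []
    # Sorting groups intervals by chromosome and orders them by start position.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval on the same chromosome:
            # extend the previous interval instead of starting a new one.
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    features = [("chr1", 10, 20), ("chr1", 20, 30), ("chr1", 40, 50),
                ("chr2", 5, 8), ("chr1", 15, 25)]
    print(merge_intervals(features))
```

Here the three overlapping or touching chr1 features collapse into a single feature, while the separated chr1 feature and the chr2 feature stay distinct.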
Azure is now available on BDC Powered by Terra: Users can now log into Terra with a Microsoft Azure Cloud account. This is an invite-only version of Terra on the Azure platform. The public offering of Terra on Azure is expected in early 2023.
A new spend report is now available for BDC-Terra billing projects: The report identifies which workspaces are costing the most, to provide more transparency around cloud costs incurred in Terra. To access the spend report, go to your billing project (main menu > billing > billing project) and click on the "Spend report" tab.
New streamlined user journey from BDC Powered by PIC-SURE to analysis platforms: PIC-SURE has added “Export to Seven Bridges” and “Export to Terra” buttons to streamline data export into a BioData Catalyst analysis workspace. After exploring and filtering variables in PIC-SURE Authorized Access, users can package their data with the Select and Package Data Tool. Once the data is packaged, users can select their preferred BDC analysis platform with the new Export buttons. The export provides all the information needed and points the user directly to the public PIC-SURE project on either Seven Bridges or Terra.
Take a Tour of BDC-PIC-SURE: PIC-SURE has updated the guided tour of the interface to interactively display search results based on the user’s authorization. This guided tour walks through the different parts of the platform, including how to use tags, where search results are displayed, and how to interpret the Results Panel.
BABYHUG Data Field Issue: The BABYHUG study (phs002415) contained a data file, as provided by the data submitter, that included SAS-derived newline characters within data fields. These characters shifted the data rows, causing some fields to be mapped to the wrong variable. A corrected version of the file has been requested from the data submitter.
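The sketch below illustrates, in general terms, how a newline embedded in a data field can shift rows when a file is split naively on line breaks. It assumes a comma-separated file in which the affected field is quoted; the actual format of the BABYHUG file and the correction applied are not described here, and the column names are hypothetical.

```python
import csv
import io

# Hypothetical file in which the first record's COMMENT field contains
# an embedded newline inside a quoted value.
RAW = 'ID,COMMENT,AGE\n1,"first line\nsecond line",34\n2,none,29\n'

# Naive parsing: splitting on newlines breaks the first record in two,
# so the remainder of that record lands in a new "row" and its values
# no longer line up with the header columns.
naive_rows = [line.split(",") for line in RAW.strip().split("\n")]

# Quote-aware parsing keeps the embedded newline inside its field,
# so every record keeps the same number of columns as the header.
parsed_rows = list(csv.reader(io.StringIO(RAW)))

if __name__ == "__main__":
    print(len(naive_rows), "rows after naive splitting")
    print(len(parsed_rows), "rows after quote-aware parsing")
```

If the embedded newlines are not quoted, no parser can recover the intended rows reliably, which is why a corrected file from the submitter is the safer fix.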
BDC GitBook on BDC-PIC-SURE: Users can now access the BDC GitBook documentation directly from the PIC-SURE platform under the “Help” tab.
The table below highlights which studies were included in the 2023-01-09 data release.
The PCGC substudy contains whole exome sequences, targeted sequences, and SNP array data. It is a multi-center, observational cohort study of individuals with congenital heart defects (CHD). The study aims to investigate the relationship between genetic factors and phenotypic and clinical outcomes in patients with CHD. Summary-level phenotypes for the study participants can be viewed on the top-level study page. Individual-level data and molecular data for the study are available by requesting Authorized Access. The study has collected phenotypic data and source DNA from 10,000 probands, parents, and families of interest. The data is now available for access across the entire ecosystem.
The 2022-10-03 release marks the eleventh release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., PIC-SURE's new search interface) along with updated documentation. This release also includes updated versions of the Study Variable Explorer and the Annotation Explorer. Please find more detail on the new features and user support materials in the sections below.
The 2022-10-03 data releases include the addition of the TOPMed Boston-Brazil SCD and PCGC datasets. Please refer to the Data Releases section below for more information as well as the Data page on the BioData Catalyst website.
Now export with Study Variable Explorer on BioData Catalyst Powered by Seven Bridges: The Study Variable Explorer on BioData Catalyst Powered by Seven Bridges allows researchers to explore phenotypic variables from the TOPMed data dictionaries in an open access manner. Seven Bridges released Study Variable Explorer version 2, which expands on version 1 by adding tag search, notes, and data export. The latest update enables researchers to track their variable selection process through notes tied to study and variable information, which can be shared with collaborators through .json export. This gives analysts traceable information for reproducing decisions made during the harmonization process.
New Interactive Web Apps Gallery: Under the “Public Gallery” dropdown on BioData Catalyst Powered by Seven Bridges, a new display for “Interactive Web Apps” provides access to the LocusZoom and Model Explorer R Shiny applications.
Annotation Explorer Version 2: The Annotation Explorer enables users to interactively explore, query, and study characteristics of an inventory of annotations for variants across the genome. The application can be used before association testing to interactively explore variant aggregation and filtering strategies and to generate input files for multiple-variant association testing, or after association testing to explore annotations associated with a set of significant variants or variants of interest. Seven Bridges previously released the Annotation Explorer R Shiny application through a Public Project. Now, Annotation Explorer is integrated with BioData Catalyst Powered by Seven Bridges through the “Data” dropdown. The new integration enables querying genome-wide annotations and variants (including the TOPMed Freeze5 and Freeze8 datasets) in a more user-friendly interface without running an RStudio notebook. This release is integrated into the billing system, so a user can select compute resources based on price and monitor Annotation Explorer-specific costs through their billing group.
New CWL Tools and Workflows on BioData Catalyst Powered by Seven Bridges:
GATK VariantEval BETA 4.2.5.0 tool which is used for evaluating variant calls.
GATK FilterMutectCalls 4.2.5.0 tool which is used to filter somatic SNVs and indels called by Mutect2.
Picard CreateSequenceDictionary 2.25.7 tool for creating a DICT index file for a sequence.
WARP ExomeGermlineSingleSample 2.4.4 pipeline for data pre-processing and variant calling in human WES data.
BCFtools 1.15.1 toolkit - CWL1.2
Kraken2 2.1.2 toolkit
SRA (v3.0.0, CWL1.2)
SRA sam-dump tool that converts SRA data into SAM format. With aligned data, NCBI uses Compression by Reference, which only stores the differences in base pairs between sequence data and the segment it aligns to. Restoring the original data, for example as FASTQ, requires fast access to the reference sequences that the original data was aligned to.
SRA fasterq-dump tool that converts SRA data into FASTQ format while using temporary files and multi-threading to speed up the extraction.
SRA fastq-dump tool that converts SRA data into FASTQ format.
Salmon (v1.5.2, CWL1.2)
Salmon Alevin tool that introduces a family of algorithms for quantification and analysis of 3’ tagged-end single-cell sequencing data.
Salmon Index tool that builds an index necessary for the Salmon Quant and Salmon Alevin tools. To create an index, it uses a transcriptome reference file in FASTA format. Additionally, one can provide a genome reference along with transcriptome to create a hybrid index compatible with the improved mapping algorithm named Selective Alignment.
Updated Interactive Analysis interface on Terra: Under the new design, the “Notebooks” tab is transformed into the more general “Analyses” tab, from where you can access the multiple applications available for Interactive Analysis in Terra. Accordingly, the list of Notebook files (.ipynb) becomes the list of “Your Analyses”, which now supports including R Markdown files (.Rmd). Just like Notebook files, any R Markdown files created in or added to the Analyses tab will be automatically stored in the workspace bucket and synced between the bucket and your persistent disk.
PIC-SURE's new search interface: PIC-SURE has released an improved dynamic data exploration experience, allowing users to easily search and query at the variable value and genomic variant level. The streamlined search experience enables users to search variables and view associated information, such as decoded variable level information, details about the dataset, and study information - all without opening any data files. Updates to the interface include filtering search results by variable and study tags, a new genomic filtering model, adding variables to export without filtering, a simpler select and package data process, and visualizing single variable distributions.
Dedicated PIC-SURE images within Seven Bridges analysis workspaces: The Seven Bridges and PIC-SURE teams have collaborated to provide users with dedicated workspace images that contain all the pre-installed packages necessary to run the PIC-SURE example notebooks. PIC-SURE API users in Seven Bridges will not have to worry about changes to package dependencies and/or versions, and R users in particular will notice a significantly faster start-up time during environment set-up. The PIC-SURE images are available in both the JupyterLab and RStudio Seven Bridges environments. Users can find this feature by specifying the Environment setup of any Data Cruncher analysis.
The table below highlights which studies were included in the Q3 2022 data releases. The data is now available for access across the entire ecosystem.
The 2022-07-11 release marks the tenth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., importing files from AnVIL via DRS and creating multi-sample VCFs). This release also includes enhanced support for CWL tools on GitHub. Please find more detail on the new features in the sections below.
The 2022-07-11 data release includes the addition of the COVID-19 dataset C3PO and TOPMed Freeze 9 batches 3 and 4. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.
Import files from AnVIL to BioData Catalyst Powered by Seven Bridges via DRS
Seven Bridges released an interoperability feature enabling users to import files from AnVIL via DRS. A TOPMed researcher working in BioData Catalyst who identifies a causal variant through association testing might next want to investigate how that variant affects gene expression. The AnVIL ecosystem hosts the Genotype-Tissue Expression (GTEx) datasets, which can be used to understand which tissues are affected by novel variants. Seven Bridges’ latest release allows a TOPMed researcher to go to AnVIL and push data they have permission to access to BioData Catalyst Powered by Seven Bridges. The researcher can then run the variant association test on TOPMed data and identify how that variant changes tissue expression with GTEx data in one workspace.
Create multi-sample VCFs with the Variant Store
Researchers who have access to many TOPMed studies may want to combine VCF files into a multi-sample VCF. Additionally, researchers might want to subset samples based on genomic regions. With standard bioinformatics tools, this process involves many manual steps and can be time-intensive and cost-prohibitive. The Variant Store on BioData Catalyst Powered by Seven Bridges uses a series of API calls to combine VCFs from studies of interest and subset the multi-sample VCF based on the selected genomic region. The latest release allows researchers to track the costs of generating multi-sample VCFs via the Variant Store as a dedicated line item in their billing group, separate from analysis and storage costs.
Explore, tag, and annotate phenotypes in the Study Variable Explorer
The Study Variable Explorer on BioData Catalyst Powered by Seven Bridges allows researchers to explore phenotypic variables from the TOPMed data dictionaries in an open access manner. Previously, researchers were limited to searching data dictionary information on dbGaP, and comparing variables across studies was cumbersome and offered a poor user experience. Study Variable Explorer enables researchers to select phenotypic variables from across TOPMed studies and view detailed information and distributions of the variable data. By searching keywords such as obesity, a researcher can compare similar variables within and across hosted datasets, including the number of subjects and the variable descriptions. Additionally, users can create custom searchable tags and notes for each variable to track their variable selection and pre-harmonization process.
New CWL Tools and Workflows on BioData Catalyst Powered by Seven Bridges
An updated version of the SRA Download and Set Metadata workflow (SRA Toolkit 3.0.0), which downloads the metadata associated with an SRA accession via SRA Run Info CGI, downloads the FASTQ files (on an on-demand instance), and sets the corresponding metadata.
fastENLOC (v1.0, CWL1.2), a tool that enables integrative genetic association analysis of molecular QTL data and GWAS data. It performs integration of the results from molecular quantitative trait loci (QTL) mapping into genome-wide genetic association analysis of complex traits, with the primary objective of quantitatively assessing the enrichment of the molecular QTLs in complex trait-associated genetic variants and the colocalizations of the two types of association signals.
GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0, a workflow used for somatic short variant calling. It runs on a single tumor-normal pair or on a single tumor sample, and performs additional filtering and functional annotation tasks, and
GATK Create Mutect2 Panel of Normals 4.2.5.0 that creates a panel of normals for use in other GATK workflows. The workflow takes multiple normal sample callsets and passes them to GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0 with tumor-only mode (although it is called tumor-only, normal samples are given as the input) and additionally collates sites present in two or more samples into a sites-only VCF.
Three apps from the MetaXcan toolkit:
S-PrediXcan for computing associations between omic features and a complex trait starting from GWAS summary statistics.
S-MultiXcan for computing association from predicted gene expression to a trait, using multiple studies for each gene.
MetaMany for serially performing multiple MetaXcan runs on a GWAS study from summary statistics using multiple tissues.
The MetaXcan Workflow for computing associations between omic features and complex traits across multiple tissues. The workflow includes two tools from the MetaXcan framework - MetaMany and S-MultiXcan and it uses summary statistics from a GWAS study and multiple models that predict the expression or splicing quantification.
MaxQuant (v2.0.3.0, CWL1.2), a quantitative proteomics tool designed for analyzing large mass-spectrometric data. It uses a target-decoy search strategy to estimate and control the extent of false positives. Within the target-decoy strategy, MaxQuant applies the concept of posterior error probability (PEP) to integrate multiple peptide properties (e.g., length, charge, number of modifications) together with Andromeda score into a single quantity, reflecting the quality of a peptide spectrum match (PSM).
Dockstore GitHub app support expanded to CWL tools
The table below highlights which studies were included in the Q2 2022 data releases. The data is now available for access across the entire ecosystem.
The 2022-01-24 release marks the eighth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., the LocusZoom interactive app) along with documentation and tutorials (e.g., a guide for consortia using Seven Bridges) to help new users get started on the system. Please find more detail on the new features and user support materials in the sections below.
The 2022-01-24 data release includes the addition of the TOPMed Freeze 9 batches 1 and 2, CATHGEN, and PETAL RED CORAL datasets. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.
LocusZoom Interactive App on Seven Bridges: LocusZoom, part of the GENESIS pipeline, enables users to interactively visualize and explore results of single variant association tests. The tool also provides a User Guide on the front page that walks users through inputs, outputs, and functionality of the app with the ability to practice on open access data from the University of Michigan. To access the app, please email .
GENESIS Model Explorer Interactive App on Seven Bridges: The Model Explorer app was developed by our collaborators at the University of Washington and then handed off to the Seven Bridges team for hosting. Through the app, users can visualize and explore the results of the GENESIS Null Model workflow including phenotype variables, genotypes, and GENESIS model results without prior R programming knowledge. To access the app, please email .
Phenome-Wide analysis examples on BioData Catalyst studies using PIC-SURE: New examples are available on Terra (Python and R) and Seven Bridges (RStudio and Python) illustrating how to query data using the PIC-SURE API, using a simple PheWAS analysis as the use case. This PheWAS example focuses on the TOPMed DCC Harmonized Variables, which are leveraged to provide an example PheWAS of total cholesterol in two studies: ARIC and FHS. The example shows how the PIC-SURE API helps with wrangling phenotypic data.
Guide for Consortia using Seven Bridges: . This guide describes how consortia can use platform projects to selectively share, harmonize, and distribute data. The guide was inspired by conversations with the C4R consortium, which revealed the types of guidance and information that coordinating centers and consortium members need to get set up on BioData Catalyst as quickly as possible. In the outlined example, multiple study centers bring their data to BioData Catalyst, and the Data Coordinating Center (DCC) links that data to a centralized project to perform harmonization. The DCC can then distribute selected harmonized datasets to analysis working groups that applied for permission to study the harmonized data. Future consortia can use the architecture illustrated in this guide to quickly onboard and begin coordination.
Guide for Workshops and Courses on Seven Bridges: . This guide was developed after Seven Bridges worked with the University of Washington Summer Institute in Statistical Genetics and the American Thoracic Society to develop a summer workshop and a course, respectively. The guide describes the UW Summer Institute and ATS course case studies and provides step-by-step considerations, including a timetable, for future educators who could use BioData Catalyst in their classrooms.
The table below highlights which studies were included in the data releases done in Q4 2021. TOPMed Freeze 9 datasets were ingested as the data became available; 37 datasets were ingested and released in 2 batches. The TOPMed CATHGEN study was released. Of the COVID-19 datasets, PETAL RED CORAL data was released on December 1st after receiving the official publication date. The data is now available for access across the entire ecosystem.
Gen3 release notes
PIC-SURE release notes
The 2023-04-04 release marks the thirteenth release for the NHLBI BioData Catalyst® (BDC) ecosystem. This release includes several new features, e.g., a new gallery for Public Projects and new project-based download restrictions on BDC Powered by Seven Bridges (BDC-Seven Bridges). It also includes documentation and tutorials to help new users get started on the system, e.g., how to start using the BDC Powered by PIC-SURE (BDC-PIC-SURE) API. Please find more details on the new features and user support materials in the sections below.
Please refer to the Data Releases section below for information on upcoming data releases. A list of currently available data can be viewed on the Data page of the BDC website.
New gallery for Public Projects on BDC-Seven Bridges: BDC-Seven Bridges has released a new user interface to make browsing and selecting public projects easier. Previously, Public Projects were listed under a dropdown menu. In the updated interface, the Public Resources > Projects dropdown displays a gallery of project cards with summaries and easily clickable “Copy Project” buttons.
Project-based download restrictions on BDC-Seven Bridges: Many consortia have found value in using the BDC-Seven Bridges project member permissions to collaborate and distribute data prior to public release. However, the permission to add new files to a project also allows a user to download files to their local environment. BDC-Seven Bridges released a new feature that gives the project owner project-based download restrictions. When creating a project, a user can turn on Download Restrictions and choose either to allow analysis (CWL tools/workflows or Data Studio) but no download to a local environment, or to allow neither analysis nor download to a local environment. To request access to the new feature, email .
New CWL tools and workflows on BDC-Seven Bridges:
Minimac 4 4.1.2: a tool for imputing genotypes.
GATK 4.4.0.0
GATK IndexFeatureFile for indexing of provided feature files.
GATK MergeVcfs for combining multiple variant files.
GATK VariantEval BETA for evaluating variant calls.
GATK FilterMutectCalls for filtering somatic SNVs and indels called by Mutect2.
HTSeq-count 2.0.2: HTSeq-count is a Python tool for counting how many reads map to each feature.
GraphicsMagick 1.3.38
GraphicsMagick compare compares two images using statistics and/or visual differencing. The tool compares two images and reports difference statistics according to specified metrics, and/or outputs an image with a visual representation of the differences.
GraphicsMagick composite composites (combines) images to create a new image.
GraphicsMagick conjure interprets and executes scripts in the Magick Scripting Language (MSL). MSL primarily benefits users who want to accomplish custom image processing tasks without programming.
GraphicsMagick convert is used to convert an input image file using one image format to an output file with the same or different image format while applying an arbitrary number of image transformations.
GraphicsMagick montage creates a composite image by combining several separate images.
MHC-I Binding Prediction tool (MHC I 3.1.2 toolkit) - which is used for prediction of peptides that bind to MHC I molecules.
MHC-II Binding Prediction tool (MHC II 3.1.6 toolkit) - which is used for prediction of peptides that bind to MHC II molecules.
MHCflurry Predict tool (MHCflurry 2.0.4 toolkit) - which is used for peptide/MHC I binding affinity prediction.
MHCflurry Scan tool (MHCflurry 2.0.4 toolkit) - which is designed to scan protein sequences and predict MHC-I ligands.
AXEL-F: Antigen eXpression based Epitope Likelihood-Function tool (AXEL-F 1.0.0 toolkit) - which is used for MHC-I epitope prediction.
NetChop tool (NetChop 3.0 toolkit) - which is a predictor of proteasomal processing based upon a neural network.
NetCTL tool (NetCTL 3.0 toolkit) - which is a T cell epitope predictor.
NetCTLpan tool (NetCTLpan 3.0 toolkit) - which is a T cell epitope predictor.
Class I Immunogenicity tool (Class I Immunogenicity 3.0 toolkit) - which predicts the immunogenicity of a peptide MHC (pMHC) complex.
TCRMatch tool (TCRMatch 1.0.2 toolkit) - which predicts T-Cell receptor specificity based on sequence similarity to characterized receptors.
BCell tool (BCell 3.1 toolkit) - which predicts linear B cell epitopes based on the antigen characteristics.
ElliPro tool (ElliPro 1.0 toolkit) - which predicts antibody epitopes based upon solvent-accessibility and flexibility.
Population Coverage tool (Population Coverage 3.0 toolkit) - which calculates the fraction of individuals predicted to respond to a given set of epitopes.
Epitope Cluster Analysis tool (Epitope Cluster Analysis 1.0 toolkit) - which groups epitopes into clusters based on sequence identity.
Picard 3.0.0 toolkit:
Picard CollectMultipleMetrics collects BAM statistics by running multiple Picard modules at once.
Picard ValidateSamFile validates an alignments file against the SAM specification.
Picard SortSam sorts alignment files (BAM or SAM).
Picard RevertSam reverts a BAM/SAM file to a previous state.
Picard MarkDuplicates marks duplicate reads in alignment files.
Picard GenotypeConcordance calculates genotype concordance between two VCF files.
Picard GatherBamFiles merges BAM files after a scattered analysis.
Picard FixMateInformation verifies and fixes mate-pair information.
Picard FastqToSam converts FASTQ files to an unaligned SAM or BAM file.
Picard CrosscheckFingerprints checks a set of data files for sample identity.
Picard CreateSequenceDictionary creates a DICT index file for a sequence.
Picard CollectWgsMetricsWithNonZeroCoverage evaluates the coverage and performance of WGS experiments.
Picard CollectVariantCallingMetrics can be used to collect variant call statistics after variant calling.
Picard CollectSequencingArtifactMetrics collects metrics to quantify single-base sequencing artifacts.
Picard CollectHsMetrics collects hybrid-selection metrics for alignments in SAM or BAM format.
Picard CollectAlignmentSummaryMetrics produces a summary of alignment metrics from a SAM or BAM file.
Picard CheckFingerprint checks sample identity of provided data against known genotypes.
Picard BedToIntervalList converts a BED file to a Picard INTERVAL_LIST format.
Picard AddOrReplaceReadGroups assigns all reads to the specified read group.
MetaCyto workflow (1.16.0 in CWL 1.2): based on the R package MetaCyto, this workflow performs meta-analysis of both flow cytometry and mass cytometry (CyTOF) data. It can jointly analyze cytometry data from different studies with diverse sets of markers.
New and improved R adapter for BDC-PIC-SURE API: The R adapter for the BDC-PIC-SURE API has been completely revamped to improve performance, address known bugs, and make the API easier to use for R coders. All example code, in both Jupyter and RStudio, has been updated to show these improvements in practice. Note: The old version of the R API will be available until August 31st, 2023. We recommend updating your code to use the new version.
A FHIR client
Direct interaction with dbGaP’s FHIR API
Extract, Transform, Load (ETL) logic to parse content from dbGaP’s FHIR API and load it into BDC-Gen3’s Metadata API
BDC-Gen3’s Data Ingestion Pipeline will be updated to use the above tool to load FHIR metadata with every new data release. In April 2023, loaded metadata will be available to all clients/users through BDC-Gen3’s Metadata API and viewable in BDC-Gen3’s Discovery Page.
Learn about and start using the BDC-PIC-SURE API on the new “API” page: The “API” page on the BDC-PIC-SURE website provides everything you need to get started with the BDC-PIC-SURE API. This includes the personalized access token, links to publicly available R and Python code on both BDC Powered by Seven Bridges and Powered by Terra, and links to additional documentation.
In Q1 2023, progress was made in establishing procedures, clarifying data submission, and reworking screening protocols for multiple datasets in preparation for upcoming dataset ingestion. This included collaborative efforts with NHLBI to support pre-ingestion quality assurance, as well as data support for screening and assisting data submitters in preparing their data for future ingestion into BDC. Key datasets that underwent these processes include nuMoM2b (phs002808.v1.p1.c1), BABY HUG (phs002415.v1.p1.c1), MSH (phs002348.v1.p1.c1), NSRR-CFS (phs002715.v1.p1.c1), and CRA (phs000988.v4.p1.c1).
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Share content through Public Projects: Seven Bridges has published in the knowledge center offering an alternative way to share new workflows, notebooks, and open access data with the BDCatalyst community. Public Projects provide a space for researchers to publish their analyses with open access sample data, detailed walkthroughs, and contact information for feedback and improvements. Both researchers developing new tools and researchers using preconfigured pipelines benefit from published Public Projects.
Dockstore synchronization with GitHub: Dockstore has simplified its tool and workflow registration process to automatically synchronize with GitHub. Dockstore released several for how you can set up your GitHub repo with another file (.dockstore.yml) needed to kick off this process. Check out this for an introduction, and visit the updated Getting Started tutorials for registering and on Dockstore to learn more.
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
BDC-PIC-SURE Tag Generation: PIC-SURE has updated help text in the user interface and documentation to address the frequently asked question, “How are variable tags generated?” Users can find this help text in the “Filter by Variable Tags” box on the PIC-SURE platform and in the .
Updated BDC-PIC-SURE documentation on the Export buttons: The and were updated to include information about the new Export buttons. These updates were also released in the .
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Gen3 release notes PIC-SURE release notes
Cure Sickle Cell Metadata Catalog integration: PIC-SURE has updated the Data Access Table to integrate information about sickle cell disease (SCD) studies from the Cure Sickle Cell Metadata Catalog (MDC). The “Additional Information” column includes a link to each SCD study’s page on the MDC. The Data Access Table also includes other new information, such as study design and study focus.
New BioData Catalyst Powered by PIC-SURE search interface: The documentation associated with PIC-SURE has been updated to reflect the recent release of the new search interface. This includes the and the tutorial videos on the .
Updated documentation on new Terra Interface: The documentation associated with Terra has been updated to reflect the recent release of the new analysis interface. This includes the Terra and the tutorial videos on the .
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Gen3 release notes PIC-SURE release notes
Researchers can now register their tools to automatically sync with GitHub. Using GitHub Apps, Dockstore can react to changes on GitHub as they are made, keeping Dockstore synced with GitHub automatically. Additional details are available .
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Gen3 release notes PIC-SURE release notes
BDC Powered by Gen3 (BDC-Gen3) Metadata Being Updated to Bring In Data from the dbGaP FHIR Database: BDC-Gen3’s Discovery Page (and the underlying BDC-Gen3 Source of Truth Metadata API) allows unauthenticated users to discover which datasets are available in BDC. Fast Healthcare Interoperability Resources (FHIR) is a Health Level Seven International (HL7) specification for healthcare interoperability. The database of Genotypes and Phenotypes (dbGaP) has recently exposed a . BDC-Gen3 has worked to consume the new metadata from the dbGaP FHIR Server (as part of the officially defined data ingestion process). BDC-Gen3’s Python-based Software Development Kit (SDK) and Command Line Interface (CLI) now have:
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
Gen3 release notes
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
NHLBI TOPMed: Boston Early-Onset COPD Study (EOCOPD) | phs000946.v5.p1.c1 | topmed-EOCOPD_DS-CS-RD | No | No |
NHLBI TOPMed: The Cleveland Family Study (CFS) | phs000954.v4.p2.c1 | topmed-CFS_DS-HLBS-IRB-NPU | No | No |
NHLBI TOPMed: The Jackson Heart Study (JHS) | phs000964.v5.p1.c1 | topmed-JHS_HMB-IRB-NPU | No | No |
NHLBI TOPMed: The Jackson Heart Study (JHS) | phs000964.v5.p1.c2 | topmed-JHS_DS-FDO-IRB-NPU | No | No |
NHLBI TOPMed: The Jackson Heart Study (JHS) | phs000964.v5.p1.c3 | topmed-JHS_HMB-IRB | No | No |
NHLBI TOPMed: The Jackson Heart Study (JHS) | phs000964.v5.p1.c4 | topmed-JHS_DS-FDO-IRB | No | Yes |
NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study (FHS) | phs000974.v5.p3.c1 | topmed-FHS_HMB-IRB-MDS | No | No |
NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study (FHS) | phs000974.v5.p3.c2 | topmed-FHS_HMB-IRB-NPU-MDS | No | No |
NHLBI TOPMed: Heart and Vascular Health Study (HVH) | phs000993.v5.p2.c1 | topmed-HVH_HMB-IRB-MDS | No | No |
NHLBI TOPMed: Heart and Vascular Health Study (HVH) | phs000993.v5.p2.c2 | topmed-HVH_DS-CVD-IRB-MDS | No | No |
NHLBI TOPMed - NHGRI CCDG: The Vanderbilt AF Ablation Registry | phs000997.v5.p2.c1 | topmed-VAFAR_HMB-IRB | No | No |
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry (VU) | phs001032.v6.p2.c1 | topmed-VU_AF_GRU-IRB | No | No |
NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados | phs001143.v4.p1.c1 | topmed-BAGS_GRU-IRB | No | No |
NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation (CCAF) Study | phs001189.v4.p1.c1 | topmed-CCAF_AF_GRU-IRB | No | No |
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study (CHS) | phs001368.v3.p2.c1 | topmed-CHS_HMB-MDS | No | No |
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study (CHS) | phs001368.v3.p2.c2 | topmed-CHS_HMB-NPU-MDS | No | No |
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study (CHS) | phs001368.v3.p2.c4 | topmed-CHS_DS-CVD-NPU-MDS | No | No |
NHLBI TOPMed: Diabetes Heart Study (DHS) African American Coronary Artery Calcification (AACAC) | phs001412.v3.p1.c1 | topmed-AACAC_HMB-IRB-COL-NPU | No | No |
NHLBI TOPMed: Diabetes Heart Study (DHS) African American Coronary Artery Calcification (AACAC) | phs001412.v3.p1.c2 | topmed-AACAC_DS-DHD-IRB-COL-NPU | No | No |
NHLBI TOPMed: MESA and MESA Family AA-CAC (MESA) | phs001416.v3.p1.c1 | topmed-MESA_HMB | No | No |
NHLBI TOPMed: MESA and MESA Family AA-CAC (MESA) | phs001416.v3.p1.c2 | topmed-MESA_HMB-NPU | No | No |
Clinical-trial of COVID-19 Convalescent Plasma in Outpatients (C3PO) | phs002752.v1.p1.c1 | COVID19-C3PO_GRU | No | No |
COVID-19 Post-hospital Thrombosis Prevention Study (ACTIV-4C) | phs003063.v1.p1.c1 | COVID19-ACTIV4C_GRU | No | No |
Multi-Ethnic Study of Atherosclerosis (BioLINCC) | phs003288.v1.p1.c1 | BioLINCC-MESA_HMB | Yes | Yes |
Multi-Ethnic Study of Atherosclerosis (BioLINCC) | phs003288.v1.p1.c2 | BioLINCC-MESA_HMB-NPU | Yes | Yes |
RECOVER Synthetic Data Set | tutorial-RECOVER_synthetic_data_set_1 | tutorial-RECOVER_synthetic_data_set_1 | Yes | Yes |
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study (CHS) | phs001368.v3.p2.c3 | topmed-CHS_DS-NPU-MDS | Yes | Yes |
NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica | phs000988.v5.p1.c1 | topmed-CRA_DS-ASTHMA-IRB-MDS-RD | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Genes-Environments and Admixture in Latino Asthmatics (GALA II) | phs000920.v5.p3.c2 | topmed-GALAII_DS-LD-IRB-COL | No | Yes |
NHLBI TOPMed: HyperGEN - Genetics of Left Ventricular (LV) Hypertrophy | phs001293.v3.p1.c2 | topmed-HyperGEN_DS-CVD-IRB-RD | No | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene) | phs002910.v1.p1.c1 | C4R-COPDGene_HMB | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene) | phs002910.v1.p1.c2 | C4R-COPDGene_DS-CS | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Atherosclerosis Risk in Communities Study (ARIC) | phs002988.v1.p1.c1 | C4R-ARIC_HMB-IRB | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Atherosclerosis Risk in Communities Study (ARIC) | phs002988.v1.p1.c2 | C4R-ARIC_DS-CVD-IRB | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP) | phs002913.v1.p1.c1 | C4R-SARP_GRU-PUB-NPU | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP) | phs002913.v1.p1.c2 | C4R-SARP_GRU-PUB | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP) | phs002913.v1.p1.c3 | C4R-SARP_DS-AAI-PUB-NPU | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP) | phs002913.v1.p1.c4 | C4R-SARP_DS-AAI-PUB | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Framingham Heart Study (FHS) | phs002911.v1.p1.c1 | C4R-FHS_HMB-IRB-MDS | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Framingham Heart Study (FHS) | phs002911.v1.p1.c2 | C4R-FHS_HMB-IRB-NPU-MDS | Yes | Yes |
ApoA-1 and Atherosclerosis in Psoriasis (DIR) | phs003231.v1.p1.c1 | DIR-AAP_GRU | Yes | Yes |
Method to Assess Lung Water Accumulation During Exercise (DIR) | phs003346.v1.p1.c1 | DIR-MALWADE_GRU-IRB | Yes | Yes |
Heart Failure Network: Diuretic Optimization Strategies Evaluation in Acute Heart Failure (HFN DOSE-BioLINCC) | phs003524.v1.p1.c1 | BioLINCC-BL_HFN_DOSE_AHF_GRU | Yes | No |
National Sleep Research Resource (NSRR): Hispanic Community Health Study/Study of Latinos | phs003543.v1.p1.c1 | NSRR-HCHS_HMB-NPU | Yes | No |
National Sleep Research Resource (NSRR): Hispanic Community Health Study/Study of Latinos | phs003543.v1.p1.c2 | NSRR-HCHS_HMB | Yes | No |
NIH RECOVER: A Multi-Site Observational Study of Post-Acute Sequelae of SARS-CoV-2 Infection in Adults | phs003463.v1.p1.c1 | RECOVER-RC_Adult_GRU | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Prevent Pulmonary Fibrosis (PrePF) | phs002975.v1.p1.c1 | COVID19-C4R_PREPF_DS-PMD-IRB | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene) | phs002910.v1.p1.c2 | COVID19-C4R_COPDGENE_DS-CS | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene) | phs002910.v1.p1.c1 | COVID19-C4R_COPDGENE_HMB | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c1 | COVID19-C4R_MESA_HMB | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c2 | COVID19-C4R_MESA_HMB-NPU | Yes | No |
NHLBI TOPMed: Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) | phs001472.v2.p1.c1 | topmed-ECLIPSE_DS-COPD-MDS-RD | No | Yes |
NHGRI CCDG: Early-onset Atrial Fibrillation in the CATHeterization GENetics (CATHGEN) Cohort | phs001600.v3.p2.c1 | topmed-CATHGEN_DS-CVD-IRB | No | Yes |
NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy (GENOA) | phs001345.v3.p1.c1 | topmed-GENOA_DS-ASC-RF-NPU | No | Yes |
NHLBI TOPMed: Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) | phs001359.v3.p1.c1 | topmed-GOLDN_DS-CVD-IRB | No | Yes |
NHLBI TOPMed: University of Massachusetts Medical School (UMMS) miRhythm Study | phs001434.v2.p1.c1 | topmed-miRhythm_GRU | No | Yes |
NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish | phs000956.v4.p1.c2 | topmed-Amish_HMB-IRB-MDS | No | Yes |
NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene) in the TOPMed Program | phs000951.v5.p4.c1 | topmed-COPDGene_HMB | No | Yes |
NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene) in the TOPMed Program | phs000951.v5.p4.c2 | topmed-COPDGene_DS-CS-RD | No | Yes |
NHLBI TOPMed: Trans-Omics for Precision Medicine Whole Genome Sequencing Project: ARIC | phs001211.v4.p2.c1 | topmed-ARIC_HMB-IRB | No | Yes |
NHLBI TOPMed: Trans-Omics for Precision Medicine Whole Genome Sequencing Project: ARIC | phs001211.v4.p2.c2 | topmed-ARIC_DS-CVD-IRB | No | Yes |
NHLBI TOPMed: REDS-III Brazil Sickle Cell Disease Cohort (REDS-BSCDC) | phs001468.v3.p1.c1 | topmed-REDS-III_Brazil_SCD_GRU-IRB-PUB-NPU | No | Yes |
NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica | phs000988.v5.p1.c1 | topmed-CRA_DS-ASTHMA-IRB-MDS-RD | No | Yes |
NHLBI TOPMed: Genes-environments and Admixture in Latino Asthmatics (GALA II) Study | phs000920.v5.p2.c2 | topmed-GALAII_DS-LD-IRB-COL | No | Yes |
LungMAP: Molecular Atlas of Lung Development - Human Lung Tissue | phs001961.v2.p1.c1 | LungMAP-MALD_GRU | No | No |
Unrelated Donor Reduced Intensity Bone Marrow Transplant for Children with Severe Sickle Cell Disease (BMT CTN-0601-BioLINCC) | phs003470.v1.p1.c1 | BioLINCC-BMT_CTN-0601_GRU | No | No |
NHLBI TOPMed: HyperGEN - Genetics of Left Ventricular (LV) Hypertrophy | phs001293.v3.p1.c2 | topmed-HyperGEN_DS-CVD-IRB-RD | No | Yes |
NHLBI TOPMed - NHGRI CCDG: AF Biobank LMU in the context of the MED Biobank LMU | phs001543.v2.p1.c1 | topmed-AFLMU_HMB-IRB-PUB-COL-NPU-MDS | No | Yes |
NHLBI TOPMed: Australian Familial Atrial Fibrillation Study | phs001435.v2.p1.c1 | topmed-AustralianFamilialAF_HMB-NPU-MDS | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Penn Medicine BioBank Early Onset Atrial Fibrillation Study | phs001601.v2.p1.c1 | topmed-CCDG_PMBB_AF_HMB-IRB-PUB | No | Yes |
NHLBI TOPMed: Children's Health Study (CHS) Integrative Genetic Approaches to Gene-Air Pollution Interactions in Asthma (GAP) | phs001602.v2.p1.c1 | topmed-ChildrensHS_GAP_GRU | No | Yes |
NHLBI TOPMed: Children's Health Study (CHS) Integrative Genomics and Environmental Research of Asthma (IGERA) | phs001603.v2.p1.c1 | topmed-ChildrensHS_IGERA_GRU | No | Yes |
NHLBI TOPMed: Children's Health Study (CHS) Effects of Air Pollution on the Development of Obesity in Children (Meta-AIR) | phs001604.v2.p1.c1 | topmed-ChildrensHS_MetaAir_GRU | No | Yes |
NHLBI TOPMed: Chicago Initiative to Raise Asthma Health Equity (CHIRAH) | phs001605.v2.p1.c2 | topmed-CHIRAH_DS-ASTHMA-IRB-COL | No | Yes |
NHLBI TOPMed: Determining the association of chromosomal variants with non-PV triggers and ablation-outcome in AF (DECAF) | phs001546.v2.p1.c1 | topmed-DECAF_GRU | No | Yes |
NHLBI TOPMed: Early-onset Atrial Fibrillation in the Estonian Biobank | phs001606.v2.p1.c1 | topmed-EGCUT_GRU | No | Yes |
NHLBI TOPMed: Genetics of Asthma in Latino Americans (GALA) | phs001542.v2.p1.c2 | topmed-GALA_DS-LD-IRB-COL | No | Yes |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c3 | topmed-pharmHU_DS-SCD | No | Yes |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c2 | topmed-pharmHU_DS-SCD-RD | No | Yes |
NHLBI TOPMed - NHGRI CCDG: The GENetics in Atrial Fibrillation (GENAF) Study | phs001547.v2.p1.c1 | topmed-GENAF_HMB-NPU | No | Yes |
NHLBI TOPMed: Genetic Study of Atherosclerosis Risk (GeneSTAR) | phs001218.v3.p1.c2 | topmed-GeneSTAR_DS-CVD-IRB-NPU-MDS | No | Yes |
NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity (GenSalt) | phs001217.v3.p1.c1 | topmed-GenSalt_DS-HCR-IRB | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Hispanic Community Health Study/Study of Latinos (HCHS/SOL) | phs001395.v2.p1.c2 | topmed-HCHS-SOL_HMB | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Hispanic Community Health Study/Study of Latinos (HCHS/SOL) | phs001395.v2.p1.c1 | topmed-HCHS-SOL_HMB-NPU | No | Yes |
NHLBI TOPMed: HyperGEN - Genetics of Left Ventricular (LV) Hypertrophy | phs001293.v3.p1.c1 | topmed-HyperGEN_GRU-IRB | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Intermountain INSPIRE Registry | phs001545.v2.p1.c1 | topmed-INSPIRE_AF_DS-MULTIPLE_DISEASES-MDS | No | Yes |
NHLBI TOPMed - NHGRI CCDG: The Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study | phs001598.v2.p1.c1 | topmed-JHU_AF_HMB-NPU-MDS | No | Yes |
NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism (WGS of VTE) | phs001402.v3.p1.c1 | topmed-Mayo_VTE_GRU | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Massachusetts General Hospital (MGH) Atrial Fibrillation Study | phs001062.v5.p2.c2 | topmed-MGH_AF_DS-AF-IRB-RD | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Massachusetts General Hospital (MGH) Atrial Fibrillation Study | phs001062.v5.p2.c1 | topmed-MGH_AF_HMB-IRB | No | Yes |
NHLBI TOPMed: MyLifeOurFuture (MLOF) Research Repository of patients with hemophilia A (factor VIII deficiency) or hemophilia B (factor IX deficiency) | phs001515.v2.p1.c1 | topmed-MLOF_HMB-PUB | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Malmo Preventive Project (MPP) | phs001544.v2.p1.c1 | topmed-MPP_HMB-NPU-MDS | No | Yes |
NHLBI TOPMed: Partners HealthCare Biobank | phs001024.v5.p1.c1 | topmed-PARTNERS_HMB | No | Yes |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c1 | topmed-pharmHU_HMB | No | Yes |
NHLBI TOPMed: San Antonio Family Heart Study (SAFHS) | phs001215.v4.p2.c1 | topmed-SAFHS_DS-DHD-IRB-PUB-MDS-RD | No | Yes |
NHLBI TOPMed: Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE) | phs001467.v2.p1.c1 | topmed-SAPPHIRE_asthma_DS-ASTHMA-IRB-COL | No | Yes |
NHLBI TOPMed: African American Sarcoidosis Genetics Resource | phs001207.v3.p1.c1 | topmed-Sarcoidosis_DS-SAR-IRB | No | Yes |
NHLBI TOPMed: Genome-Wide Association Study of Adiposity in Samoans | phs000972.v5.p1.c1 | topmed-SAS_GRU-IRB-PUB-COL-NPU-GSO | No | Yes |
NHLBI TOPMed: Rare Variants for Hypertension in Taiwan Chinese (THRV) | phs001387.v3.p1.c3 | topmed-THRV_DS-CVD-IRB-COL-NPU-RD | No | Yes |
NHLBI TOPMed: Novel Risk Factors for the Development of Atrial Fibrillation in Women | phs001040.v5.p1.c1 | topmed-WGHS_HMB | No | Yes |
NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v3.p1.c1 | topmed-WHI_HMB-IRB | No | Yes |
NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v3.p1.c2 | topmed-WHI_HMB-IRB-NPU | No | Yes |
NHLBI TOPMed: Severe Asthma Research Program (SARP) | phs001446.v3.p2.c1 | topmed-SARP_GRU | Yes | No |
NHLBI TOPMed: Severe Asthma Research Program (SARP) | phs001446.v3.p2.c2 | topmed-SARP_DS-AAI-PUB | Yes | No |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c2 | topmed-pharmHU_DS-SCD-RD | Yes | No |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c3 | topmed-pharmHU_DS-SCD | Yes | No |
Cardiovascular Health Study (CHS) - Imaging | phs003639.v1.p1.c1 | imaging-img_CHS_HMB-MDS | Yes | No |
Cardiovascular Health Study (CHS) - Imaging | phs003639.v1.p1.c2 | imaging-img_CHS_HMB-NPU-MDS | Yes | No |
Cardiovascular Health Study (CHS) - Imaging | phs003639.v1.p1.c3 | imaging-img_CHS_DS-CVD-MDS | Yes | No |
Cardiovascular Health Study (CHS) - Imaging | phs003639.v1.p1.c4 | imaging-img_CHS_DS-CVD-NPU-MDS | Yes | No |
Multi-Ethnic Study of Atherosclerosis (Electrocardiogram Tracing Repository) | phs003703.v1.p1.c1 | imaging-img_MESA_ECG_HMB | Yes | No |
Multi-Ethnic Study of Atherosclerosis (Electrocardiogram Tracing Repository) | phs003703.v1.p1.c2 | imaging-img_MESA_ECG_HMB-NPU | Yes | No |
Sleep Heart Health Study (SHHS-BioLINCC) | phs003637.v1.p1.c1 | BioLINCC-BL_SHHS_HMB-MDS | No | No |
NHLBI TOPMed: Boston Early-Onset COPD Study | phs000946.v6.p2.c1 | topmed-EOCOPD_DS-CS-RD | No | Yes |
NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation (CCAF) Study | phs001189.v5.p1.c1 | topmed-CCAF_AF_GRU-IRB | No | Yes |
NHLBI TOPMed: NHGRI CCDG: AF Biobank LMU in the context of the MED Biobank LMU | phs001543.v3.p1.c1 | topmed-AFLMU_HMB-IRB-PUB-COL-NPU-MDS | No | Yes |
NHLBI TOPMed - NHGRI CCDG: The GENetics in Atrial Fibrillation (GENAF) Study | phs001547.v3.p1.c1 | topmed-GENAF_HMB-NPU | No | Yes |
NHLBI TOPMed: Early-Onset Atrial Fibrillation in the Estonian Biobank | phs001606.v3.p1.c1 | topmed-EGCUT_GRU | No | Yes |
NHLBI TOPMed: NHGRI CCDG: The BioMe Biobank at Mount Sinai | phs001644.v3.p2.c1 | topmed-BioMe_HMB-NPU | No | Yes |
NHLBI TOPMed: Childhood Asthma Management Program (CAMP) | phs001726.v3.p1.c1 | topmed-CAMP_DS-AST-COPD | No | Yes |
Human Liver Cohort (HLC) | phs000253.v1.p1.c1 | heartfailure-HLC_GRU | Yes | No |
NHLBI Exome Sequencing in SCID | phs000479.v1.p1.c1 | heartfailure-Exome_SCID_GRU | Yes | No |
Familial Exome Sequencing in Rare Pediatric Phenotypes | phs000553.v1.p1.c1 | heartfailure-FamExome_RarePeds_GRU-MDS | Yes | No |
PCGC: Congenital Heart Disease Genetic Network Study | phs000571.v6.p2.c2 | PCGC-CHD-GENES_DS-CHD | Yes | No |
NHLBI GO-ESP: Family Studies (Mendelian Lipid Disorders) | phs000587.v1.p1.c1 | heartfailure-Fam_MLD_DS-CLA | Yes | No |
NextGen Consortium: iPS Derived Hepatocytes Study (PhLiPS Study) | phs001341.v1.p1.c1 | heartfailure-PhLiPS_GRU | Yes | No |
Myocardial Applied Genomics Network (MAGNet) Study | phs001539.v4.p1.c1 | heartfailure-MAGNet_HMB-MDS | Yes | No |
Cardiovascular ATVB: Atherosclerosis Thrombosis and Vascular Biology | phs001592.v1.p1.c1 | heartfailure-CardioATVB_DS-CVD | Yes | No |
Profiles of exRNA in CSF and Plasma from Subarachnoid Hemorrhage Patients | phs001759.v1.p1.c1 | heartfailure-exRNA_CSF_HMB | Yes | No |
miRNA Profiling of Maternal and Non-Maternal Healthy Adult Blood Plasma Using Small RNA-Sequencing | phs001892.v1.p1.c1 | heartfailure-miRNA_Maternal_Plasma_GRU | Yes | No |
NHLBI TOPMed: NHGRI CCDG: UCSF Atrial Fibrillation Study | phs001933.v2.p1.c1 | topmed-UCSF_Afib_HMB-MDS | Yes | No |
NIH RECOVER-Pediatric: Understanding the Long-Term Impact of COVID on Children and Families | phs003461.v1.p1.c1 | RECOVER-RC_Pediatrics_GRU | Yes | No |
REDS-IV-P Epidemiology, Surveillance and Preparedness of the Novel SARS-CoV-2 Epidemic (RESPONSE) | phs003578.v1.p1.c1 | REDS-RESPONSE_GRU | Yes | No |
Sudden Cardiac Death in Heart Failure Trial (SCD-HeFT-BioLINCC) | phs003654.v1.p1.c1 | BioLINCC-BL_SCD-HeFT_GRU | Yes | No |
NHLBI TOPMed: Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) | phs001472.v3.p2.c1 | topmed-ECLIPSE_DS-CS-MDS-RD | No | Yes |
NHLBI TOPMed: Characterizing the Response to a Leukotriene Receptor Antagonist and an Inhaled Corticosteroid (CLIC) | phs001729.v3.p1.c2 | topmed-CARE_CLIC_DS-ASTHMA-IRB-COL | No | No |
TRanscriptomic ANalySis of left ventriCulaR gene Expression (TRANSCRibE) | phs001679.v1.p1.c1 | heartfailure-TRANSCRibE_GRU | Yes | No |
TRanscriptomic ANalySis of left ventriCulaR gene Expression (TRANSCRibE) | phs001679.v1.p1.c2 | heartfailure-TRANSCRibE_DS-CI | Yes | No |
Molecular Genetics of Heterotaxy and Related Congenital Heart Defects | phs001814.v1.p1.c1 | heartfailure-MolGen_CHD_GRU | Yes | No |
NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism (WGS of VTE) | phs001402.v3.p1.c1 | topmed-Mayo_VTE_GRU | No | Yes |
NHLBI TOPMed: My Life Our Future (MLOF) Research Repository of Patients with Hemophilia A (Factor VIII Deficiency) or Hemophilia B (Factor IX Deficiency) | phs001515.v2.p2.c1 | topmed-MLOF_HMB-PUB | No | Yes |
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study | phs001368.v4.p2.c1 | topmed-CHS_HMB-NPU-MDS | No | Yes |
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study | phs001368.v4.p2.c2 | topmed-CHS_HMB-MDS | No | Yes |
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study | phs001368.v4.p2.c3 | topmed-CHS_DS-CVD-NPU-MDS | No | Yes |
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study | phs001368.v4.p2.c4 | topmed-CHS_DS-CVD-MDS | No | Yes |
NHLBI TOPMed: San Antonio Family Heart Study (SAFHS) | phs001215.v4.p2.c1 | topmed-SAFHS_DS-DHD-IRB-PUB-MDS-RD | No | Yes |
NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica | phs000988.v6.p1.c1 | topmed-CRA_DS-ASTHMA-IRB-MDS-RD | No | Yes |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c1 | topmed-IPF_DS-ILD-IRB-NPU | No | Yes |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c2 | topmed-IPF_DS-LD-IRB-NPU | No | Yes |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c3 | topmed-IPF_DS-PFIB-IRB-NPU | No | Yes |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c4 | topmed-IPF_DS-PUL-ILD-IRB-NPU | No | Yes |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c5 | topmed-IPF_HMB-IRB-NPU | No | Yes |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c6 | topmed-IPF_DS-LD-IRB-COL-NPU | No | Yes |
NIH RECOVER: A Multi-Site Observational Study of Post-Acute Sequelae of SARS-CoV-2 Infection in Adults | phs003463.v2.p2.c1 | RECOVER-RC_Adult_GRU | No | Yes |
NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v3.p1.c1 | topmed-WHI_HMB-IRB | No | Yes |
NHLBI TOPMed: Women's Health Initiative (WHI) | phs001237.v3.p1.c2 | topmed-WHI_HMB-IRB-NPU | No | Yes |
NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study | phs000974.v5.p4.c1 | topmed-FHS_HMB-IRB-MDS | No | Yes |
NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study | phs000974.v5.p4.c2 | topmed-FHS_HMB-IRB-NPU-MDS | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC) | phs001211.v4.p3.c1 | topmed-ARIC_HMB-IRB | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Atherosclerosis Risk in Communities (ARIC) | phs001211.v4.p3.c2 | topmed-ARIC_DS-CVD-IRB | No | Yes |
NHLBI TOPMed: MESA and MESA Family AA-CAC | phs001416.v3.p1.c1 | topmed-MESA_HMB | No | Yes |
NHLBI TOPMed: MESA and MESA Family AA-CAC | phs001416.v3.p1.c2 | topmed-MESA_HMB-NPU | No | Yes |
NHLBI TOPMed: Pediatric Cardiac Genomics Consortium (PCGC)'s Congenital Heart Disease Biobank | phs001735.v2.p1.c1 | topmed-PCGC_HMB | No | Yes |
NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment (SAGE) | phs000921.v5.p2.c2 | topmed-SAGE_DS-LD-IRB-COL | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Genes-Environments and Admixture in Latino Asthmatics (GALA II) | phs000920.v6.p4.c2 | topmed-GALAII_DS-LD-IRB-COL | No | Yes |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c1 | topmed-IPF_HMB-IRB-NPU | No | Yes |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c2 | topmed-IPF_DS-LD-IRB-NPU | No | Yes |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c3 | topmed-IPF_DS-ILD-IRB-NPU | No | Yes |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c4 | topmed-IPF_DS-PFIB-IRB-NPU | No | Yes |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607.v3.p2.c5 | topmed-IPF_DS-PUL-ILD-IRB-NPU | No | Yes |
The Collaborative Cohort of Cohorts for COVID-19 Research (C4R) | phs003045.v1.p1.c1 | COVID19-C4R_CARDIA_HMB | Yes | No |
The Collaborative Cohort of Cohorts for COVID-19 Research (C4R) | phs003045.v1.p1.c2 | COVID19-C4R_CARDIA_HMB-NPU | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c1 | COVID19-C4R_SPIROMICS_GRU | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c2 | COVID19-C4R_SPIROMICS_GRU_NPU | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c3 | COVID19-C4R_SPIROMICS_COPD | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c4 | COVID19-C4R_SPIROMICS_COPD_NPU | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c5 | COVID19-C4R_SPIROMICS_GRU_COL | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c6 | COVID19-C4R_SPIROMICS_GRU-NPU-COL | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c7 | COVID19-C4R_SPIROMICS_COPD-COL | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) | phs002909.v1.p1.c8 | COVID19-C4R_SPIROMICS_COPD-NPU-COL | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c1 | COVID19-C4R_MESA_HMB | Yes | No |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c2 | COVID19-C4R_MESA_HMB-NPU | Yes | No |
NHLBI TOPMed: Partners HealthCare Biobank | phs001024.v6.p1.c1 | topmed-PARTNERS_HMB | No | Yes |
NHLBI TOPMed: Novel Risk Factors for the Development of Atrial Fibrillation in Women | phs001040.v6.p1.c1 | topmed-WGHS_HMB | No | Yes |
NHLBI TOPMed: Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE) | phs001467.v2.p2.c1 | topmed-SAPPHIRE_asthma_HMB-COL | No | Yes |
NHLBI TOPMed: Walk-PHaSST Sickle Cell Disease (SCD) | phs001514.v2.p1.c1 | topmed-Walk_PHaSST_SCD_HMB-IRB-PUB-COL-NPU-MDS-GSO | No | Yes |
NHLBI TOPMed: Walk-PHaSST Sickle Cell Disease (SCD) | phs001514.v2.p1.c2 | topmed-Walk_PHaSST_SCD_DS-SCD-IRB-PUB-COL-NPU-MDS-RDN | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Malmo Preventive Project (MPP) | phs001544.v3.p1.c1 | topmed-MPP_HMB-NPU-MDS | No | Yes |
NHLBI TOPMed - NHGRI CCDG: The Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study | phs001598.v3.p1.c1 | topmed-JHU_AF_HMB-NPU-MDS | No | Yes |
NHLBI TOPMed: Outcome Modifying Genes in Sickle Cell Disease (OMG) | phs001608.v2.p1.c1 | topmed-OMG_SCD_DS-SCD-IRB-PUB-COL-MDS-RD | No | Yes |
NHLBI TOPMed - NHGRI CCDG: The Vanderbilt University BioVU Atrial Fibrillation Genetics Study | phs001624.v3.p2.c1 | topmed-BioVU_AF_HMB-GSO | No | Yes |
NHLBI TOPMed: Genetic Causes of Complex Pediatric Disorders - Asthma (GCPD-A) | phs001661.v3.p1.c1 | topmed-GCPD-A_DS-ASTHMA-GSO | No | Yes |
NHLBI TOPMed: Lung Tissue Research Consortium (LTRC) | phs001662.v2.p1.c2 | topmed-LTRC_HMB-MDS | No | Yes |
NHLBI TOPMed: Pulmonary Hypertension and the Hypoxic Response in SCD (PUSH) | phs001682.v2.p1.c1 | topmed-PUSH_SCD_DS-SCD-IRB-PUB-COL | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Groningen Genetics of Atrial Fibrillation (GGAF) Study | phs001725.v2.p1.c1 | topmed-GGAF_GRU | No | Yes |
NHLBI TOPMed: Childhood Asthma Management Program (CAMP) | phs001726.v2.p1.c1 | topmed-CAMP_DS-AST-COPD | No | Yes |
NHLBI TOPMed: Best ADd-on Therapy Giving Effective Response (BADGER) | phs001728.v3.p1.c2 | topmed-CARE_BADGER_DS-ASTHMA-IRB-COL | No | Yes |
NHLBI TOPMed: Characterizing the Response to a Leukotriene Receptor Antagonist and an Inhaled Corticosteroid (CLIC) | phs001729.v3.p1.c2 | topmed-CARE_CLIC_DS-ASTHMA-IRB-COL | No | Yes |
NHLBI TOPMed: Pediatric Asthma Controller Trial (PACT) | phs001730.v2.p1.c2 | topmed-CARE_PACT_DS-ASTHMA-IRB-COL | No | Yes |
NHLBI TOPMed: TReating Children to Prevent EXacerbations of Asthma (TREXA) | phs001732.v2.p1.c2 | topmed-CARE_TREXA_DS-ASTHMA-IRB-COL | No | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Hispanic Community Health Study/Study of Latinos (HCHS/SOL) | phs002908.v1.p1.c1 | COVID19-C4R_HCHS_SOL_HMB-NPU | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Hispanic Community Health Study/Study of Latinos (HCHS/SOL) | phs002908.v1.p1.c2 | COVID19-C4R_HCHS_SOL_HMB | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c1 | COVID19-C4R_MESA_HMB | Yes | Yes |
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Multi-Ethnic Study of Atherosclerosis (MESA) | phs003017.v1.p1.c2 | COVID19-C4R_MESA_HMB-NPU | Yes | Yes |
NIH RECOVER: A Multi-Site Observational Study of Post-Acute Sequelae of SARS-CoV-2 Infection in Adults | phs003463.v2.p2.c1 | RECOVER-RC-Adult_GRU | No | Yes |
Heart Failure Network: Functional Impact of GLP-1 for Heart Failure Treatment (HFN FIGHT-BioLINCC) | phs003542.v1.p1.c1 | BioLINCC_BL_HFN-FIGHT_GRU | No | Yes |
Action to Control Cardiovascular Risk in Diabetes (ACCORD-BioLINCC) | phs003551.v1.p1.c1 | BioLINCC-BL_ACCORD_GRU | No | Yes |
Action to Control Cardiovascular Risk in Diabetes (ACCORD - Imaging) | phs003562.v2.p1.c1 | imaging-ACCORD_GRU | No | Yes |
Systolic Blood Pressure Intervention Trial (SPRINT-Imaging) | phs003566.v2.p1.c1 | imaging-SPRINT_GRU | No | Yes |
Framingham Heart Study-Cohort (FHS-Cohort) - Imaging | phs003593.v1.p1.c1 | Imaging-img_FHS_HMB-IRB-MDS | No | Yes |
Framingham Heart Study-Cohort (FHS-Cohort) - Imaging | phs003593.v1.p1.c2 | Imaging-img_FHS_HMB-IRB-NPU-MDS | No | Yes |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c1 | topmed-pharmHU_HMB | No | Yes |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c2 | topmed-pharmHU_DS-SCD-RD | No | Yes |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466.v2.p1.c3 | topmed-pharmHU_DS-SCD | No | Yes |
NHLBI TOPMed: Partners HealthCare Biobank | phs001024.v6.p1.c1 | topmed-PARTNERS_HMB | No | Yes |
NHLBI TOPMed - NHGRI CCDG: The Vanderbilt University BioVU Atrial Fibrillation Genetics Study | phs001624.v3.p2.c1 | topmed-BioVU_AF_HMB-GSO | No | Yes |
NHLBI TOPMed: Novel Risk Factors for the Development of Atrial Fibrillation in Women | phs001040.v6.p1.c1 | topmed-WGHS_HMB | No | Yes |
NHLBI TOPMed - NHGRI CCDG: The Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study | phs001598.v3.p1.c1 | topmed-JHU_AF_HMB-NPU-MDS | No | Yes |
NHLBI TOPMed - NHGRI CCDG: Malmo Preventive Project (MPP) | phs001544.v3.p1.c1 | topmed-MPP_HMB-NPU-MDS | No | Yes |
NHLBI TOPMed: Pathways to Immunologically Mediated Asthma (PIMA) | phs001727.v3.p1.c2 | topmed-PIMA_DS-ASTHMA-IRB-COL | No | Yes |
NHLBI TOPMed: Characterizing the Response to a Leukotriene Receptor Antagonist and an Inhaled Corticosteroid (CLIC) | phs001729.v3.p1.c2 | topmed-CARE_CLIC_DS-ASTHMA-IRB-COL | No | Yes |
NHLBI TOPMed: Best ADd-on Therapy Giving Effective Response (BADGER) | phs001728.v3.p1.c2 | topmed-CARE_BADGER_DS-ASTHMA-IRB-COL | No | Yes |
Guiding Evidence Based Therapy Using Biomarker Intensified Treatment in Heart Failure (GUIDE-IT-BioLINCC) | phs003621.v1.p1.c1 | BioLINCC-BL_GUIDE-IT_GRU | Yes | Yes |
Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training (HF-ACTION-BioLINCC) | phs003599.v1.p1.c1 | BioLINCC-BL_HF-ACTION_HMB | Yes | Yes |
Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training (HF-ACTION-BioLINCC) | phs003599.v1.p1.c2 | BioLINCC-BL_HF-ACTION_HMB-NPU | Yes | Yes |
Sleep Heart Health Study (SHHS-BioLINCC) | phs003637.v1.p1.c1 | BioLINCC-BL_SHHS_HMB-MDS | Yes | Yes |
Accelerating COVID-19 Therapeutic Interventions and Vaccines 4 ACUTE (ACTIV4a) v1.0, v1.1 | phs002694.v3.p1.c1 | COVID19-ACTIV4A_GRU | No | Yes |
COVID-19 Post-hospital Thrombosis Prevention Study (ACTIV4c) | phs003063.v1.p1.c1 | COVID19-ACTIV4C_GRU | No | Yes |
Molecular Atlas of Lung Development (LungMAP) | phs001961.v2.p1.c1 | LungMAP-MALD_GRU | Yes | Yes |
Complement Inhibition Using Eculizumab to Overcome Platelet Transfusion Refractoriness in Patients with Severe Thrombocytopenia (DIR-Eculizumab) | phs003212.v1.p1.c1 | DIR-Eculizumab_GRU | Yes | Yes |
Hydroxyurea to Prevent Organ Damage in Children with Sickle Cell Anemia (BABYHUG) | phs002415.v1.p1.c1 | BioLINCC-BabyHug_DS-SCD-IRB-RD | No | No |
The Genetic Epidemiology of Asthma in Costa Rica (CRA) | phs000988.v4.p1.c1 | topmed-CRA_DS-ASTHMA-IRB-MDS-RD | No | Yes |
Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b) | phs002808.v1.p1.c1 | topmed-NuMom2B_GRU-IRB | Yes | Yes |
Multicenter Study of Hydroxyurea (MSH) | phs002348.v1.p1.c1 | BioLINCC-MSH_GRU | No | No |
The Cleveland Family Study (NSRR-CFS) | phs002715.v1.p1.c1 | NSRR-NSRR-CFS_DS-HLBS-IRB-NPU | No | No |
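Each value in the phs I.D. # column is a dbGaP accession of the form phsNNNNNN.vN.pN.cN, where the suffixes give the study version, participant set, and consent group, so the same study and consent group can appear in several rows (for example, phs000946.v5.p1.c1 and phs000946.v6.p2.c1). The Python sketch below is a hypothetical helper, not part of any BDC tool: it assumes that four-part layout, splits an ID into its parts, and keeps only the newest version for each study and consent group. IDs that do not follow the pattern, such as tutorial-RECOVER_synthetic_data_set_1, are skipped.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Assumed layout of a full dbGaP accession as it appears in the phs I.D. # column:
# "phs" + 6-digit study number, then .v<version>, .p<participant set>, .c<consent group>.
ACCESSION_PATTERN = re.compile(r"^(phs\d{6})\.v(\d+)\.p(\d+)\.c(\d+)$")


@dataclass(frozen=True)
class Accession:
    study: str            # e.g. "phs000964"
    version: int          # the .vN part
    participant_set: int  # the .pN part
    consent_group: int    # the .cN part


def parse_accession(text: str) -> Optional[Accession]:
    """Split a dbGaP accession into its parts; return None if the text is not one."""
    match = ACCESSION_PATTERN.match(text.strip())
    if match is None:
        return None
    study, version, participant_set, consent_group = match.groups()
    return Accession(study, int(version), int(participant_set), int(consent_group))


def latest_versions(accessions: Iterable[str]) -> Dict[Tuple[str, int], Accession]:
    """Keep the newest (version, participant set) for each (study, consent group) pair."""
    latest: Dict[Tuple[str, int], Accession] = {}
    for raw in accessions:
        parsed = parse_accession(raw)
        if parsed is None:
            continue  # e.g. tutorial data sets that have no phs accession
        key = (parsed.study, parsed.consent_group)
        current = latest.get(key)
        # Tuples compare element by element: version first, then participant set.
        if current is None or (parsed.version, parsed.participant_set) > (
            current.version,
            current.participant_set,
        ):
            latest[key] = parsed
    return latest


if __name__ == "__main__":
    # Three IDs copied from the table above.
    rows = ["phs000946.v5.p1.c1", "phs000946.v6.p2.c1", "tutorial-RECOVER_synthetic_data_set_1"]
    print(latest_versions(rows)[("phs000946", 1)])
    # Accession(study='phs000946', version=6, participant_set=2, consent_group=1)
```

Ordering by (version, participant set) is a simplifying choice for this sketch; the study's dbGaP release history remains the authoritative record of which version is current.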
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
---|---|---|---|---|
NHLBI TOPMed: Boston Early-Onset COPD Study in the TOPMed Program (EOCOPD) | phs000946.v5.p1.c1 | topmed-EOCOPD_DS-CS-RD | No | Yes |
NHLBI TOPMed: The Cleveland Family Study (CFS) | phs000954.v4.p2.c1 | topmed-CFS_DS-HLBS-IRB-NPU | No | Yes |
NHLBI TOPMed: The Jackson Heart Study (JHS) | phs000964.v5.p1.c1 | topmed-JHS_HMB-IRB-NPU | No | Yes |
NHLBI TOPMed: The Jackson Heart Study (JHS) | phs000964.v5.p1.c2 | topmed-JHS_DS-FDO-IRB-NPU | No | Yes |
NHLBI TOPMed: The Jackson Heart Study (JHS) | phs000964.v5.p1.c3 | topmed-JHS_HMB-IRB | No | Yes |
NHLBI TOPMed: The Jackson Heart Study (JHS) | phs000964.v5.p1.c4 | topmed-JHS_DS-FDO-IRB | No | Yes |
NHLBI TOPMed: Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study (FHS) | phs000974.v5.p3.c1 | topmed-FHS_HMB-IRB-MDS | No | Yes |
NHLBI TOPMed: Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study (FHS) | phs000974.v5.p3.c2 | topmed-FHS_HMB-IRB-NPU-MDS | No | Yes |
NHLBI TOPMed: Heart and Vascular Health Study (HVH) | phs000993.v5.p2.c1 | topmed-HVH_HMB-IRB-MDS | No | Yes |
NHLBI TOPMed: Heart and Vascular Health Study (HVH) | phs000993.v5.p2.c2 | topmed-HVH_DS-CVD-IRB-MDS | No | Yes |
NHLBI TOPMed: The Vanderbilt AF Ablation Registry (VAFAR) | phs000997.v5.p2.c1 | topmed-VAFAR_HMB-IRB | No | Yes |
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry (VU) | phs001032.v6.p2.c1 | topmed-VU_AF_GRU-IRB | No | Yes |
NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados (BAGS) | phs001143.v4.p1.c1 | topmed-BAGS_GRU-IRB | No | Yes |
NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation Study (CCAF) | phs001189.v4.p1.c1 | topmed-CCAF_AF_GRU-IRB | No | Yes |
NHLBI TOPMed: Cardiovascular Health Study (CHS) | phs001368.v3.p2.c1 | topmed-CHS_HMB-MDS | No | Yes |
NHLBI TOPMed: Cardiovascular Health Study (CHS) | phs001368.v3.p2.c2 | topmed-CHS_HMB-NPU-MDS | No | Yes |
NHLBI TOPMed: Cardiovascular Health Study (CHS) | phs001368.v3.p2.c3 | topmed-CHS_DS-CVD-MDS | Yes | Yes |
NHLBI TOPMed: Cardiovascular Health Study (CHS) | phs001368.v3.p2.c4 | topmed-CHS_DS-CVD-NPU-MDS | No | Yes |
NHLBI TOPMed: Diabetes Heart Study (DHS) African American Coronary Artery Calcification (AACAC) | phs001412.v3.p1.c1 | topmed-AACAC_HMB-IRB-COL-NPU | No | Yes |
NHLBI TOPMed: Diabetes Heart Study (DHS) African American Coronary Artery Calcification (AACAC) | phs001412.v3.p1.c2 | topmed-AACAC_DS-DHD-IRB-COL-NPU | No | Yes |
NHLBI TOPMed: MESA and MESA Family AA-CAC (MESA)
phs001416.v2.p1.c1
topmed-MESA_HMB
No
Yes
NHLBI TOPMed: MESA and MESA Family AA-CAC (MESA)
phs001416.v2.p1.c2
topmed-MESA_HMB-NPU
No
Yes
Clinical-trial of COVID-19 Convalescent Plasma in Outpatients (C3PO)
phs002752.v1.p1.c1
COVID19-C3PO_GRU
No
Yes
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene)
phs002910.v1.p1.c1
COVID19-C4R_COPDGene_HMB
Yes
Yes
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Genetic Epidemiology of COPD Study (COPDGene)
phs002910.v1.p1.c2
COVID19-C4R_COPDGene_DS-CS
Yes
Yes
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Atherosclerosis Risk in Communities Study (ARIC)
phs002988.v1.p1.c1
COVID19-C4R_ARIC_HMB-IRB
Yes
Yes
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Framingham Heart Study (FHS)
phs002911.v1.p1.c1
COVID19-C4R_FHS_HMB-IRB-MDS
Yes
Yes
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Framingham Heart Study (FHS)
phs002911.v1.p1.c2
COVID19-C4R_FHS_HMB-IRB-NPU-MDS
Yes
Yes
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)
phs002913.v1.p1.c1
COVID19-C4R_GRU-PUB-NPU
Yes
Yes
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)
phs002913.v1.p1.c2
COVID19-C4R_GRU-PUB
Yes
Yes
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)
phs002913.v1.p1.c3
COVID19-C4R_DS-AAI-PUB-NPU
Yes
Yes
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)
phs002913.v1.p1.c4
COVID19-C4R_DS-AAI-PUB
Yes
Yes
Multi-Ethnic Study of Atherosclerosis (BioLINCC)
phs003288.v1.p1.c1
BioLINCC-MESA_HMB
Yes
Yes
Multi-Ethnic Study of Atherosclerosis (BioLINCC)
phs003288.v1.p1.c2
BioLINCC-MESA_HMB-NPU
Yes
Yes
COVID-19 ACTIV-4 ACUTE | phs002694.c1 | ACTIV4A_GRU | Yes | Yes |
COVID-19 Outpatient Thrombosis Prevention Trial | phs002710.c1 | ACTIV4B_GRU | Yes | Yes |
Freeze 9b batch 4 studies | various | various | No | No |
COVID-19-C3PO | phs002752.c1 | C3PO_GRU | Yes | Yes |
The Pediatric Cardiac Genomics Consortium (PCGC) | phs000571.v6.p2.c1 | PCGC-CHD-GENES_HMB | No | Yes |
The Collaborative Cohort of Cohorts for COVID-19 Research (C4R) | phs002988.v1.p1.c1 phs002910.v1.p1.c1 phs002910.v1.p1.c2 phs002911.v1.p1.c1 phs002911.v1.p1.c2 phs003017.v1.p1.c1 phs002919.v1.p1.c1 | C4R_ARIC_phs002988 C4R_COPDGene_phs002910 C4R_FHS_phs002911 C4R_MESA_phs003017 C4R_REGARDS_phs002919 | No | Yes |
Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b) | phs002339.v1.p1.c1 | topmed-NuMom2B_GRU-IRB | Yes | Yes |
BostonBrazil_SCD | phs001599 | topmed-BostonBrazil_SCD_HMB-IRB-COL | Yes |
PCGC | phs001735.c1 | topmed-PCGC_CHD_HMB | No | Yes |
PCGC | phs001735.c2 | topmed-PCGC_CHD_DS-CHD | No | Yes |
National Sleep Research Resource (NSRR) | phs002715-c1 | NSRR-CFS_DS-HLBS-IRB-NPU | Yes |
FHS_phs000974_TOPMed_WGS_freeze.9b | phs000974 | TOPMed_FHS | No | Yes |
PCGC SRA | phs000571.v6.p2 | PCGC-CHD-GENES_HMB | Yes |
National Sleep Research Resource (NSRR) | phs002715-c1 | NSRR-CFS_DS-HLBS-IRB-NPU | No | No |
C3PO (COVID-19) | phs002752 | C3PO | true | 1 |
TOPMed Freeze 9 - Batch 3 | various | various | false | NA |
TOPMed Freeze 9 - Batch 4 | various | various | false | NA |
National Sleep Research Resource (NSRR) | phs002715 | NSRR-CFS | true | 1 |
SPIROMICS (topmed: phs001927) | phs001927 | SPIROMICS | true | 1 |
BostonBrazil_SCD (TOPMed - phs001599) | phs001599 | BostonBrazil_SCD | true | 1 |
TOPMed - PCGC (Version update) | phs001735 | PCGC | false | 2 |
PCGC SRA Data | phs000571 | - | true | 5 |
TOPMed Freeze 9 - WHI | various | various | false | NA |
MUSIC/CARING (COVID-19) | phs002770 | MUSIC/CARING | true | 1 |
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
PCGC SRA Data | phs000571 | - | True | 5 |
TOPMed Freeze 9 - Batch 3 (20 datasets included) | Various | Various | false | NA |
TOPMed Freeze 9 - Batch 4 (20 datasets included) | Various | Various | false | NA |
ACTIV-4A | phs002694 | ACTIV4A | True | 1 |
ACTIV-4B | phs002710 | ACTIV4B | True | 1 |
Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b) | phs002808.v1.p1.c1 | topmed-NuMom2B_GRU-IRB | Yes | Yes |
Hydroxyurea to Prevent Organ Damage in Children with Sickle Cell Anemia (BABY HUG) | phs002415.v1.p1.c1 | BioLINCC-BabyHug_DS-SCD-IRB-RD | No | No |
Multicenter Study of Hydroxyurea (MSH) | phs002348.v1.p1.c1 | BioLINCC-MSH_GRU | No | No |
The Cleveland Family Study (NSRR-CFS) | phs002715.v1.p1.c1 | NSRR-NSRR-CFS_DS-HLBS-IRB-NPU | No | No |
The Genetic Epidemiology of Asthma in Costa Rica (CRA) | phs000988.v4.p1.c1 | topmed-CRA_DS-ASTHMA-IRB-MDS-RD | No | Yes |
Long-Term Outcomes after the Multisystem Inflammatory Syndrome In Children (MUSIC) | phs002770 | - | Yes | Yes |
Accelerating COVID-19 Therapeutic Interventions and Vaccines 4 ACUTE (ACTIV4a) v1.0, v1.1 | phs002694.v1.p1.c1 | COVID19-ACTIV4A_GRU | No | Yes |
Molecular Atlas of Lung Development (LungMAP) | phs001961.v2.p1.c1 | - | Yes | Yes |
Freeze 9 version Updates: Batch 1 | - | - | No | Yes |
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
TOPMed Freeze 9 - Batch 1 (22 datasets included) | Various | Various | false | NA |
TOPMed Freeze 9 - Batch 2 (15 datasets included) | Various | Various | false | NA |
topmed-CATHGEN_DS-CVD-IRB | phs001600 | CATHGEN | True | 6 |
PETAL - RED CORAL (COVID-19) | phs002363 | RED_CORAL | True | 1 |
The 2021-04-02 release marks the fifth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., CWL tools for QC pipelines) along with documentation and tutorials to help new users get started on the system. This release also includes enhanced support for searching across documentation. Please find more detail on the new features and user support materials in the sections below.
The 2021-04-02 data release includes updates of CRAMs and unharmonized clinical files for 6 TOPMed studies previously hosted on BioData Catalyst. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format.
Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.
Documentation Search: BioData Catalyst users can now use Documentation Search to search across various types of documentation over the entire ecosystem. Favorite results can be saved in a folder and revisited later.
CWL tools for QC pipelines: Users can now find the following CWL tools for quality control of GWAS data in the Seven Bridges Public Apps Gallery:
Heterozygosity by sample - This UW-GAC tool calculates heterozygosity by sample.
Pedigree Check - This UW-GAC workflow checks expected relationships specified in a pedigree file against empirical kinship values from KING or PC-Relate.
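The two QC checks above can be illustrated with a small Python sketch. It is not the UW-GAC implementation: it assumes genotypes are already decoded into per-sample lists of alternate-allele counts (0, 1, 2, or None for a missing call), and it classifies kinship coefficients with the commonly used KING inference cutoffs (powers of two, roughly 0.354, 0.177, 0.0884 and 0.0442). All function names and the pedigree/kinship dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]

def heterozygosity(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of called genotypes that are heterozygous.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("sample has no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)

# Approximate KING inference cutoffs, ordered from closest to most distant relationship.
KING_CUTOFFS = [
    (2 ** -1.5, "duplicate/MZ twin"),  # ~0.354
    (2 ** -2.5, "1st degree"),         # ~0.177
    (2 ** -3.5, "2nd degree"),         # ~0.0884
    (2 ** -4.5, "3rd degree"),         # ~0.0442
]

def inferred_degree(kinship: float) -> str:
    """Map an empirical kinship coefficient to a relationship degree label."""
    for cutoff, label in KING_CUTOFFS:
        if kinship > cutoff:
            return label
    return "unrelated"

def pedigree_mismatches(
    expected: Dict[Pair, str], kinship: Dict[Pair, float]
) -> List[Tuple[Pair, str, str]]:
    """List pairs whose empirical degree disagrees with the pedigree's expected degree.

    Assumes every pair in `expected` appears with the same key order in `kinship`.
    """
    mismatches = []
    for pair, expected_label in expected.items():
        observed = inferred_degree(kinship[pair])
        if observed != expected_label:
            mismatches.append((pair, expected_label, observed))
    return mismatches

if __name__ == "__main__":
    print(heterozygosity([0, 1, 1, 2, None]))  # 0.5
    print(pedigree_mismatches({("A", "B"): "1st degree"}, {("A", "B"): 0.02}))
```

In this example, a pair recorded as first-degree relatives but with an empirical kinship near zero is flagged as a likely pedigree error or sample swap.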
Import files from Kids First Data Resource Center: Users can now access datasets from the Kids First Data Resource portal directly from BioData Catalyst Powered by Seven Bridges using DRS links. Users must have dbGaP approval for the Kids First datasets in order to access them on BioData Catalyst. In addition, users can import DRS links from open access datasets available via DRS servers.
PIC-SURE Data Access Dashboard: Users on PIC-SURE can now see a list of studies with data available in PIC-SURE. The Data Access Dashboard shows the study name, identifier, and the number of variables/samples present. Users can also see whether they have access to each study, or click a link to learn more about a study and request access to studies they are not yet authorized to use.
Query annotations for all SNVs and dbSNP INDELs in the Annotation Explorer: Users on Seven Bridges can now use the Annotation Explorer to interactively aggregate and filter all SNVs (over 8 billion variants) and publicly available INDELs from dbSNP using ~700 annotations. Variant grouping files can be created from the results and exported to a workspace for use in rare variant association testing. This database is available to all authenticated users of BioData Catalyst. See here for more information about how to use the Annotation Explorer.
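To make the "aggregate and filter, then export a variant grouping file" step concrete, the sketch below filters annotated variants and groups the survivors by gene. The field names (gene, consequence, cadd_phred), the threshold and the two-column TSV layout are illustrative assumptions, not the Annotation Explorer's schema or its export format.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Set

def build_variant_groups(
    variants: Iterable[dict], consequences: Set[str], min_cadd: float
) -> Dict[str, List[str]]:
    """Keep variants with a selected consequence and a high enough score, grouped by gene."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for v in variants:
        if v["consequence"] in consequences and v["cadd_phred"] >= min_cadd:
            groups[v["gene"]].append(v["variant_id"])
    return dict(groups)

def to_grouping_tsv(groups: Dict[str, List[str]]) -> str:
    """Serialize groups as a two-column TSV with one row per (group, variant)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["group_id", "variant_id"])
    for gene in sorted(groups):
        for variant_id in groups[gene]:
            writer.writerow([gene, variant_id])
    return buf.getvalue()

if __name__ == "__main__":
    # Hypothetical annotated variants.
    annotated = [
        {"variant_id": "chr1:100:A:G", "gene": "GENE1", "consequence": "missense", "cadd_phred": 25.0},
        {"variant_id": "chr1:200:C:T", "gene": "GENE1", "consequence": "synonymous", "cadd_phred": 30.0},
        {"variant_id": "chr2:300:G:A", "gene": "GENE2", "consequence": "stop_gained", "cadd_phred": 12.0},
    ]
    groups = build_variant_groups(annotated, {"missense", "stop_gained"}, min_cadd=20)
    print(to_grouping_tsv(groups), end="")
```

Each resulting group (here, one gene) would then be tested as a unit in a rare variant burden or SKAT-style association test.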
Best Practices for Secure and FAIR Workflows were published on Dockstore to help users who develop containers and descriptor files for their bioinformatics pipelines.
The Gitbook guide to self-service onboarding to Terra has been revamped.
Created and published all materials from the Fellows Webinar 1 onboarding session including a recording of the session, all materials used, instructions, etc. Additional webinars will be posted in the coming weeks.
Launched a 3-part video tutorial series on workflows, which helps users, particularly Fellows that are new to the platform, gain more insight into how to best utilize workflows for their data analysis.
Part 1 - How to run a pre-configured workflow
Part 2 - How to configure and run a workflow from scratch
Part 3 - How to run downstream analysis (on the data that resulted from your workflow)
Published a blog post walking researchers through how they can leverage free Google Cloud credits in Terra, followed by a related post on additional funding sources for covering researchers’ cloud costs, highlighting Google EDU (up to $10,000 in coupons for supported research projects) and the NIH STRIDES initiative. Terra also added a new support documentation article covering how the call caching feature in Cromwell can help users save time and money.
Started a new blog post series focused on highlighting papers that may be of interest to the BDCatalyst community. This first post covers a review paper about workflow systems from C. Titus Brown’s lab at UC Davis.
Published a blog post officially announcing that RStudio is available in Terra, along with a new video tutorial for getting up and running.
Uploaded a new video tutorial demonstrating the use of Terra for viral genomics by guiding the user through the COVID-19 workspace.
Published a blog post introducing a new feature for task-level checkpointing in workflows. This makes it possible to save intermediate outputs for a task and resume work from that point if the task gets interrupted. Full documentation of this checkpoint feature can be found here.
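Task-level checkpointing follows a familiar pattern: persist progress periodically, and on restart pick up from the last saved point. The Python sketch below is a generic analogue of that idea, not how Cromwell or WDL implement checkpointing; the checkpoint file name and JSON layout are hypothetical, and fail_at exists only to simulate an interruption.

```python
import json
import os
import tempfile
from typing import Callable, Optional

def run_with_checkpoint(
    n_steps: int,
    checkpoint_path: str,
    work: Callable[[int], int],
    fail_at: Optional[int] = None,
) -> int:
    """Sum work(i) over range(n_steps), resuming from a saved checkpoint if one exists.

    fail_at simulates an interruption (e.g. a preempted VM) just before that step.
    """
    state = {"next_step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)
    for i in range(state["next_step"], n_steps):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated interruption at step {i}")
        state["total"] += work(i)
        state["next_step"] = i + 1
        # Write to a temp file, then rename, so an interruption never leaves a partial checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, checkpoint_path)
    return state["total"]

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    try:
        run_with_checkpoint(5, path, work=lambda i: i, fail_at=3)
    except RuntimeError as err:
        print(err)
    print(run_with_checkpoint(5, path, work=lambda i: i))  # resumes at step 3
```

The second call redoes only steps 3 and 4, which is the time and cost saving that checkpointing is meant to provide.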
Uploaded a video on Broad’s BioIT 2020 Talk proposing a cross-domain, common data model built specifically to facilitate search and reuse.
The table below highlights which studies were included in the 2021-04-02 data release. CRAMs and unharmonized clinical files were updated for 6 TOPMed studies previously hosted on BioData Catalyst. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format. The data is now available for access across the entire ecosystem.
Gen3 release notes
PIC-SURE release notes
The 2020-10-23 release marks the third release for the NHLBI BioData Catalyst ecosystem. This release includes several new features along with documentation and tutorials (e.g., bringing your own data and tools) to help new users get started on the system. This release also includes enhanced support for querying annotations for TOPMed Freeze 8 variants in the Annotation Explorer, and querying combined phenotypic and genomic data in PIC-SURE. Please find more detail on the new features and user support materials in the sections below.
The 2020-10-23 data release includes the addition of both Parent and TOPMed studies. A total of 8 new Parent studies and their respective unharmonized clinical files were added. Multi-sample VCFs, CRAMs and unharmonized clinical files were added for 2 TOPMed studies new to BioData Catalyst. Additionally, 6 studies were updated to the latest version. These updates included new CRAMs, unharmonized clinical files and multi-sample VCFs for Freeze 8. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format. The data is now available for access across the entire ecosystem. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.
Form cohorts on Gen3 Exploration page and export to Seven Bridges workspace: Users can now export PFB (Portable Format for Bioinformatics) files from Gen3 (e.g., synthetic cohort files from multiple groups) to Seven Bridges.
CWL workflows for EPACTS and Plink association tests: Users can now find CWL workflows in the Seven Bridges Public Apps Gallery for the association test methods EPACTS and Plink. More information can be found in this blog post.
Query annotations for TOPMed Freeze 8 variants in the Annotation Explorer: Users on Seven Bridges can now use the Annotation Explorer to interactively aggregate and filter ~1 billion variants from TOPMed Freeze 8 using 450 annotations. Variant grouping files can be created from the results and exported to a workspace for use in rare variant association testing. Users with dbGaP approval for one or more TOPMed studies are able to access and work with the full Freeze 8 variant annotation database.
Query open access variant annotations in Annotation Explorer: Users on Seven Bridges without dbGaP approval for any TOPMed studies can now make use of the Annotation Explorer and interactively query TOPMed variants from Freeze 5 that have been released in dbSNP, a public-domain archive for human variants. Users can aggregate and filter ~550 million variants using ~260 annotations available in this dataset and generate variant grouping files for rare variant association testing.
Query combined phenotypic and genomic data in PIC-SURE: A release of genomic data in PIC-SURE now allows users to perform combined phenotypic and genomic queries to see phenotypic/genomic correlations. Users can export queries/cohorts to Seven Bridges or Terra Workspaces using the PIC-SURE API.
Bring Your Own Tools to BioData Catalyst: This guide introduces users to the two Docker-based workflow languages used to run batch analyses in the ecosystem: the Workflow Description Language (WDL) in Terra and the Common Workflow Language (CWL) in Seven Bridges. The guide links to resources that lead users from the early steps of learning to wrap their current pipelines for use in the cloud to how to publish their work in our open access catalog Dockstore to share with the community. This guide was originally conceived in discussion with fellows during the BDCatalyst September Face-to-Face. Fellows developed content and provided feedback and are listed as contributors within the publication.
Benchmarking guide for GENESIS association test workflows: This guide provides users with comprehensive benchmarking information for the CWL versions of the GENESIS association workflows. It shows the computation costs and execution times for a variety of association tests with sample sizes of 2.5K, 10K, 36K, and 50K, run on both AWS and Google Cloud. The benchmarking guide can be found on the page “GWAS with GENESIS” of the Seven Bridges documentation.
Bring Your Own Data to Terra Tutorial: We published a Jupyter notebook that provides functions for users to programmatically upload data to their Terra Google bucket and organize associated data into data tables for input into workflows. This may be a helpful resource for users that plan to upload many files. This notebook is part of a growing code library available in the BioData Catalyst Collection workspace.
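Terra data tables can be loaded from a tab-separated file whose first header cell has the form entity:<table>_id. The sketch below is not the notebook's own code; it builds such a file for a hypothetical sample table from a list of object paths already uploaded to the workspace bucket, deriving each sample ID from the file name. The bucket name, paths and cram column are assumptions.

```python
import csv
import io
from pathlib import PurePosixPath
from typing import Iterable

def sample_table_tsv(bucket: str, object_paths: Iterable[str], column: str = "cram") -> str:
    """Build a load TSV for a Terra data table named 'sample'.

    The first header cell 'entity:sample_id' names the table and its ID column;
    each row maps a sample ID (taken from the file name) to a gs:// URI.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["entity:sample_id", column])
    for path in sorted(object_paths):
        sample_id = PurePosixPath(path).name.split(".")[0]
        writer.writerow([sample_id, f"gs://{bucket}/{path}"])
    return buf.getvalue()

if __name__ == "__main__":
    # Hypothetical bucket and object paths.
    print(sample_table_tsv("my-workspace-bucket", ["crams/NA002.cram", "crams/NA001.cram"]), end="")
```

Generating the table from a file listing, rather than typing rows by hand, is what makes this approach practical when uploading many files.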
Utilities Workflows on Dockstore: The BioData Catalyst Organization on Dockstore now has a Utilities collection with workflows for completing common tasks such as data import, genotype file processing, and quality control of whole genome or exome sequencing data. Fellow Kenny Westermann developed a workflow that fetches data from dbGaP for use in BDCatalyst.
The table below highlights the new data release on BioData Catalyst which includes both Parent and TOPMed studies. A total of 8 new Parent studies and their respective unharmonized clinical files were added to the ecosystem. Multi-sample VCFs, CRAMs and unharmonized clinical files were added for 2 TOPMed studies new to BioData Catalyst. Additionally, 6 TOPMed studies previously hosted on BioData Catalyst were updated to the latest study versions. These updates included new CRAMs, unharmonized clinical files and multi-sample VCFs for Freeze 8 (previously hosted Freeze 5b only). For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format. The data is now available for access across the entire ecosystem.
Gen3 release notes
PIC-SURE release notes
The 2021-10-04 release marks the seventh release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., project cost reporting on Terra and archiving files on AWS) along with documentation and tutorials (e.g., estimating and managing cloud costs) to help new users get started on the system. This release also includes enhanced support for semantic search and R Shiny apps. Please find more detail on the new features and user support materials in the sections below.
The 2021-10-04 data release includes the addition of the final BioLINCC training dataset plus another BioLINCC study, BabyHug. The TOPMed Combined Exchange Area buckets were updated with more datasets from multiple new freezes. The last dataset ingested was PCGC’s CMG. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.
Updated Semantic Search UI: Dug, BioData Catalyst's semantic search tool, has an updated user interface. The new interface shows more results on one page, and a zoom feature lets users expand individual results to explore them in greater detail. Provenance in knowledge graphs and links to published literature are presented where available.
Archive files on AWS: Users on BioData Catalyst Powered by Seven Bridges can now select files to move from AWS S3 storage to AWS Glacier (archival storage). Moving files to archival storage can result in an ~80% cost reduction. It’s recommended that users move files to archival storage if the files will not be used for three or more months.
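The ~80% reduction and the three-month guideline combine into a quick back-of-the-envelope check. In this sketch, the price per GB-month is a placeholder you would replace with your actual rate, and the estimate ignores retrieval and early-deletion charges that archival tiers may add.

```python
from typing import Tuple

def archive_estimate(
    size_gb: float,
    price_per_gb_month: float,
    months_idle: float,
    reduction: float = 0.8,
    min_idle_months: float = 3,
) -> Tuple[bool, float]:
    """Return (recommend archiving?, estimated savings over the idle period).

    price_per_gb_month is a placeholder for the standard-storage rate you pay;
    reduction=0.8 reflects the ~80% figure and min_idle_months=3 the
    three-month guideline. Retrieval and early-deletion fees are ignored.
    """
    standard_cost = size_gb * price_per_gb_month * months_idle
    return months_idle >= min_idle_months, standard_cost * reduction

if __name__ == "__main__":
    recommend, saved = archive_estimate(size_gb=1000, price_per_gb_month=0.02, months_idle=6)
    print(recommend, round(saved, 2))
```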
Project-Per-Workspace Cost Reporting on Terra: Users on BioData Catalyst Powered by Terra now have more transparency and access to cost information with the rollout of the project-per-workspace (PPWS) model. This update associates each Terra workspace with its own Google Project, which Terra creates on the user's behalf when the workspace is created. The model enables a breakdown of costs per workspace in the Terra user interface, and allows Terra users to set up and use GCP budget alerts to be notified of cloud spending. The change applies only to newly created workspaces; existing workspaces are planned to migrate to this model in the future.
Try out R Shiny apps in Terra: Since the rollout of RStudio and Bioconductor last quarter, Terra’s Interactive Analysis team has expanded the capabilities of the cloud environments framework that supports running RStudio, Jupyter Notebook and Galaxy in Terra. Most recently, Terra users gained the ability to launch R Shiny apps from Terra’s built-in RStudio environment. Check out an example of an open-source R Shiny app developed by the Manning Lab to visualize whole-genome association data.
Save data from an Interactive Analysis (IA) environment: With the new R Shiny apps in Terra, users can save data from an IA environment. Saving data from an interactive cloud environment (such as an instance of RStudio or a Jupyter notebook) is useful in some situations: users worried about losing work done in an interactive environment because they need to delete or modify its persistent disk can use "gsutil" to copy that work to the workspace bucket.
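A minimal sketch of the gsutil approach, driven from Python. It assumes the code runs inside a Terra cloud environment where the workspace bucket is exposed as the WORKSPACE_BUCKET environment variable and gsutil is on the PATH; the directory and destination prefix are hypothetical. gsutil cp -r copies a directory recursively, and the top-level -m flag parallelizes the transfer. The default dry run only prints the command.

```python
import os
import shlex
import subprocess
from typing import List, Optional

def gsutil_copy_command(local_dir: str, dest_prefix: str, bucket: Optional[str] = None) -> List[str]:
    """Build a recursive, parallel gsutil copy of local_dir into the workspace bucket."""
    # Assumption: Terra sets WORKSPACE_BUCKET; pass `bucket` explicitly elsewhere.
    bucket = bucket or os.environ["WORKSPACE_BUCKET"]
    if not bucket.startswith("gs://"):
        bucket = "gs://" + bucket
    dest = f"{bucket.rstrip('/')}/{dest_prefix.strip('/')}/"
    return ["gsutil", "-m", "cp", "-r", local_dir, dest]

def save_to_workspace_bucket(local_dir: str, dest_prefix: str, dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it, raising if gsutil exits non-zero."""
    cmd = gsutil_copy_command(local_dir, dest_prefix)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd
```

For example, save_to_workspace_bucket("results", "backups/rstudio", dry_run=False) would copy a local results directory to a backups/rstudio/ prefix before the persistent disk is deleted.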
Speed up machine learning work with GPUs on Terra: Terra’s Interactive Analysis team has released an upgrade that enables adding graphics processing units (GPUs) to Notebook cloud environments in Terra. Terra already offered the ability to use GPUs in workflows and is now responding to user requests to run GPU-enabled computations interactively with GPU support for Jupyter Notebooks.
Speed up workflows and save costs using N2 instances sporting Intel’s 2nd Generation Xeon CPUs on Terra: Terra users will now have the option to use new-generation N2 instances, which have demonstrated faster performance and reduced cost. Read more about these updates and how to request N2 instances for workflows here.
Cross-study harmonization example notebook: This tutorial notebook demonstrates how to query and work with BioData Catalyst studies, particularly cross-study harmonization, using the PIC-SURE API.
Estimate and Manage Cloud Costs on Seven Bridges: This tutorial describes how to estimate costs associated with using Seven Bridges. The tutorial includes an overview of both cloud storage costs and cloud computation costs and the primary drivers of those costs. The tutorial also provides guidance on how to approach estimating cloud storage and computation costs so that researchers can budget for cloud costs in their grants, request cloud credits, and plan their work on BioData Catalyst.
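The tutorial's split into storage and computation costs reduces to simple arithmetic. The sketch below uses placeholder rates, not actual Seven Bridges, AWS or Google Cloud prices; substitute current rates for your storage class and instance type.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class CostInputs:
    storage_gb: float                  # data kept in cloud storage
    storage_price_per_gb_month: float  # placeholder rate, not a quoted price
    months: float                      # how long the data is stored
    hours_per_run: float               # wall-clock hours for one analysis run
    instance_price_per_hour: float     # placeholder rate, not a quoted price
    n_runs: int = 1

def estimate_cost(c: CostInputs) -> Dict[str, float]:
    """Split an estimate into storage and computation, the two main cost drivers."""
    storage = c.storage_gb * c.storage_price_per_gb_month * c.months
    compute = c.hours_per_run * c.instance_price_per_hour * c.n_runs
    return {"storage": storage, "compute": compute, "total": storage + compute}

if __name__ == "__main__":
    plan = CostInputs(storage_gb=500, storage_price_per_gb_month=0.02, months=12,
                      hours_per_run=4, instance_price_per_hour=0.5, n_runs=20)
    print(estimate_cost(plan))
```

Keeping the two components separate makes it easy to see which driver dominates a budget and where a cost-reduction strategy (archiving, cheaper instances) would matter most.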
Public project for TOPMed Freeze8 variant calling pipelines: Users on Seven Bridges can now access a public project that walks through how to use the CWL tools and workflows that were used to perform variant calling of TOPMed Freeze8. The public project provides explanations of the purpose of all of the tools and workflows and how they are used together, along with examples of completed analyses. All of the CWL tools and workflows in the project are available in the Public Apps Gallery.
Need an easy way to explain Terra to your colleagues or collaborators? Try this quick (2-min.) overview of Terra.
Estimate Workflow Costs on Terra: Terra users can also follow this documentation to estimate costs of workflows. This is the original document describing the steps summarized in this blog post.
Understanding and controlling cloud costs on Terra: This article includes a detailed breakdown of the types of costs that you may incur when working on Google Cloud, plus some advice on how to reduce costs.
Understanding costs and billing on Terra: This article includes an overview of how billing works, including how billing accounts, projects and workspaces relate to each other, and the difference between workspace permissions and billing permissions.
Controlling cloud costs on Terra – sample use cases: This article includes a selection of typical analysis use cases, for which the costs are broken down in several scenarios in order to illustrate the effect of cost control strategies.
New tools and workflows released to Dockstore’s NHLBI BioData Catalyst Organization:
Three additional WDL workflows have been released in the UWGAC Ancestry, Relatedness, and Association Testing Collection: KING, PC-Relate, and PC-AiR.
xvcfView WDL was released to the Utilities collection. This workflow provides the full power of bcftools view to subset, subsample, and filter VCF files.
A new PrediXcan collection provides CWL workflows that predict gene expression (or whatever biology the models predict) in a cohort with available genotypes and run associations with a trait measured in the cohort.
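PrediXcan-style models predict a gene's expression as a weighted sum of genotype dosages, then test predicted expression against a trait. The sketch below shows that core arithmetic with made-up weights and a plain least-squares slope; a real analysis would use trained model weights, adjust for covariates and report p-values, and none of the names here come from the Dockstore workflows.

```python
from typing import Dict, List

def predict_expression(weights: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Predicted expression for one person: sum of model weight x alt-allele dosage.

    Variants in the model but absent from the genotypes are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)

def ols_slope(x: List[float], y: List[float]) -> float:
    """Least-squares slope of y on x (a simple association effect size)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

if __name__ == "__main__":
    gene_model = {"rs1": 0.5, "rs2": -0.25}  # hypothetical weights for one gene
    cohort = {
        "P1": {"rs1": 2, "rs2": 0},
        "P2": {"rs1": 1, "rs2": 2},
        "P3": {"rs1": 0, "rs2": 1},
    }
    trait = {"P1": 3.0, "P2": 1.0, "P3": 0.5}
    ids = sorted(cohort)
    expr = [predict_expression(gene_model, cohort[i]) for i in ids]
    print(expr, ols_slope(expr, [trait[i] for i in ids]))
```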
Launch Galaxy workflows from Dockstore into multiple Galaxy instances, including Terra:
New to Galaxy? The Galaxy Training Network is continuing to add training material in their Organization on Dockstore.
Additionally, users can explore some of the Galaxy community’s best practices workflows in their IWC Organization on Dockstore.
Ready to publish and share the tool or workflow you developed with the research community? Dockstore users can link their accounts to their ORCID and Zenodo accounts, mint DOIs for their workflows hosted on Dockstore, and now can export their workflows directly to their ORCID profile.
New video tutorials demonstrate exporting data from PIC-SURE to Terra and Seven Bridges using BioLINCC/Sickle Cell related data.
The table below highlights which studies were included in the 2021-10-04 data release. The final BioLINCC training dataset was uploaded, plus another BioLINCC study, BabyHug. The ORCHID dataset was re-ingested after the data owners found they had provided incorrect versions of the files at the time of initial ingestion. The TOPMed Combined Exchange Area buckets were updated with more datasets from multiple new freezes. The last dataset ingested was PCGC’s CMG. The data is now available for access across the entire ecosystem.
Gen3 release notes
PIC-SURE release notes
The 2021-01-15 release marks the fourth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., CWL workflows to create dataset specific files needed for GWAS) along with documentation and tutorials to help new users get started on the system. This release also includes enhanced support for CWL tools for post-GWAS analysis and a CWL tool for Bcftools Merge and Filter. Please find more detail on the new features and user support materials in the sections below.
The 2021-01-15 data release includes the addition of both TOPMed studies and the ORCHID Study, conducted by the PETAL Clinical Trials Network of NHLBI. Multi-sample VCFs, CRAMs and unharmonized clinical files were added for 27 TOPMed studies new to BioData Catalyst. Additionally, 7 TOPMed studies previously hosted on BioData Catalyst were updated to the latest study versions. These updates include new CRAMs, unharmonized clinical files and multi-sample VCFs for Freeze 8. For each study and consent group, VCF files are available on a per-chromosome basis and in an un-tarred format. The associated clinical files were added for the ORCHID study.
Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.
CWL workflows to create dataset specific files needed for GWAS: Users can now find the following CWL workflows for creating dataset specific files needed for GWAS in the Seven Bridges Public Apps Gallery:
LD Pruning - Filter variants based on linkage disequilibrium measures
KING robust and KING IBDseg - Estimate kinship coefficients
PC-AiR - Perform principal components analysis
PC-Relate - Estimate genetic relatedness
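LD pruning is commonly done greedily: walk along a chromosome and keep a variant only if its squared correlation (r²) with every already-kept variant inside a window stays below a threshold. This simplified sketch works on in-memory dosage vectors; the threshold, window size and data layout are assumptions, and it is not the algorithm or defaults of the CWL LD Pruning workflow.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Variant = Tuple[str, int, Sequence[float]]  # (id, position, per-sample dosages)

def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two dosage vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector has no defined correlation; treat as independent
    return sxy * sxy / (sxx * syy)

def ld_prune(variants: List[Variant], threshold: float = 0.1, window_bp: int = 500_000) -> List[str]:
    """Greedy pruning over position-sorted variants from one chromosome."""
    kept: List[Variant] = []
    for vid, pos, dos in variants:
        nearby = (k for k in kept if pos - k[1] <= window_bp)
        if all(r_squared(dos, k[2]) <= threshold for k in nearby):
            kept.append((vid, pos, dos))
    return [vid for vid, _, _ in kept]
```

The pruned, approximately independent variant set is what downstream kinship (KING, PC-Relate) and ancestry (PC-AiR) steps typically take as input, so that tightly linked regions do not dominate the estimates.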
CWL tools for post-GWAS analysis: Users can now find the following CWL tools for post-GWAS analysis in the Seven Bridges Public Apps Gallery:
SBG Loci Snapshoter - Generate screenshots of specific regions of aligned files provided as inputs
LocusZoom - Standalone tool for generating static locus zoom plots. Users can make annotated Manhattan plots on specific regions from association files generated with the GENESIS association workflows.
CWL tool for Bcftools Merge and Filter: Users can now find a CWL tool for BCFtools Merge and Filter in the Seven Bridges Public Apps Gallery. This tool merges multiple VCF/BCF files from non-overlapping sample sets into one multi-sample file and filters out any monomorphic variants. This is useful when working with input files that contain monomorphic variants, such as the TOPMed datasets.
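The merge-then-filter rule can be illustrated on in-memory genotype lists. This is a hypothetical sketch, not the bcftools commands the Seven Bridges tool runs: it treats a site as monomorphic when the called genotypes contain only one allele (AC = 0 or AC = AN) and assumes the input sets share no samples, as the tool description requires.

```python
from typing import Dict, List, Optional

Genotypes = List[Optional[int]]  # alt-allele counts 0/1/2, None = missing

def is_monomorphic(dosages: Genotypes) -> bool:
    """True if only one allele is observed among called diploid genotypes."""
    called = [d for d in dosages if d is not None]
    alt_count = sum(called)          # AC
    allele_number = 2 * len(called)  # AN
    return alt_count == 0 or alt_count == allele_number

def merge_and_filter(*sample_sets: Dict[str, Genotypes]) -> Dict[str, Genotypes]:
    """Concatenate genotypes of disjoint sample sets per variant, then drop monomorphic sites.

    A variant absent from a set is filled with missing calls for that set's samples.
    """
    sizes = [len(next(iter(s.values()))) if s else 0 for s in sample_sets]
    variant_ids = list(dict.fromkeys(v for s in sample_sets for v in s))
    merged: Dict[str, Genotypes] = {}
    for vid in variant_ids:
        row: Genotypes = []
        for s, n in zip(sample_sets, sizes):
            row.extend(s.get(vid, [None] * n))
        if not is_monomorphic(row):
            merged[vid] = row
    return merged
```

Filtering after the merge matters: a site that is monomorphic within one sample set (all homozygous alternate, say) can become polymorphic once another set is added, and vice versa.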
Genetic Association Testing Using the GENESIS Workflows tutorial: Seven Bridges updated this tutorial to show how to perform an association test with the GENESIS workflows on TOPMed Freeze 8 multi-sample VCF data. Previous versions of this tutorial used TOPMed Freeze 5 data. Version 1.1 of this tutorial can be downloaded as a PDF from the Tutorials page of the BioData Catalyst GitBook.
ORCHID Clinical Trial Statistical Analysis Reproduction: NHLBI BioData Catalyst has made data available to authorized investigators for the study titled: PETAL Network: Outcomes Related to COVID-19 Treated With Hydroxychloroquine Among Inpatients With Symptomatic Disease (ORCHID) Trial, phs002299.v1.p1. The data come from the multi-center, double-blind, randomized clinical trial conducted to assess the efficacy of hydroxychloroquine in the treatment of COVID-19. Results were published in JAMA on November 9th, 2020 (paper available here). This notebook enables anyone with authorized credentials to reproduce the ORCHID clinical trial results by showing how to 1) access the data using the PIC-SURE API and 2) reproduce the results of this study using the open-source R programming language. The notebook is available in the Seven Bridges Public Project under PIC-SURE API, or through the PIC-SURE GitHub.
The table below highlights which studies were included in the 2021-01-15 data release, which includes both TOPMed studies and the Outcomes Related to COVID-19 treated with hydroxychloroquine among In-patients with symptomatic Disease (ORCHID) Study, conducted by the PETAL Clinical Trials Network of NHLBI. Multi-sample VCFs, CRAMs and unharmonized clinical files were added for 27 TOPMed studies new to BioData Catalyst. Additionally, 7 TOPMed studies previously hosted on BioData Catalyst were updated to the latest study versions. These updates included new CRAMs, unharmonized clinical files and multi-sample VCFs for Freeze 8. For each study and consent group, VCF files are available on a per-chromosome basis and in an un-tarred format. The associated clinical files were added for the ORCHID study. The data is now available for access across the entire ecosystem.
Gen3 release notes
PIC-SURE release notes
The 2020-08-24 release marks the second release for the NHLBI BioData Catalyst ecosystem. This release includes several new features along with documentation and tutorials (e.g., genome-wide association studies) to help new users get started on the system. This release also includes enhanced support for machine learning in the workspace environments and support for GA4GH standards for workflows in Dockstore. Please find more detail on the new features and user support materials in the sections below.
The 2020-08-24 data release includes the addition of TOPMed Freeze 8 data for a subset of studies on BioData Catalyst. Freeze8 multi-sample VCFs are available for 29 studies, of which 10 studies are new to the ecosystem. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format, in contrast to the Freeze5 multi-sample VCFs which are hosted as tar bundles. For the 10 studies new to BioData Catalyst, CRAM files and unharmonized clinical files are also available for access. The data release further includes updates of many studies to the latest versions that are available on dbGaP. The next data release will include Freeze8 multi-sample VCFs for additional TOPMed studies in addition to unharmonized clinical data and CRAM files for studies that are not yet hosted on the system. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.
GENESIS tutorial and public project: Seven Bridges has made a public project available that introduces users to the GENESIS R package and related R packages (SeqArray, SeqVarTools, and SNPRelate) used for mixed-model association testing of sequence data. The examples in the project help users understand the code used in the GENESIS public apps (available on GitHub), prepare data for input to those apps, and interact with the results. The “GENESIS Tutorial” public project can be found in the list of Seven Bridges public projects on the top navigation bar of the platform.
Launch machine learning packages in JupyterLab Notebooks: Users on Seven Bridges can now use a Docker image with pre-installed libraries that support machine learning analyses when working in JupyterLab Notebooks. This Docker image can be found in the Data Cruncher feature: select “Create new analysis” and then, under the Environment setup menu, select “SB Machine Learning - TensorFlow 2.0, Python 3.7.”
Support for larger GPU instances: A larger AWS GPU instance type is now available for researchers working in JupyterLab Notebooks and RStudio on Seven Bridges. The p3dn.24xlarge instance has 1,800 GB of SSD storage, 96 vCPUs, 768 GB of RAM, and 8 GPUs. These higher-memory GPUs enable machine learning training on large 3D images and high-performance computing applications.
Data Cruncher Interactive Analyses: Seven Bridges now features a “Data Cruncher Interactive Analyses” public project, found in the list of public projects on the top navigation bar of the platform, with example analyses to help users interpret results from secondary analysis. The project has eight separate analyses (three in RStudio and five in JupyterLab Notebooks), including one on VCF visualization and one on structural variant analysis. Read the blog post.
Launch Dockstore workflows in Seven Bridges: Users can now find CWL workflows in Dockstore and launch them in the Seven Bridges workspace environment.
Export large PFBs from Gen3 to Terra: Users can now export large PFB (Portable Format for Bioinformatics) files from Gen3 (e.g. synthetic cohort files from multiple groups) to Terra. New backend systems now automatically parse files more efficiently.
Automatic syncing with GitHub apps: Dockstore now automatically updates your workflows with any changes you make to your linked GitHub repository.
Link ORCID iDs to published workflows: Users can now link their ORCID iDs to their Dockstore accounts and make the iDs visible via their organizations and in the workflows and tools they have starred. Users searching Dockstore’s catalog will then be able to associate contributed workflows with scientific publications.
GA4GH TRS Support: Dockstore now implements the GA4GH Tool Registry Service (TRS) v2 standard. The goal of the TRS API is to provide a standardized way to describe the availability of tools and workflows.
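To illustrate what the TRS standard fixes in place, the sketch below builds the TRS v2 URL for a workflow descriptor, following the specification's path pattern `/tools/{id}/versions/{version_id}/{type}/descriptor`. The Dockstore base URL and the workflow ID used here are assumptions for illustration (check the current Dockstore API documentation), and the function only constructs the URL; it does not send a request.

```python
from urllib.parse import quote

# Assumed base URL for Dockstore's TRS v2 endpoint; verify against current Dockstore docs.
DOCKSTORE_TRS_BASE = "https://dockstore.org/api/ga4gh/trs/v2"


def trs_descriptor_url(tool_id: str, version: str, descriptor_type: str = "CWL",
                       base: str = DOCKSTORE_TRS_BASE) -> str:
    """Build a TRS v2 URL for GET /tools/{id}/versions/{version_id}/{type}/descriptor."""
    # Dockstore-style IDs such as "#workflow/github.com/org/repo" contain '#' and '/',
    # so they are percent-encoded into a single path segment.
    encoded_id = quote(tool_id, safe="")
    encoded_version = quote(version, safe="")
    return f"{base}/tools/{encoded_id}/versions/{encoded_version}/{descriptor_type}/descriptor"


if __name__ == "__main__":
    # Hypothetical workflow ID and version, for illustration only.
    print(trs_descriptor_url("#workflow/github.com/example-org/example-repo", "main", "WDL"))
```

Because every TRS-compliant registry exposes the same paths, a client written this way could point `base` at a different registry without other changes.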
Transfer datasets to Jupyter Notebooks with Query ID: Users who query the PIC-SURE UI and apply filters to create datasets can submit their query ID to the PIC-SURE client library from an R or Python Jupyter Notebook instead of rebuilding the query manually.
Data Tree Optimizations: The PIC-SURE data tree has been optimized to show users only the studies they have been authorized to see and rendered more efficiently to allow users to select studies faster.
Export data dictionary of clinical variables: R and Python Jupyter Notebooks are now available that provide directions on exporting the full data dictionary of all clinical variables to a CSV via PIC-SURE.
Overview of the ecosystem: This collaboratively developed overview document guides new users from understanding what BioData Catalyst is to getting started with the ecosystem.
Tips for reliable and efficient analysis set-up: This guide provides recommendations on how to set up your initial set of analyses, tips for running tools/workflows, and specifications for computational resources on Seven Bridges.
Genetic Association Testing Using GENESIS Workflows: This tutorial guides users through the steps of running a single variant or multiple variant association test on Seven Bridges using the GENESIS R package pipelines.
Troubleshooting Tasks: This guide presents some of the most common errors in task execution on Seven Bridges and shows you how to debug and resolve them.
GWAS tutorial and example cloud costs: Terra’s GWAS tutorial walks users through the steps of preparing data for input using Hail in Jupyter notebooks and running association tests as workflows with the GENESIS R package and provides example cloud costs derived from the tutorial.
Code Library: This release includes a Terra featured workspace containing R and Python Jupyter Notebooks that cover how to use the Integrative Genomics Viewer with data from Gen3, workflows for merging VCF files, and expanded features for interacting with data using the Data Repository Service (DRS), such as bulk downloads.
Dockstore Fundamentals: A video recording, slides, and exercises are available from the recent workshop Dockstore Fundamentals: Introduction to Docker and Descriptors for Reproducible Analysis.
API and User Interface Technical Documentation: PIC-SURE technical documentation provides users with information about the PIC-SURE API and user interface and examples of how to load data into the PIC-SURE High Performance Data Store.
Scalability and cost-effectiveness analysis of whole genome-wide association studies in the Cloud: This recent article from the PIC-SURE team provides a straightforward and innovative methodology for optimizing cloud configuration to conduct genome-wide association studies, and helps users understand the trade-off between speed and cost.
The table below highlights which TOPMed studies were included in the 2020-08-24 data release. Freeze 8 multi-sample VCFs were added for the 29 studies listed in the table below. This includes 19 studies which were previously hosted on BioData Catalyst with Freeze 5b data available and 10 studies which are new to BioData Catalyst. For each study and consent group, VCF files are available on a per chromosome basis and in an un-tarred format. For the 10 studies which are new to BioData Catalyst, CRAM files and unharmonized clinical files are also available for access. Additionally, 10 of these studies were updated to the latest version. The data is now available for access across the entire ecosystem.
Gen3 release notes
PIC-SURE release notes
The 2021-07-09 release marks the sixth release for the NHLBI BioData Catalyst ecosystem. This release includes several new features (e.g., SAS on Seven Bridges and Galaxy’s integration with Terra) along with documentation and tutorials to help new users get started on the system (e.g., PIC-SURE Open Access). This release also includes enhanced support for maintaining and versioning CWL on external tool repositories. Please find more details on the new features and user support materials in the sections below.
The 2021-07-09 data release includes the addition of CRAMs and unharmonized clinical files for the parent project CARDIA and 3 other TOPMed programs. The TOPMed program REDS-III received a version update. The unharmonized clinical files were uploaded for the 5 BioLINCC projects and the 2 open tutorial projects. Please refer to the Data Release section below for more information as well as the Data page on the BioData Catalyst website.
Authentication through the NIH Researcher Authentication Service: The BioData Catalyst ecosystem updated the authentication mechanism to use the NIH Researcher Authentication Service. Researchers will now be redirected to the NIH RAS page to enter their eRA Commons credentials when logging into one of the platforms within the ecosystem.
New Jupyter notebook: Sample and variant quality control methods with Hail, published in the BioData Catalyst Collection featured workspace on Terra.
Galaxy has integrated with Terra: Galaxy is now available through the other “faces” of Terra, including NHLBI BioData Catalyst. You can launch your very own Galaxy server without having to do any configuration yourself, right from the Terra web interface. This marks a transition from alpha to beta development status of Galaxy on Terra, meaning that the software is more mature and considered reliable enough for regular work, with the caveat that minor changes may occur over time as we smooth out any remaining rough edges and improve the user experience in the application. Learn more about how to use Galaxy in Terra here. Features of Galaxy and its use within Terra are also described in our blog post here. You can also import Dockstore workflows into Galaxy when it's launched in Terra. Speaking of workflows, Cromwell 64 is now live on Terra.
Seurat package now included by default in R-based cloud environments: The RStudio image now has Seurat, a tool for single-cell transcriptomics, as well as crcmod, a package for verifying the integrity of an object in Google Cloud Storage.
Interactive Analysis: Jupyter Notebook images have been updated with Bioconductor 3.13.0. See the Bioconductor release notes here.
SAS: Users on Seven Bridges can now launch SAS for interactive analysis from the Data Cruncher feature. All project files are available within SAS. Users can select from three SAS offerings built on top of SAS Studio: 1) SAS Business Intelligence enables users to use SAS code to manage data and to create, modify, and compare descriptive and predictive models; capabilities include clustering, decision trees, and linear and logistic regression. 2) SAS Analytics adds SAS Viya’s Data Mining and Machine Learning algorithms, such as neural networks, gradient boosting, and random forests. 3) SAS Data Science provides access to text analysis, time series models, advanced forecasting, and model governance.
LocusZoom Interactive Application: Users on Seven Bridges can now launch an R Shiny application that enables users to select, visualize and interactively explore single variant association test results data, with no prior R programming knowledge. Researchers can explore existing analyses available in the University of Michigan database, generate LocusZoom plots for example data, or provide their own association .RData files. The app also provides the JSONizer tool, which enables researchers to subset their association test results (.RData) files and to convert them into the appropriate JSON files required by LocusZoom. Users can launch the application from the Public project “LocusZoom Shiny App” in the top navigation bar.
Example notebooks for data import with DRS: These example notebooks on Seven Bridges provide the code and steps for importing data from CAVATICA (Kids First data) as well as GTEx data from the NHGRI AnVIL system. The import uses DRS functionality to access files that are stored on other NIH cloud systems. Users can find the notebooks in the Public Project “Data Interoperability” in the top navigation bar.
CWL v1.2 available: BioData Catalyst Powered by Seven Bridges now supports Common Workflow Language (CWL) version 1.2. The new version of CWL brings one major new feature, conditional execution of workflow steps, as well as several minor features and improvements. For the detailed change log, see the CWL CommandLineTool specification and the CWL Workflow specification.
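The cross-system imports described above rely on resolving DRS identifiers. Under the GA4GH DRS v1 convention, a hostname-based URI `drs://<hostname>/<object_id>` resolves to `https://<hostname>/ga4gh/drs/v1/objects/<object_id>`. The sketch below performs only that string translation; it does not handle compact-identifier DRS URIs (prefix:accession form), does not authenticate, and the hostname and object ID in the example are hypothetical.

```python
from urllib.parse import quote, urlparse


def drs_uri_to_object_url(drs_uri: str) -> str:
    """Translate a hostname-based DRS URI into its DRS v1 object-resolution URL."""
    parsed = urlparse(drs_uri)
    object_id = parsed.path.lstrip("/")
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"expected drs://<hostname>/<object_id>, got {drs_uri!r}")
    # The object ID occupies a single path segment, so reserved characters are encoded.
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


if __name__ == "__main__":
    # Hypothetical DRS server and object ID.
    print(drs_uri_to_object_url("drs://drs.example.org/abc-123"))
```

A client would then send an authenticated GET to that URL and read the `access_methods` listed in the returned DRS object to locate the file bytes.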
New CWL tools and workflows on BioData Catalyst Powered by Seven Bridges: Users can find all these tools and more in the Public Apps Gallery:
Regenie 2.0.1 - This is a tool for whole genome regression analysis.
GENESIS Association results plotting - This UW-GAC tool is a standalone app for creating Manhattan and QQ plots from the GENESIS association test results with additional filtering and stratification options available.
WGSA 0.9 - This is a scalable SNV and INDEL annotation pipeline, performing a spectrum of annotations in a single tool. It integrates annotations from dozens of databases and annotation tools.
GENESIS Update Null Model for Fast Score Test - This updates the null model file obtained with the GENESIS Null model workflow so that it can be used in the GENESIS Single Variant Association Testing workflow in fast score mode.
Missing rate by sample - This UW-GAC tool was created for QC in GWAS. The tool calculates missing rate by sample. A subset of variants may be specified.
Missing rate by variant - This UW-GAC tool was created for QC in GWAS. The tool calculates missing rate by variant. A subset of samples and/or variants may be specified.
Allele frequency - This UW-GAC tool was created for QC in GWAS. The tool calculates allele frequency and counts. Values for both the alternate allele (count, frequency) and the minor allele (MAC, MAF) are returned. A subset of samples and/or variants may be specified.
ld-index - This UW-GAC tool calculates the LD between an index variant and each variant in a set of other variants stored in a GDS file, using the snpgdsLDMat function in the SNPRelate R package and the LDcompute wrapper R package.
ld-pair - This UW-GAC tool calculates the LD between a pair of variants stored in a GDS file, using the snpgdsLDMat function in the SNPRelate R package and the LDcompute wrapper R package.
ld-set - This UW-GAC tool calculates the LD between all pairs in a user-specified set of variants stored in a GDS file, using the snpgdsLDMat function in the SNPRelate R package and the LDcompute wrapper R package.
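The QC and LD quantities listed above have standard textbook definitions, sketched below on a small in-memory genotype matrix. This is an illustration under stated assumptions, not the UW-GAC implementation: the apps read GDS files, and their exact conventions (for example, sex-chromosome handling or the default LD method of snpgdsLDMat) may differ. The sketch assumes diploid genotypes coded as alternate-allele dosages 0/1/2 with None for a missing call, and uses the squared Pearson correlation of dosages as the LD measure.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical layout: one row per variant, one column per sample.
# Each call is the alternate-allele dosage (0, 1, or 2) of a diploid genotype,
# or None when the genotype is missing.
Genotype = Optional[int]


def missing_rate_by_variant(geno: Sequence[Sequence[Genotype]]) -> List[float]:
    """Fraction of samples with a missing call at each variant (row)."""
    return [sum(g is None for g in row) / len(row) for row in geno]


def missing_rate_by_sample(geno: Sequence[Sequence[Genotype]]) -> List[float]:
    """Fraction of variants with a missing call for each sample (column)."""
    n_variants = len(geno)
    n_samples = len(geno[0])
    return [sum(row[j] is None for row in geno) / n_variants for j in range(n_samples)]


def allele_stats(row: Sequence[Genotype]) -> Dict[str, float]:
    """Alternate and minor allele count/frequency for one variant, skipping missing calls."""
    called = [g for g in row if g is not None]
    n_alleles = 2 * len(called)  # two alleles per called diploid genotype
    if n_alleles == 0:
        raise ValueError("variant has no called genotypes")
    alt_count = sum(called)
    mac = min(alt_count, n_alleles - alt_count)  # minor allele count
    return {"alt_count": alt_count, "alt_freq": alt_count / n_alleles,
            "mac": mac, "maf": mac / n_alleles}


def ld_r2(x: Sequence[Genotype], y: Sequence[Genotype]) -> float:
    """Squared Pearson correlation of dosages, over samples called at both variants."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no samples called at both variants")
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined for a monomorphic variant")
    return sxy * sxy / (sxx * syy)
```

For example, with `geno = [[0, 1, 2, None], [0, 1, 2, 2], [None, None, 0, 1]]`, `missing_rate_by_variant(geno)` gives `[0.25, 0.0, 0.5]`, and the second variant has 5 alternate alleles out of 8, so its MAF is 0.375.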
PIC-SURE Data Access Dashboard Updates: PIC-SURE’s Data Access Dashboard has been updated to include the number of studies and participants the user has access to based on their authorization.
New PIC-SURE Open Access: PIC-SURE Open Access is now available in BioData Catalyst! It is available to users who have an eRA Commons account, including those who are not authorized to access any studies. The Open Access feature allows users to explore de-identified phenotypic data available in PIC-SURE before requesting access to data. For more information, check out the user guide and tutorial.
New PIC-SURE Jupyter notebook examples are available as public projects in Seven Bridges and Terra as follows:
An example showing users how to access lipid measurements across harmonized variables and multiple visits using the PIC-SURE API in R, RStudio, and Python.
All previous notebook examples are now available in RStudio on Seven Bridges.
New UW-GAC Ancestry and Relatedness analysis collection on Dockstore under the BioData Catalyst organization: This collection includes two WDL workflows to help users prepare their data for association testing: one for converting VCF files to GDS and one for linkage disequilibrium pruning. Stay tuned as more workflows are released.
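Linkage disequilibrium pruning keeps a subset of variants that are approximately independent of one another. The sketch below shows one simple greedy variant of the idea: walk the variants in position order and keep a variant only if its r² with every already-kept variant inside a base-pair window falls below a threshold. It is an illustrative assumption, not the algorithm used by the Dockstore workflow (which relies on SNPRelate); the threshold and window defaults are hypothetical, and missing genotypes are not handled.

```python
from typing import List, Sequence, Tuple

# Hypothetical input: (variant_id, base-pair position, dosages) for variants on one
# chromosome, with no missing dosages.
Variant = Tuple[str, int, Sequence[float]]


def r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # treat monomorphic variants as uncorrelated in this sketch
    return sxy * sxy / (sxx * syy)


def ld_prune(variants: List[Variant], r2_threshold: float = 0.1,
             window_bp: int = 10_000_000) -> List[str]:
    """Greedy pruning in position order: keep a variant only if its r^2 with every
    already-kept variant within window_bp base pairs is below r2_threshold."""
    kept: List[Variant] = []
    for vid, pos, dosages in sorted(variants, key=lambda v: v[1]):
        nearby = [k for k in kept if pos - k[1] <= window_bp]
        if all(r2(dosages, k[2]) < r2_threshold for k in nearby):
            kept.append((vid, pos, dosages))
    return [vid for vid, _, _ in kept]
```

The comparison against kept variants only makes the result depend on the scan order; real pruning tools add refinements such as MAF filters and sliding windows measured in variant counts as well as base pairs.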
New Large-scale Gene by Environment collection on Dockstore under the BioData Catalyst organization: The WDL workflows in this collection enable scalable, efficient, and flexible genome-wide gene-environment interaction analysis. GEM conducts single-variant analysis for common variants (currently in unrelated individuals only) and MAGEE conducts single-variant and variant set-based analysis for common or rare variants while allowing for relatedness. The collection also includes examples of cloud costs in the README.
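A single-variant gene-environment analysis typically tests the interaction coefficient in a model of the form y = b0 + bG·G + bE·E + bGxE·(G·E). The sketch below fits that model by ordinary least squares in pure Python to show where the interaction term enters. It is not how GEM or MAGEE compute their tests: there are no covariates, no relatedness modeling, no robust standard errors, and no test statistic, and all names are hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_gxe(y: Sequence[float], g: Sequence[float], e: Sequence[float]) -> List[float]:
    """OLS fit of y = b0 + bG*G + bE*E + bGxE*(G*E); returns [b0, bG, bE, bGxE]."""
    rows = [[1.0, gi, ei, gi * ei] for gi, ei in zip(g, e)]  # design matrix
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    return solve(xtx, xty)  # normal equations: (X^T X) b = X^T y
```

On noise-free toy data generated as `y = 1 + 0.5*G + 2*E + 0.25*G*E` over every combination of G in {0, 1, 2} and E in {0, 1, 2}, `fit_gxe` recovers coefficients close to `[1, 0.5, 2, 0.25]`; a nonzero last coefficient is what a G×E test looks for.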
BioLINCC Phase 2 data dictionaries: These data dictionaries were submitted in PDF format, which required additional processing and delayed their general release to the platform. They will be released for use across the platform as soon as feasible.
Maintaining and Versioning CWL on External Tool Repositories: This tutorial presents best practices for writing and maintaining CWL tools/workflows in an external tool repository, such as GitHub, so that users can better manage versions of their tools. Users should follow these best practices if they would like to publish and share their CWL tools and workflows in the Dockstore repository, since Dockstore can automatically pull changes from GitHub. These best practices ensure that the CWL is fully portable and can run successfully not only on Seven Bridges platforms but also on other CWL executors, such as cwltool and Toil.
Transferring Files Between Seven Bridges and Terra: This tutorial guides users through the process of transferring files between the two workspace environments Seven Bridges and Terra.
Accessing Egress-Free GTEx Data From AnVIL: A new data interoperability page with linked instructions for accessing egress-free GTEx data from NHGRI’s AnVIL cloud ecosystem is available here.
PIC-SURE Documentation Updates: Updated PIC-SURE documentation covers the Data Access Dashboard and PIC-SURE Open Access, and adds a new table for understanding study-specific subject identifiers.
PIC-SURE Video Tutorials: PIC-SURE video tutorials are now available on the following topics:
Introduction to PIC-SURE
Introduction to PIC-SURE Open Access: Harmonized
Introduction to PIC-SURE Open Access: One Criterion Search
Introduction to PIC-SURE Open Access
Introduction to PIC-SURE Open Access: Multiple search criteria
Introduction to PIC-SURE Authorized Access
Introduction to PIC-SURE Authorized Access: Data Export
Published a blog post on the role of a secure cloud ecosystem for supporting infrastructure projects and creating connected communities, highlighting BioData Catalyst as one of several NIH-commissioned infrastructure development projects that involve not just putting data on the cloud but also building the additional layers of services that are necessary to deliver on the extraordinary promise of this new model for data sharing and analysis.
The table below highlights which studies were included in the 2021-07-09 data release. CRAMs and unharmonized clinical files were uploaded for the parent project CARDIA and 3 other TOPMed programs. The TOPMed program REDS-III received a version update. The unharmonized clinical files were uploaded for the 5 BioLINCC projects and the 2 open tutorial projects. The data is now available for access across the entire ecosystem.
Gen3 release notes
PIC-SURE release notes
The 2020-04-02 release marks the first significant release for the NHLBI BioData Catalyst ecosystem. This release offers an integrated system for researchers to search metadata of hosted datasets, find data files, and analyze data files in workspace environments that support a variety of analysis modalities.
The hosted data for this release includes TOPMed multi-sample VCF data for ~55,000 sequenced participants within 32 TOPMed studies. Phenotype data are also included for participants in TOPMed studies, providing clinical information such as BMI and lipid levels. In some cases, these data are in different dbGaP accessions than the genomic data. The hosted data is stored in both Amazon Web Services and Google Cloud, and users have the option to run computation on either cloud provider. To access the hosted TOPMed data on BioData Catalyst, users must have dbGaP approval. Please refer to the Data page on the BioData Catalyst website for more information.
For more in-depth information, please see the "List of significant new features" below.
The following features in this release primarily support TOPMed researchers with a range of technical skills (both command-line and GUI users) who have approval for the controlled TOPMed studies in dbGaP:
System login and data access: Researchers can log into the BioData Catalyst platforms using their eRA Commons ID. Approvals for TOPMed studies in dbGaP are recognized by the platforms.
Search TOPMed phenotypic data: Create cohorts on PIC-SURE by searching and selecting phenotypic variables of interest from dbGaP and then export cohorts to Seven Bridges or Terra for use in analysis workspaces. Users can also explore the TOPMed phenotype variables harmonized by the TOPMed Data Coordinating Center.
Find and access TOPMed genomics files, raw phenotype data files, and reference data files: Use the Explorer feature on Gen3 and the Data Browser feature on Seven Bridges.
Bring your own data: Use one of several options to upload/import data files to the workspace environments.
Run analyses at scale: Analyze thousands of samples at once using batch processing capabilities in secure workspaces, with the ability to run computation on Google Cloud and Amazon Web Services. Use the visual user interface, JupyterLab Notebooks and Jupyter Notebooks, RStudio, the API, or the command line.
Association studies: Run single-variant and multiple-variant association studies using the GENESIS pipelines, Hail, and other tools. Use Annotation Explorer to create variant grouping files for multiple-variant association studies.
Collaborate with other users: Share workspaces, files, and tools with other BioData Catalyst users.
Documentation: Access documentation for each of the platforms.
Track cloud costs: Track cloud storage and compute costs on Seven Bridges and Terra.
Data Releases
Information on the status of data releases is forthcoming.
Gen3 release notes
PIC-SURE release notes
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
Framingham Cohort | phs000007 | FHS | False | 30 |
Genetic Epidemiology Network of Salt Sensitivity (GenSalt) | phs000784 | GenSalt | False | 3 |
Atherosclerosis Risk in Communities (ARIC) Cohort | phs000280 | ARIC | False | 7 |
Genes-Environments and Admixture in Latino Asthmatics (GALA II) Study | phs001180 | GALAII | False | 2 |
Cardiovascular Health Study (CHS) Cohort | phs000287 | CHS | False | 7 |
Women's Health Initiative Clinical Trial and Observational Study | phs000200 | WHI | False | 12 |
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
NHLBI GO-ESP: Lung Cohorts Exome Sequencing Project (Asthma) | phs000422 | Asthma | Yes | |
CATHeterization GENetics (CATHGEN) | phs000703 | CATHGEN | Yes | |
NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene) | phs000951 | COPDGene | Yes | |
The Diabetes Heart Study (DHS) | phs001012 | DHS | Yes | |
Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) | phs001252 | ECLIPSE | Yes | |
NHLBI TOPMed: Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE) | phs001472 | ECLIPSE | Yes | |
NHLBI TOPMed: Boston Early-Onset COPD Study in the TOPMed Program | phs000946 | EOCOPD | Yes | |
NHLBI TOPMed - NHGRI CCDG: Genes-Environments and Admixture in Latino Asthmatics (GALA II) | phs000920 | GALAII | Yes | |
NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy (GENOA) | phs001345 | GENOA | Yes | |
NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity (GenSalt) | phs001217 | GenSalt | Yes | |
Hispanic Community Health Study/Study of Latinos (HCHS/SOL) | phs000810 | HCHS-SOL | Yes | |
Pediatric Cardiac Genomics Consortium (PCGC) Study | phs001194 | PCGC | Yes | |
NHLBI TOPMed: PCGC's Congenital Heart Disease Biobank | phs001735 | PCGC_CHD | Yes | |
PGRN-RIKEN: Rate Control Therapy in Patients with Atrial Fibrillation | phs000439 | PGRN-RIKEN_AF | Yes | |
NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment (SAGE) | phs000921 | SAGE | Yes | |
SNP Health Association Resource (SHARe) Asthma Resource Project (SHARP) | phs000166 | SHARP | Yes | |
Study Name
phs I.D. #
Acronym
New to BioData Catalyst
New study version
BioLINCC (Phase 1) - Training Data (Digitalis)
open
true
NA
Additional TOPMed combined EA
c999
Freeze1/
Freeze9b/
Freeze10a
true
NA
PETAL - ORCHID (data re-ingested since files initially provided by data submitters were not the final version )
phs002299
ORCHID
false
1
PCGC (CMG/Wagner)
CMG
true
1
CureSCi - BabyHug (via BioLINCC)
phs002415
BabyHug
true
1
Study Name
phs I.D. #
Acronym
New to BioData Catalyst
New study version
TOPMed Freeze 9 - Batch 1
(22 datasets included)
Various
Various
false
NA
PCGC SRA Data
Additional TOPMed Freeze 8 Studies (CATHGen)
phs000571
true
6
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
NHLBI TOPMed: Genome-wide Association Study of Adiposity in Samoans | phs000972 | SAS | | |
NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados | phs001143 | BAGS | Yes | |
NHLBI TOPMed: Rare Variants for Hypertension in Taiwan Chinese (THRV) | phs001387 | THRV | Yes | |
NHLBI TOPMed: Pharmacogenomics of Hydroxyurea in Sickle Cell Disease (PharmHU) | phs001466 | pharmHU | Yes | |
NHLBI TOPMed: Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-Ethnicity (SAPPHIRE) | phs001467 | SAPPHIRE_asthma | Yes | |
NHLBI TOPMed: MyLifeOurFuture (MLOF) Hemophilia Study | phs001515 | MLOF | Yes | |
NHLBI TOPMed: Diabetes Heart Study (DHS) African American Coronary Artery Calcification (AA CAC) | phs001412 | AACAC | Yes | |
NHLBI TOPMed: Novel Risk Factors for the Development of Atrial Fibrillation in Women | phs001040 | WGHS | Yes | |
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry (VU_AF) | phs001032 | VU_AF | | |
NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica | phs000988 | CRA | Yes | |
NHLBI TOPMed - NHGRI CCDG: MGH Atrial Fibrillation Study | phs001062 | MGH_AF | Yes | |
NHLBI TOPMed: Australian Familial Atrial Fibrillation Study | phs001435 | AustralianFamilialAF | Yes | |
NHLBI TOPMed: African American Sarcoidosis Genetics Resource | phs001207 | Sarcoidosis | Yes | |
NHLBI TOPMed: CHS Gene-Air Pollution Interactions in Asthma (GAP) | phs001602 | ChildrensHS_GAP | Yes | |
NHLBI TOPMed: CHS (Effects of Air Pollution on the Development of Obesity in Children) | phs001604 | ChildrensHS_MetaAir | Yes | |
NHLBI TOPMed - NHGRI CCDG: AFLMU | phs001543 | AFLMU | Yes | |
NHLBI TOPMed - NHGRI CCDG: Malmo Preventive Project (MPP) | phs001544 | MPP | Yes | |
NHLBI TOPMed - NHGRI CCDG: Intermountain INSPIRE Registry | phs001545 | INSPIRE_AF | Yes | |
NHLBI TOPMed: Texas Cardiac Arrhythmia Institute - DECAF Study | phs001546 | DECAF | Yes | |
NHLBI TOPMed: Early-onset Atrial Fibrillation in the Estonian Biobank | phs001606 | EGCUT | Yes | |
NHLBI TOPMed: CHS Integrative Genomics and Environmental Research of Asthma (IGERA) | phs001603 | ChildrensHS_IGERA | Yes | |
NHLBI TOPMed: Pulmonary Fibrosis Whole Genome Sequencing | phs001607 | IPF | Yes | |
NHLBI TOPMed - NHGRI CCDG: The GENetics in Atrial Fibrillation (GENAF) Study | phs001547 | GENAF | Yes | |
NHLBI TOPMed: Chicago Initiative to Raise Asthma Health Equity (CHIRAH) | phs001605 | CHIRAH | Yes | |
NHLBI TOPMed: Outcome Modifying Genes in Sickle Cell Disease (OMG) | phs001608 | OMG_SCD | Yes | |
NHLBI TOPMed - NHGRI CCDG: Vanderbilt University BioVU Atrial Fibrillation Genetics Study | phs001624 | BioVU_AF | Yes | |
NHLBI TOPMed: Lung Tissue Research Consortium (LTRC) | phs001662 | LTRC | Yes | |
NHLBI TOPMed CCDG: Groningen Atrial Fibrillation (GGAF) Study | phs001725 | GGAF | Yes | |
NHLBI TOPMed: Pathways to Immunologically Mediated Asthma (PIMA) | phs001727 | PIMA | Yes | |
NHLBI TOPMed: Best ADd-on Therapy Giving Effective Response (BADGER) | phs001728 | CARE_BADGER | Yes | |
NHLBI TOPMed: Characterizing the Response to a Leukotriene Receptor Antagonist and an Inhaled Corticosteroid (CLIC) | phs001729 | CARE_CLIC | Yes | |
NHLBI TOPMed: Pediatric Asthma Controller Trial (PACT) | phs001730 | CARE_PACT | Yes | |
NHLBI TOPMed: TReating Children to Prevent EXacerbations of Asthma (TREXA) | phs001732 | CARE_TREXA | Yes | |
PETAL Network: Outcomes Related to COVID-19 Treated With Hydroxychloroquine Among Inpatients With Symptomatic Disease (ORCHID) Trial | phs002299 | ORCHID | Yes | |
Study Name | phs I.D. # | Acronym | New to BioData Catalyst | New study version |
NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish | phs000956 | Amish | - | Yes |
NHLBI TOPMed: Atherosclerosis Risk in Communities | phs001211 | ARIC | - | - |
NHLBI TOPMed: NHGRI CCDG: The BioMe Biobank at Mount Sinai | phs001644 | BioMe | Yes | - |
NHLBI TOPMed: Childhood Asthma Management Program | phs001726 | CAMP | Yes | - |
NHLBI TOPMed: Coronary Artery Risk Development in Young Adults | phs001612 | CARDIA | Yes | - |
NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation | phs001189 | CCAF | - | Yes |
NHLBI TOPMed: The Cleveland Family Study | phs000954 | CFS | - | Yes |
NHLBI TOPMed: Cardiovascular Health Study | phs001368 | CHS | - | - |
NHLBI TOPMed: Framingham Heart Study | phs000974 | FHS | - | - |
NHLBI TOPMed: Genetic Study of Atherosclerosis Risk | phs001218 | GeneSTAR | - | Yes |
NHLBI TOPMed: Epigenetic Determinants of Lipid Response to Dietary Fat and Fenofibrate | phs001359 | GOLDN | - | Yes |
NHLBI TOPMed: The Hispanic Community Health Study/Study of Latinos | phs001395 | HCHS/SOL | Yes | - |
NHLBI TOPMed: The Heart and Vascular Health Study | phs000993 | HVH | - | - |
NHLBI TOPMed: Genetics of Left Ventricular Hypertrophy | phs001293 | HyperGEN | - | Yes |
NHLBI TOPMed: The Jackson Heart Study | phs000964 | JHS | - | - |
NHLBI TOPMed: NHGRI CCDG: The Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study | phs001598 | JHU_AF | Yes | - |
NHLBI TOPMed: The Multi-Ethnic Study of Atherosclerosis | phs001416 | MESA | - | - |
NHLBI TOPMed: Plasma microRNAs are associated with atrial fibrillation and change after catheter ablation | phs001434 | miRhythm | Yes | - |
NHLBI TOPMed: Partners HealthCare Biobank | phs001024 | PARTNERS | - | Yes |
NHLBI TOPMed: Pulmonary Hypertension and the Hypoxic Response in Sickle Cell Disease | phs001682 | PUSH_SCD | Yes | - |
NHLBI TOPMed: Recipient Epidemiology and Donor Evaluation Study-III Brazil Sickle Cell Disease Cohort | phs001468 | REDS-III_Brazil_SCD | Yes | - |
NHLBI TOPMed: San Antonio Family Heart Study | phs001215 | SAFHS | - | Yes |
NHLBI TOPMed: Severe Asthma Research Program | phs001446 | SARP | Yes | - |
NHLBI TOPMed: Genome-wide Association Study of Adiposity in Samoans | phs000972 | SAS | - | - |
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Ablation Registry | phs000997 | VAFAR | - | Yes |
NHLBI TOPMed: Venous Thromboembolism project | phs001402 | VTE | - | - |
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry | phs001032 | VU_AF | - | Yes |
NHLBI TOPMed: Walk-PHaSST Sickle Cell Disease | phs001514 | Walk_PHaSST_SCD | Yes | - |
NHLBI TOPMed: Women's Health Initiative | phs001237 | WHI | - | - |
Study Name
phs I.D. #
Acronym
New to BioData Catalyst
New study version
Treatment of Pulmonary Hypertension and Sickle Cell Disease with Sildenafil Therapy
phs002383
WalkPHaSST
true
1
CARDIA Cohort
phs000285
CARDIA
false
3
phs001601
CCDG-PMBB
true
1
phs002385
CIBMTR
true
1
phs002362
CSSCD
true
1
phs002348
MSH
true
1
phs002386
STOPII
true
1
phs001542
GALA
true
1
phs001661
GCPD-A
true
2
phs001468
REDS-III
false
2
Tutorial-biolincc_camp
open
true
tutorial-biolincc_framingham
open
true
Study Name
phs I.D. #
Acronym
New to BioData Catalyst
New study version
Combined Exchange Area new data
false
BioLINCC – Training Dataset – Digitalis
BioLINCC – BabyHug
phs002415
true
Hosted TOPMed study accessions with genomic data from Freeze 5b |
Study Name | Acronym | phs I.D. # |
NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish | Amish | phs000956 |
NHLBI TOPMed: Atherosclerosis Risk in Communities | ARIC | phs001211 |
NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados | BAGS | phs001143 |
NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation Study | CCAF | phs001189 |
NHLBI TOPMed: The Cleveland Family Study | CFS | phs000954 |
NHLBI TOPMed: Cardiovascular Health Study | CHS | phs001368 |
NHLBI TOPMed: Genetic Epidemiology of COPD | COPDGene | phs000951 |
NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica | CRA | phs000988 |
NHLBI TOPMed: Diabetes Heart Study | DHS | phs001412 |
NHLBI TOPMed: Boston Early-Onset COPD Study | EOCOPD | phs000946 |
NHLBI TOPMed: Framingham Heart Study | FHS | phs000974 |
NHLBI TOPMed: Genes-Environments and Admixture in Latino Asthmatics | GALAII | phs000920 |
NHLBI TOPMed: Genetic Study of Atherosclerosis Risk | GeneSTAR | phs001218 |
NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy | GENOA | phs001345 |
NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity | GenSalt | phs001217 |
NHLBI TOPMed: Epigenetic Determinants of Lipid Response to Dietary Fat and Fenofibrate | GOLDN | phs001359 |
NHLBI TOPMed: Heart and Vascular Health Study | HVH | phs000993 |
NHLBI TOPMed: Genetics of Left Ventricular Hypertrophy | HyperGEN | phs001293 |
NHLBI TOPMed: The Jackson Heart Study | JHS | phs000964 |
NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism | Mayo_VTE | phs001402 |
NHLBI TOPMed: The Multi-Ethnic Study of Atherosclerosis | MESA | phs001416 |
NHLBI TOPMed: Massachusetts General Hospital (MGH) Atrial Fibrillation Study | MGH_AF | phs001062 |
NHLBI TOPMed: Partners HealthCare Biobank | Partners | phs001024 |
NHLBI TOPMed: San Antonio Family Heart Study | SAFS | phs001215 |
NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment | SAGE | phs000921 |
NHLBI TOPMed: African American Sarcoidosis Genetics Resource | Sarcoidosis | phs001207 |
NHLBI TOPMed: Genome-wide Association Study of Adiposity in Samoans | SAS | phs000972 |
NHLBI TOPMed: Rare Variants for Hypertension in Taiwan Chinese | THRV | phs001387 |
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Ablation Registry | VAFAR | phs000997 |
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry | VU_AF | phs001032 |
NHLBI TOPMed: The Women's Genome Health Study | WGHS | phs001040 |
NHLBI TOPMed: Women's Health Initiative | WHI | phs001237 |
Hosted TOPMed study accessions with phenotype data |
Study Name | Acronym | phs I.D. # |
Atherosclerosis Risk in Communities | ARIC | phs000280 |
Cleveland Clinic Atrial Fibrillation Study | CCAF | phs000820 |
The Cleveland Family Study | CFS | phs000284 |
Cardiovascular Health Study | CHS | phs000287 |
Genetic Epidemiology of COPD | COPDGene | phs000179 |
Framingham Heart Study | FHS | phs000007 |
Genes-Environments and Admixture in Latino Asthmatics | GALAII | phs001180 |
Genetic Study of Atherosclerosis Risk | GENESTAR | phs001074 |
Genetic Epidemiology Network of Arteriopathy | GENOA | phs001238 |
Genetic Epidemiology Network of Salt Sensitivity | GENSALT | phs000784 |
Heart and Vascular Health Study | HVH | phs001013 |
The Jackson Heart Study | JHS | phs000286 |
The Multi-Ethnic Study of Atherosclerosis | MESA | phs000209 |
Massachusetts General Hospital (MGH) Atrial Fibrillation Study | MGH_AF | phs001001 |
Women's Health Initiative | WHI | phs000200 |