Overview of the Portable Format for Bioinformatics (PFB) file type
What is a Portable Format for Bioinformatics?
A Portable Format for Bioinformatics (PFB) allows users to transfer both the metadata from the Data Dictionary and the Data Dictionary itself. As a result, data can be transferred while keeping the structure of the original source. Specifically, a PFB consists of three parts:
A schema
Metadata
Data
For more information and an in-depth review that includes Python tools for PFB creation and exploration, refer to the PyPFB GitHub page and install the newest version.
Note
The following PFB example is a direct PFB export from the tutorial-synthetic_data_set_1 found on BioData Catalyst Powered by Gen3. Due to the large amount of data stored within PFB files, only small sections are shown with breaks (displayed as ... ) occurring in the output.
Schema
A schema is a JSON-formatted Data Dictionary containing information about the properties, such as value types and descriptions.
To view the PFB schema, use the following command:
pfb show -i PFB_file.avro schema
NOTE: To make the output more human-readable, pipe it through the program jq. Example: pfb show -i PFB_file.avro schema | jq
Metadata
The metadata in a PFB contains all of the information explaining the linkage between nodes and external references for each of the properties.
To view the PFB metadata, use the following command:
pfb show -i PFB_file.avro metadata
Data
The data in the PFB are the values for the properties in the format of the Data Dictionary.
To view the data within the PFB, use the following command:
pfb show -i PFB_file.avro
To view a certain number of entries in the PFB file, use the flag -n to designate a number. For example, to view the first 10 data entries within the PFB, use the following command:
pfb show -i PFB_file.avro -n 10
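The three views described in this section can also be collected in one step. The following is a minimal sketch, not part of PyPFB: the function name pfb_dump and the output file names are hypothetical, and it assumes that the pfb command is installed and that pfb show accepts the schema and metadata sub-commands and the -n flag.

```shell
#!/bin/sh
# Sketch only: pfb_dump and the output file names are hypothetical.
# It writes the schema, the metadata, and the first N data entries
# of a PFB file into separate files.
pfb_dump() {
    pfb_file="$1"        # path to the PFB (.avro) file
    out_dir="${2:-.}"    # output directory (default: current directory)
    limit="${3:-10}"     # number of data entries to keep (default: 10)

    # Stop early if the pfb command is not available.
    if ! command -v pfb >/dev/null 2>&1; then
        echo "pfb_dump: pfb command not found (install PyPFB first)" >&2
        return 127
    fi
    # Stop early if the input file does not exist.
    if [ ! -f "$pfb_file" ]; then
        echo "pfb_dump: no such file: $pfb_file" >&2
        return 1
    fi

    mkdir -p "$out_dir" || return 1
    pfb show -i "$pfb_file" schema      > "$out_dir/schema.json"   || return 1
    pfb show -i "$pfb_file" metadata    > "$out_dir/metadata.json" || return 1
    pfb show -i "$pfb_file" -n "$limit" > "$out_dir/data.json"     || return 1
}
```

For example, pfb_dump PFB_file.avro pfb_out 10 would write the three files into the directory pfb_out. Each file can then be inspected separately, for instance with jq.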