PFB Files
Overview of the Portable Format for Bioinformatics (PFB) file type
What is a Portable Format for Bioinformatics?
A Portable Format for Bioinformatics (PFB) file allows users to transfer both the metadata from the Data Dictionary and the Data Dictionary itself. As a result, data can be transferred while keeping the structure of the original source. Specifically, a PFB consists of three parts:
A schema
Metadata
Data
For more information and an in-depth review that includes Python tools for PFB creation and exploration, refer to the PyPFB GitHub page and install the newest version.
Schema
A schema is a JSON-formatted Data Dictionary containing information about the properties, such as value types and descriptions.
To view the PFB schema, use the following command:
pfb show -i PFB_file.avro schema
Example Output
...
{
"type": "record",
"name": "gene_expression",
"fields": [
{
"default": null,
"name": "data_category",
"type": [
"null",
{
"type": "enum",
"name": "gene_expression_data_category",
"symbols": [
"Transcriptome Profiling"
]
}
]
},
{
"default": null,
"name": "data_type",
"type": [
"null",
{
"type": "enum",
"name": "gene_expression_data_type",
"symbols": [
"Gene Expression Quantification"
]
}
]
},
{
"default": null,
"name": "data_format",
"type": [
"null",
{
"type": "enum",
"name": "gene_expression_data_format",
"symbols": [
"TXT",
"TSV",
"CSV",
"GCT"
]
}
]
},
{
"default": null,
"name": "experimental_strategy",
"type": [
"null",
{
"type": "enum",
"name": "gene_expression_experimental_strategy",
"symbols": [
"RNA-Seq",
"Total RNA-Seq"
]
}
]
},
{
"default": null,
"name": "file_name",
"type": [
"null",
"string"
]
},
{
"default": null,
"name": "file_size",
"type": [
"null",
"long"
]
},
{
"default": null,
"name": "md5sum",
"type": [
"null",
"string"
]
},
{
"default": null,
"doc": "The GUID of the object in the index service.",
"name": "object_id",
"type": [
"null",
"string"
]
}
...
NOTE: To make the output more human-readable, the schema above was piped through the program jq. Example:
pfb show -i PFB_file.avro schema | jq
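The structure of the example output can also be explored programmatically. The following is a minimal sketch in Python, not part of PyPFB: it assumes the schema JSON has already been obtained as text, uses a trimmed copy of the gene_expression record shown above, and introduces a hypothetical helper named summarize_fields for illustration. It relies on the pattern visible in the example, where each property is a union of "null" and one other type.

```python
import json

# Trimmed copy of the gene_expression record from the example output
# (two of its fields only); in practice this text would come from the
# schema output of the PFB file.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "gene_expression",
  "fields": [
    {"default": null, "name": "data_format",
     "type": ["null", {"type": "enum",
                       "name": "gene_expression_data_format",
                       "symbols": ["TXT", "TSV", "CSV", "GCT"]}]},
    {"default": null, "name": "file_size", "type": ["null", "long"]}
  ]
}
"""


def summarize_fields(record):
    """Hypothetical helper: map each field name to a tuple of
    (nullable, type name, enum symbols or None).

    Assumes each field has at most one non-null branch, as in the
    example output; only the first non-null branch is reported.
    """
    summary = {}
    for field in record["fields"]:
        types = field["type"]
        if not isinstance(types, list):
            types = [types]  # a bare type is treated as a one-item union
        nullable = "null" in types
        for branch in types:
            if branch == "null":
                continue
            if isinstance(branch, dict):
                # Complex type such as an enum: keep its kind and symbols.
                summary[field["name"]] = (
                    nullable, branch["type"], branch.get("symbols"))
            else:
                # Primitive type such as "string" or "long".
                summary[field["name"]] = (nullable, branch, None)
            break
    return summary


summary = summarize_fields(json.loads(SCHEMA_JSON))
for name, (nullable, kind, symbols) in summary.items():
    print(name, kind, "nullable" if nullable else "required", symbols or "")
```

Listing the enum symbols this way shows which values a property such as data_format accepts, and the nullable flag shows that the property may be left empty.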
Metadata
The metadata in a PFB contains all of the information explaining the linkage between nodes and external references for each of the properties.
To view the PFB metadata, use the following command:
Example Output
Data
The data in the PFB are the values for the properties in the format of the Data Dictionary.
To view the data within the PFB, use the following command:
To view a specific number of entries in the PFB file, use the -n flag followed by that number. For example, to view the first 10 data entries within the PFB, use the following command:
Example Output
