Each of the harmonization flow plugins in the MarkLogic Operational Data Hub Framework is intended to be customized. There are five plugins: collector.xqy, content.xqy, header.xqy, triples.xqy, and writer.xqy. The simplest harmonization flow looks something like this:
- In the collector plugin, identify which documents in the staging database need to be processed.
- In the content plugin, transform the documents selected by the collector (this is where the if/else logic goes).
- In the writer plugin, write the harmonized documents produced by the content plugin to the final database.
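The three steps above can be sketched as a single main module that you could paste into Query Console to see the shape of the flow. This is only an illustration, not the plugin files themselves: the `local:` functions, the `orders-raw` and `orders-harmonized` collections, the `/harmonized` URI prefix, and the `order-id` and `status` element names are all hypothetical. In a real flow the framework calls each plugin for you and runs the writer against the final database.

```xquery
xquery version "1.0-ml";

(: Step 1, collector: return the URIs of the staging documents to process.
   "orders-raw" is a hypothetical collection name.
   cts:uris requires the URI lexicon to be enabled on the database. :)
declare function local:collect() as xs:string*
{
  cts:uris((), (), cts:collection-query("orders-raw"))
};

(: Step 2, content: build the harmonized form of one source document.
   The element names are hypothetical; the if/else stands in for your
   own transformation logic. :)
declare function local:create-content($id as xs:string) as element(order)
{
  let $source := fn:doc($id)
  let $status := fn:string(($source//status)[1])
  return
    <order>
      <id>{ fn:string(($source//order-id)[1]) }</id>
      <status>{
        if ($status = ("S", "shipped")) then "SHIPPED"
        else "PENDING"
      }</status>
    </order>
};

(: Step 3, writer: persist the harmonized document.
   A distinct URI prefix is used so that running this sketch in a single
   database does not overwrite the source document. :)
declare function local:write(
  $id as xs:string,
  $content as element(order)) as empty-sequence()
{
  xdmp:document-insert(
    fn:concat("/harmonized", $id),
    $content,
    xdmp:default-permissions(),
    "orders-harmonized")
};

(: Drive the three steps for every collected URI. :)
for $id in local:collect()
return local:write($id, local:create-content($id))
```

Keeping each step in its own function mirrors how the framework keeps each step in its own plugin file, so the body of each function should map fairly directly onto the corresponding plugin.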
Here are summaries of each of the plugins from the ODH Wiki:
Collector
Select IDs of documents in the staging database to be processed.
Content
Perform transformation of input data into a normalized or canonical format to store in the final document or documents. You can add custom transformation code here.
Header
A headers plugin is responsible for extracting header items from the content. You can add metadata or augment the content in the header section here.
Triples
A triples plugin is responsible for extracting semantic triples from the source content. You can control the embedded triples in the envelope document.
Writer
A writer plugin is responsible for writing the final envelope to the database. You can control the output permissions, URI, collections etc. of the harmonized document with this module.
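
As a rough illustration of the last summary, here is a minimal sketch of a customized writer.xqy that sets the URI, permissions, and collections. The module namespace and the `plugin:write` signature are assumed to match what the framework scaffolds, so check them against your generated file before copying anything. The `/harmonized` URI prefix, the `harmonized-reader` role, and the collection names are hypothetical.

```xquery
xquery version "1.0-ml";

(: Namespace and function signature are assumptions based on the
   scaffolded writer plugin; verify them against your own writer.xqy. :)
module namespace plugin = "http://marklogic.com/data-hub/plugins";

declare option xdmp:mapping "false";

declare function plugin:write(
  $id as xs:string,
  $envelope as node(),
  $options as map:map) as empty-sequence()
{
  xdmp:document-insert(
    (: Output URI: hypothetical prefix added to the source document's URI. :)
    fn:concat("/harmonized", $id),
    $envelope,
    (: Permissions: the defaults plus read access for a hypothetical role.
       xdmp:permission raises an error if the role does not exist. :)
    (xdmp:default-permissions(),
     xdmp:permission("harmonized-reader", "read")),
    (: Collections: hypothetical names for the harmonized documents. :)
    ("Order", "harmonized"))
};
```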