I have a large set of XML documents in MarkLogic that contain a so-called ‘smart number’ (e.g. the first 2 characters represent the department, the next 3 represent the project, etc.). Parsing the required information out of these numbers is fairly complex and requires database lookups and the like, so we have a Java process that handles the parsing. Each document can contain several of these numbers, and I’d like to be able to query the set of XML documents based on attributes of the smart number. For example: how many hours were billed for a given department, or a breakdown of how many hours went to a given project (this data can be spread across many documents). This makes me think that I need to somehow attach the parsed data to the XML document.
I’m new to MarkLogic and I’m wondering what would be considered best practice for this kind of situation. One thing I can think of is to edit each XML document and add the parsed data to it:
So this:
<ELEMENT>
<SMART_NUMBER>Blah, Blah, Blah</SMART_NUMBER>
</ELEMENT>
<ELEMENT>
<SMART_NUMBER>Blah2, Blah2, Blah2</SMART_NUMBER>
</ELEMENT>
Becomes this:
<ELEMENT>
<SMART_NUMBER>Blah, Blah, Blah</SMART_NUMBER>
<PARSED_DATA>
<DEPARTMENT>BLAH BLAH</DEPARTMENT>
<PROJECT>BLAH BLAH</PROJECT>
…
</PARSED_DATA>
</ELEMENT>
<ELEMENT>
<SMART_NUMBER>Blah2, Blah2, Blah2</SMART_NUMBER>
<PARSED_DATA>
<DEPARTMENT>BLAH2 BLAH2</DEPARTMENT>
<PROJECT>BLAH2 BLAH2</PROJECT>
…
</PARSED_DATA>
</ELEMENT>
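For what it’s worth, the enrichment step could look roughly like this on the Java side. This is only a sketch using the JDK’s built-in DOM API: the class name `SmartNumberEnricher`, the `fakeParse` stand-in (our real parser needs database lookups) and the `<DOC>` root wrapper in the usage note are made up for illustration, and it does not cover reading from or writing back to MarkLogic.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SmartNumberEnricher {

    /** Hypothetical stand-in for the real parser: slices fixed positions only. */
    public static Map<String, String> fakeParse(String smartNumber) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("DEPARTMENT", smartNumber.substring(0, 2)); // first 2 characters
        fields.put("PROJECT", smartNumber.substring(2, 5));    // next 3 characters
        return fields;
    }

    /** Adds a PARSED_DATA sibling right after every SMART_NUMBER element. */
    public static String enrich(String xml, Function<String, Map<String, String>> parser) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // The list is live, but only PARSED_DATA elements are added, so it stays stable.
            NodeList numbers = doc.getElementsByTagName("SMART_NUMBER");
            for (int i = 0; i < numbers.getLength(); i++) {
                Element number = (Element) numbers.item(i);
                Element parsed = doc.createElement("PARSED_DATA");
                for (Map.Entry<String, String> field
                        : parser.apply(number.getTextContent().trim()).entrySet()) {
                    Element child = doc.createElement(field.getKey());
                    child.setTextContent(field.getValue());
                    parsed.appendChild(child);
                }
                // A null next sibling makes insertBefore append at the end of the parent.
                number.getParentNode().insertBefore(parsed, number.getNextSibling());
            }

            // Serialize the modified tree back to a string without an XML declaration.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not enrich document", e);
        }
    }
}
```

Calling `SmartNumberEnricher.enrich("<DOC><ELEMENT><SMART_NUMBER>AB123</SMART_NUMBER></ELEMENT></DOC>", SmartNumberEnricher::fakeParse)` would add a `PARSED_DATA` block after the smart number. Note that running it twice on the same document would add the block twice, so a real version would need to check for an existing `PARSED_DATA` first.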
I’m not sure if there is a ‘better’ way. Using Semantics seems possible: for each smart number in a document, create a triple that links the document to the smart number, then for each smart number create a set of triples that define the various parts of the smart number. But I’m very unfamiliar with Semantics, so I don’t know if this approach would even be worth pursuing. Any ideas/suggestions would be welcome.
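To make the Semantics idea concrete, this is roughly the shape of triples I have in mind, written out as N-Triples lines from Java. Everything here is an assumption for illustration: the `http://example.org/smart#` namespace, the `hasSmartNumber` predicate, the class name `SmartNumberTriples`, and the premise that document URIs are absolute IRIs and smart numbers contain only IRI-safe characters. It only builds strings; it does not load anything into MarkLogic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class SmartNumberTriples {

    // Hypothetical namespace for the predicates and smart-number resources.
    static final String NS = "http://example.org/smart#";

    /**
     * Builds one triple linking the document to the smart number, plus one
     * triple per parsed part (e.g. DEPARTMENT, PROJECT) describing the number.
     */
    public static List<String> triplesFor(String docIri, String smartNumber,
                                          Map<String, String> parts) {
        List<String> lines = new ArrayList<>();
        String numberIri = "<" + NS + "number/" + smartNumber + ">";

        // document -> smart number
        lines.add("<" + docIri + "> <" + NS + "hasSmartNumber> " + numberIri + " .");

        // smart number -> each parsed part, stored as a plain string literal
        for (Map.Entry<String, String> part : parts.entrySet()) {
            String predicate = "<" + NS + part.getKey().toLowerCase(Locale.ROOT) + ">";
            lines.add(numberIri + " " + predicate + " \"" + escape(part.getValue()) + "\" .");
        }
        return lines;
    }

    /** Escapes backslashes and double quotes inside an N-Triples string literal. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

With this layout the parts of a smart number are stated once per number rather than once per document, which is the main difference from copying `PARSED_DATA` into every document.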