I have the following XQuery running in eXist-db (against XML documents that follow the TEI schema):
xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";
let $data-collection := "/db/apps/deheresi/resources/documents"
let $people-collection := "/db/apps/deheresi/resources/documents/codes_people.xml"
for $msdoc in collection($data-collection)/tei:TEI[contains(@xml:id,'ms609')]
for $ordinal in $msdoc/tei:text/tei:front//tei:div[@type='registry_ordinal']/replace(@n, '#', '')
for $doctype in $msdoc/tei:text/tei:front//tei:div[@type='doc_type']/replace(@subtype, '#', '')
for $folio in $msdoc/tei:text/tei:front//tei:div[@type='folio']/replace(@n, '#', '')
for $nameref in $msdoc/tei:text/tei:body[1]/tei:p[1]/tei:seg[1]/tei:persName[@role = 'dep']/replace(@nymRef, '#', '')
for $persname in normalize-space(string-join(doc($people-collection)//tei:person[@xml:id = $nameref]))
return concat('<td>',$ordinal,'</td><td>',$folio,'</td><td>',$doctype,'</td><td>',$persname,'</td>')
Organization of XML documents:
There are 700+ TEI documents, each with <TEI xml:id="foo_1.xml"> as the root node, always in the same place (the document identifier increments: foo_1.xml, foo_2.xml, foo_3.xml, etc.).
Each TEI document contains a single unique element identifying a person, <persName role="dep" nymRef="#unique_foo_name">, which is not always in the same place in a document.
A separate XML document, codes_people.xml, contains 1500+ xml:ids of distinct people.
The function does the following:
Get the identifying tei:TEI/@xml:id and the tei:persName[@role="dep"]/@nymRef from each XML document.
With the tei:persName[@role="dep"]/@nymRef, look up the name in codes_people.xml at tei:person[@xml:id="unique_foo_name"].
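To make the two steps concrete, here they are isolated for a single document. This is only a minimal sketch, not the full query: it reuses the placeholder identifier foo_1.xml from the description above, uses the descendant axis (//) because the persName is not always in the same place, and wraps the result in a tr element purely for illustration.

```xquery
xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";

let $data-collection := "/db/apps/deheresi/resources/documents"
let $people-collection := "/db/apps/deheresi/resources/documents/codes_people.xml"

(: 'foo_1.xml' is the placeholder xml:id from the description, not a real identifier :)
let $msdoc := collection($data-collection)/tei:TEI[@xml:id = 'foo_1.xml']

(: Step 1: the document identifier and the deponent reference, minus the leading '#' :)
let $docid := string($msdoc/@xml:id)
let $nameref := substring-after(($msdoc//tei:persName[@role = 'dep'])[1]/@nymRef, '#')

(: Step 2: look up the person whose xml:id matches the reference :)
let $person := doc($people-collection)//tei:person[@xml:id = $nameref]

return
    <tr>
        <td>{$docid}</td>
        <td>{normalize-space(string-join($person))}</td>
    </tr>
```

Each value here is bound with let rather than for, since every step is expected to yield a single item per document.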
This all returns the expected results...except it's really, really slow (4 seconds). Obviously I'm testing on a local computer and not a server, but I would like to optimize the queries before testing on more powerful servers.
ADDED PER REQUEST:
eXist-db version: 3.3.0
Sample output (the eventual target is an HTML table)
<td>0001</td><td>1r</td><td>Deposition</td><td>Arnald Garnier</td>
<td>0002</td><td>1r</td><td>Deposition</td><td>Guilhem de Rosengue</td>
<td>0003</td><td>1r</td><td>Deposition</td><td>Hugo de Mamiros</td>
<td>0004</td><td>1r</td><td>Deposition</td><td>P Lapassa senior</td>
Many thanks in advance.
EDIT: I've added more information in a self-response below, and a link to all the files in Dropbox in the comments.