
Environment: eXist-DB 4.4 / XQuery 3.1

I have hundreds of TEI XML documents that encode named entities as persName and placeName elements. The documents are in

 collection("db/fooapp/data")

Each instance of persName and placeName has an attribute @nymRef containing a single value that refers to an xml:id in one of two master documents:

 db/fooapp/data/codes_persons.xml

 db/fooapp/data/codes_places.xml

These master documents contain, among other things, the canonical name of each person or place.

I frequently look up a single name, for example:

let $x := "fooplace" (: the value of some @nymRef :)

let $y := doc("db/fooapp/data/codes_places.xml")//tei:place[@xml:id=$x]//tei:placeName/text()

return $y

But there are times when I need to do this while cycling through huge lists. For example, across all the documents I need to output the id of every seg that has one or more child placeName/@nymRef:

 <seg xml:id="fooref">some text<placeName nymRef="fooplace"/>some text</seg>

The task is to obtain every seg/@xml:id and then look up and output the canonical name for each placeName/@nymRef underneath it. This results in numerous round trips that are really inefficient, but I do not know any other way to do this in eXist-DB. The costly round trip is the let $c binding, which is re-evaluated for every row returned:

let $coll := collection("db/fooapp/data")

for $a in $coll//seg

    for $b in $a//placeName

        let $c := doc("db/fooapp/data/codes_places.xml")//tei:place[@xml:id=$b/data(@nymRef)]//tei:placeName/text()

        return 
              <tr>
                <td>{$a/@xml:id}</td>
                <td>{$c}</td>
              </tr>

This can add up to hundreds of round trips for a single table output.

I have no objections to restructuring the task into multiple functions if necessary.

Many thanks in advance.

jbrehr

1 Answer

Please provide a sample input XML document and the desired output; otherwise there is no way to rewrite your query. We also need to see your index configuration.

Some general advice, for avoiding roundtrips:

  • First off, see my previous answer to your question on the use of ft:query(). When evaluating [@xml:id=$b/data(@nymRef)], is eXist using an index, or are you forcing it to do a string comparison without an index configured on that string?

  • id() is the fastest way to look up xml:id values.

  • distinct-values() is your friend: use it to look up each distinct key:value pair only once.

  • Use a single for loop to avoid iterating over the same data multiple times.

  • Whenever possible, use more restrictive XPath expressions; // probably loads a lot of unnecessary data into memory.

All of these and more can be found in the eXist-DB documentation.
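Without sample data this cannot be tailored to your documents, but the tips above could be combined roughly as follows. This is only a sketch under assumptions: it assumes the seg and placeName elements are in the TEI namespace, that the collection and document paths from the question resolve with a leading slash, and that eXist resolves xml:id values through id() for the master document. The variable names ($places, $segs, $names) are made up for illustration.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Assumed paths, adapted from the question; adjust to your app. :)
let $places := doc("/db/fooapp/data/codes_places.xml")

(: Only the segs that actually contain a placeName with @nymRef. :)
let $segs := collection("/db/fooapp/data")//tei:seg[.//tei:placeName/@nymRef]

(: Resolve each distinct @nymRef exactly once, using id() on the master document. :)
let $names := map:merge(
    for $ref in distinct-values($segs//tei:placeName/@nymRef ! string(.))
    return map:entry($ref, $places/id($ref)//tei:placeName/text())
)

(: A single FLWOR over seg/@nymRef pairs; the name comes from the in-memory map. :)
for $seg in $segs, $ref in $seg//tei:placeName/@nymRef
return
    <tr>
        <td>{string($seg/@xml:id)}</td>
        <td>{$names(string($ref))}</td>
    </tr>
```

The master document is then opened once and each place is resolved once, however many segs refer to it. Whether this beats the original query depends on your data and index configuration, so measure both.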

duncdrum
  • Thanks for these tips. The indexes indeed sorted out a lot of problems, such that I can't really complain about speed. Even my worst-formulated queries run under 0.5 seconds now. So this really is about learning best practices. `distinct-values` will be very useful. The challenge is learning exactly how to reduce multiple for loops into one - I'm still at the early stages of learning XQuery. Thanks again. – jbrehr Nov 03 '18 at 11:11