1

I'm quite new to existdb/Xquery and trying to set it up for querying the metadata. So far, I have stored around 400,000 records (XML) in a single collection (all records have the same namespace tag in XML), indexed the 2 elements of the stored XML and hoping to have a reasonable query response time (with the timeout set to 30 seconds) if the query matches the indexed fields. However, When i execute the below query, i can see the Optimization as "Full" in Query profiling page (from Monex dashboard) but the query takes around 2 to 5 minutes to respond with results (After removing the timeout).

Current existdb setup:

Version: 6.0.1 
Runs as a container.
JVM config: -Xms512m and -Xmx1024m
Container resource limit:
memory: 3G
cpus: '0.5' (2 cores out of 4)

Here is the collection.xconf

<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index xmlns:lqcd="http://www.lqcd.org/ildg/QCDml/config1.3" xmlns:xs="http://www.w3.org/2001/XMLSchema" >
    <range>
      <create qname="lqcd:markovChainURI" type="xs:anyURI"/>
      <create qname="lqcd:dataLFN" type="xs:anyURI"/>
    </range>
  </index>
</collection>

After adding the collection.xconf. I triggered the re-index from GUI and I could also see the indexed elements from Monex dashboard.

Below is the Xquery I'm running:

xquery version "3.1";

declare default element namespace 'http://www.lqcd.org/ildg/QCDml/config1.3';
declare option exist:timeout "10000";
declare option exist:output-size-limit "10000";

let $xpathResult := collection("/db/mdc/config")//markovStep[contains(markovChainURI,'mc://ildg/test/cdummy0/pdummy0/edummy1')]
let $revisedXpathResult := subsequence($xpathResult,1,100)
let $count := count($xpathResult)
return <results>
            <count>{$count}</count>
            <docIds> {
                for $item in $revisedXpathResult
                    return util:document-name($item)
                    }
            </docIds>
        </results>

Sample XML stored in the collection:

<?xml version="1.0" encoding="UTF-8"?>
<gaugeConfiguration xmlns="http://www.lqcd.org/ildg/QCDml/config1.3">
    <management>
        <crcCheckSum>X</crcCheckSum>
        <archiveHistory>
            <elem>
                <revisionAction>generate</revisionAction>
                <participant>
                    <name>Dummy Name</name>
                    <institution>Dummy Institution</institution>
                </participant>
                <date>2023-03-28T16:25:17</date>
            </elem>
        </archiveHistory>
    </management>

    <implementation>
        <machine>
            <name/>
            <institution/>
            <machineType/>
        </machine>
        <code>
            <name/>
            <version/>
            <date>1970-01-01T00:00:00</date>
        </code>
    </implementation>

    <algorithm/>

    <precision>single</precision>

    <markovStep>
        <markovChainURI>mc://ildg/test/cdummy0/pdummy0/edummy1</markovChainURI>
        <series/>
        <update/>
        <avePlaquette>0.0</avePlaquette>
        <dataLFN>lfn://ildg/test/cdummy0/pdummy0/edummy1/cfg1</dataLFN>
    </markovStep>
</gaugeConfiguration>

Can someone help me to understand what do i have to do to get the optimal query responses? or any suggestions also would be greatly appreciated. Thank you in advance.

I followed the steps suggested in this question to index the elements.

I also tried to reduce the number results in response by restricting only to 100 items(using sub-sequence). I suspect the problem is in executing the query, Hence restricting the response to only 100 items did not improve in response time either.

basbs
  • 11
  • 3
  • Do you need the `contains` check? In your sample the data seems to match exactly so e.g. using `markovStep[markovChainURI = 'mc://ildg/test/cdummy0/pdummy0/edummy1']` might suffice (and there I would expect an index to help while I am not sure it helps for contains; but I am not familiar with exist-db). – Martin Honnen Jun 05 '23 at 10:47
  • Thanks for pointing it out @martin-honnen. I did try with and without contains but still the result has the same performance issue. – basbs Jun 05 '23 at 11:14

0 Answers0