I'm quite new to existdb/Xquery and trying to set it up for querying the metadata. So far, I have stored around 400,000 records (XML) in a single collection (all records have the same namespace tag in XML), indexed the 2 elements of the stored XML and hoping to have a reasonable query response time (with the timeout set to 30 seconds) if the query matches the indexed fields. However, When i execute the below query, i can see the Optimization as "Full" in Query profiling page (from Monex dashboard) but the query takes around 2 to 5 minutes to respond with results (After removing the timeout).
Current existdb setup:
Version: 6.0.1
Runs as a container.
JVM config: -Xms512m and -Xmx1024m
Container resource limit:
memory: 3G
cpus: '0.5' (2 cores out of 4)
Here is the collection.xconf
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index xmlns:lqcd="http://www.lqcd.org/ildg/QCDml/config1.3" xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<range>
<create qname="lqcd:markovChainURI" type="xs:anyURI"/>
<create qname="lqcd:dataLFN" type="xs:anyURI"/>
</range>
</index>
</collection>
After adding the collection.xconf. I triggered the re-index from GUI and I could also see the indexed elements from Monex dashboard.
Below is the Xquery I'm running:
xquery version "3.1";
declare default element namespace 'http://www.lqcd.org/ildg/QCDml/config1.3';
declare option exist:timeout "10000";
declare option exist:output-size-limit "10000";
let $xpathResult := collection("/db/mdc/config")//markovStep[contains(markovChainURI,'mc://ildg/test/cdummy0/pdummy0/edummy1')]
let $revisedXpathResult := subsequence($xpathResult,1,100)
let $count := count($xpathResult)
return <results>
<count>{$count}</count>
<docIds> {
for $item in $revisedXpathResult
return util:document-name($item)
}
</docIds>
</results>
Sample XML stored in the collection:
<?xml version="1.0" encoding="UTF-8"?>
<gaugeConfiguration xmlns="http://www.lqcd.org/ildg/QCDml/config1.3">
<management>
<crcCheckSum>X</crcCheckSum>
<archiveHistory>
<elem>
<revisionAction>generate</revisionAction>
<participant>
<name>Dummy Name</name>
<institution>Dummy Institution</institution>
</participant>
<date>2023-03-28T16:25:17</date>
</elem>
</archiveHistory>
</management>
<implementation>
<machine>
<name/>
<institution/>
<machineType/>
</machine>
<code>
<name/>
<version/>
<date>1970-01-01T00:00:00</date>
</code>
</implementation>
<algorithm/>
<precision>single</precision>
<markovStep>
<markovChainURI>mc://ildg/test/cdummy0/pdummy0/edummy1</markovChainURI>
<series/>
<update/>
<avePlaquette>0.0</avePlaquette>
<dataLFN>lfn://ildg/test/cdummy0/pdummy0/edummy1/cfg1</dataLFN>
</markovStep>
</gaugeConfiguration>
Can someone help me to understand what do i have to do to get the optimal query responses? or any suggestions also would be greatly appreciated. Thank you in advance.
I followed the steps suggested in this question to index the elements.
I also tried to reduce the number results in response by restricting only to 100 items(using sub-sequence). I suspect the problem is in executing the query, Hence restricting the response to only 100 items did not improve in response time either.