I have a large eXist-db database for which I am writing some functions to manage the data.
The pertinent information here is:
There are 2000 customers.
Each customer has access to 400 documents.
Each document is available in 20 languages.
So I have 2000 XML files, each shaped something like this:

<customer name="foo">
    <document num="A01" subscribed="Y">
        <languages>
            <lang subscribed="N">Arabic</lang>
            <lang subscribed="Y">Polish</lang>
            <!-- ... 20 languages ... -->
        </languages>
    </document>
    <!-- ... and so on for 400 documents ... -->
</customer>
Now I am trying to write an XQuery to revoke a language. This means that, across all 2000 files and for every document, I need to delete the <lang> element whose name matches.
So something like this:
for $langs in $g:collection.customers//lang[text()=$deletelang]
return update delete $langs
But this is taking forever and requires a huge amount of memory. That is not surprising when you look at it: $langs would be 2000 * 400 = 800,000 items.
I would note that a range index exists which contains:
<create qname="lang" type="xs:string" nested="no"/>
But still: is the delete query so inefficient that it essentially cannot execute over 800,000 items, or is there some other way it should be written?
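For context, that <create> line sits inside a collection.xconf roughly like this (a sketch of the standard eXist layout, not copied verbatim from my actual config):

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <!-- legacy-style range index on the string value of <lang> -->
        <create qname="lang" type="xs:string"/>
    </index>
</collection>
```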
Update I
So I changed a couple of things just to see:
1) Per the comment, I changed text() to "."
2) I wrapped the query in subsequence() to test various sizes:
for $langs in subsequence($g:collection.customers//lang[.=$deletelang],1,30000)
Items 1-30,000 = 24s
Running over the whole collection now = 110s
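One way to keep memory bounded that I may try next is deleting in fixed-size batches, so the engine never has to hold state for all 800,000 deletions in one update. A sketch (the function name local:delete-batch and the batch size 10000 are my own choices, not tested, and it assumes eXist's update extension accepts a node sequence):

```xquery
(: delete the matching <lang> nodes in batches of $size :)
declare function local:delete-batch($nodes as node()*, $size as xs:integer) {
    if (empty($nodes)) then ()
    else (
        update delete subsequence($nodes, 1, $size),
        local:delete-batch(subsequence($nodes, $size + 1), $size)
    )
};

local:delete-batch($g:collection.customers//lang[. = $deletelang], 10000)
```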