Search only for XML documents in which a particular element occurs multiple times in MarkLogic

Question

I am trying to search MarkLogic for XML documents that contain the <document> element more than once. The following is the structure of one such document that I want to retrieve:

<root>
    <documents>
        <document>
            <id>1</id>
            <name>roy</name>
            <otherData></otherData>
        </document>
        <document>
            <id>2</id>
            <name>roy</name>
            <otherData></otherData>
        </document>
        ....
        ...
        ..
        .
        <document>
            <id>3</id>
            <name>roy</name>
            <otherData></otherData>
        </document>

    </documents>
</root>

I do not want to retrieve documents with the following structure, where <document> occurs only once:

<root>
    <documents>
        <document>
            <id>3</id>
            <name>roy</name>
            <otherData></otherData>
        </document>
    </documents>
</root>

I can search for the existence of at least one <document> element using cts:element-query with xs:QName("document"), but I am not sure how to search for documents that contain more than one.

Any help would be much appreciated.

grtjn · Accepted Answer · 2017-11-14T06:53:24.690

2

There is no really simple way of doing this that scales well in MarkLogic. The easiest way out is to enrich the documents by adding a count attribute to the <documents> element and keeping it up to date each time you touch the document. You can then put a straightforward range index on the count attribute and query it directly to get what you are after.
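A minimal sketch of the query side, assuming the documents have already been enriched with a count attribute on <documents> (for example <documents count="3">) and that an element-attribute range index of scalar type int has been configured on it. The attribute name and the index type are assumptions for illustration:

```xquery
xquery version "1.0-ml";

(: Assumption: every <documents> element carries a count attribute,
   and an int element-attribute range index exists on documents/@count. :)
cts:search(
  fn:collection(),
  (: Resolved from the range index: documents whose count is greater than 1 :)
  cts:element-attribute-range-query(
    xs:QName("documents"), xs:QName("count"), ">", xs:int(1))
)
```

The comparison value is cast to xs:int so that it matches the assumed index type; adjust the cast if the index is configured with a different scalar type.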

HTH!

edited Nov 14 '17 at 06:53

answered Nov 13 '17 at 18:49


1

huh? I have seen you around the xquery tag often enough to think you are not talking nonsense, but I am honestly surprised. Could you maybe expand why this is not possible in ML? To me it seems a simple XQuery including a `count(/document) > 1` should be sufficient. – dirkk Nov 13 '17 at 23:19
1

You have a point. One could indeed use XPath. The poster was talking about searching though, and you cannot express the same thing in a search query. Also keep in mind that searches are usually much faster than XPath statements in MarkLogic, because XPath statements run filtered and must return results in the order dictated by XPath. – grtjn Nov 14 '17 at 06:35

score 1 · Answer 2 · answered Dec 14 '17 at 00:18

I like the accepted answer, but I tried a different approach that combines cts:search with XPath. It is faster than using XPath alone:

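xquery version "1.0-ml";

(: Index-resolved prefilter: documents with at least one <document> inside <documents> :)
let $query1 := cts:element-query(xs:QName("document"), cts:and-query(()))
let $query2 := cts:element-query(xs:QName("documents"), $query1)
for $doc in cts:search(collection(), $query2, "unfiltered")
(: Exact check in XPath: keep only documents with more than one <document> :)
where count($doc/root/documents/document) > 1
return $doc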
