
I am trying to search MarkLogic for XML documents that contain the <document> element more than once. This is the structure of one such XML document I want to retrieve:

<root>
    <documents>
        <document>
            <id>1</id>
            <name>roy</name>
            <otherData></otherData>
        </document>
        <document>
            <id>2</id>
            <name>roy</name>
            <otherData></otherData>
        </document>
        ...
        <document>
            <id>3</id>
            <name>roy</name>
            <otherData></otherData>
        </document>

    </documents>
</root>

I do not want to retrieve XML documents with the following structure, which contain only a single <document> element:

<root>
    <documents>
        <document>
            <id>3</id>
            <name>roy</name>
            <otherData></otherData>
        </document>
    </documents>
</root>

I can search for documents containing at least one <document> element using cts:element-query with xs:QName("document"), but I am not sure how to search for documents that contain more than one.

Any help would be much appreciated.

Prakash K

2 Answers


There is no simple way of doing this in MarkLogic that also scales well. The easiest way out is to enrich the documents: add a count attribute to the <documents> element and keep it up to date every time you update the document. You can then put a range index on the count attribute and query it directly for values greater than 1.
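The enrichment approach can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested implementation: the URI `/example/doc1.xml` is hypothetical, and the second statement assumes an element-attribute range index of type `xs:int` on `documents/@count` has already been configured for the database. The two statements are separated by `;` so that the update commits before the search runs.

```xquery
xquery version "1.0-ml";
(: Statement 1 (update): store the number of <document> children as a
   count attribute on <documents>. "/example/doc1.xml" is a hypothetical URI. :)
let $documents := fn:doc("/example/doc1.xml")/root/documents
let $count := attribute count { fn:count($documents/document) }
return
  if (fn:exists($documents/@count))
  then xdmp:node-replace($documents/@count, $count)
  else xdmp:node-insert-child($documents, $count)
;
xquery version "1.0-ml";
(: Statement 2 (search): assumes an xs:int element-attribute range index
   on documents/@count exists; otherwise this query raises an error. :)
cts:search(
  fn:collection(),
  cts:element-attribute-range-query(
    xs:QName("documents"), xs:QName("count"), ">", xs:int(1)))
```

In practice the count would be computed wherever the documents are written, so the attribute never drifts out of sync with the actual number of <document> elements.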

HTH!

grtjn
    Huh? I have seen you around the xquery tag often enough to think you are not talking nonsense, but I am honestly surprised. Could you expand on why this is not possible in ML? To me it seems a simple XQuery including a `count(//document) > 1` should be sufficient. – dirkk Nov 13 '17 at 23:19
    You have a point. One could indeed use XPath. The poster was talking about searching, though, and you cannot express the same condition in a search query. Also keep in mind that searches are usually much faster than XPath statements in MarkLogic, because XPath statements always run filtered and must honor the ordering XPath dictates, whereas searches can run unfiltered. – grtjn Nov 14 '17 at 06:35
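The XPath-only approach discussed in these comments could look roughly like this, assuming the <root>/<documents>/<document> structure from the question. It is a sketch, not a benchmarked solution: the count predicate has to be checked against each candidate document, which is why it is expected to scale worse than an index-backed search.

```xquery
xquery version "1.0-ml";
(: Return every document node whose <documents> element has more than
   one <document> child. Correct, but the count() predicate is checked
   per document rather than answered from the indexes alone. :)
fn:collection()[fn:count(root/documents/document) > 1]
```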

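I like the accepted answer, but I tried a different approach that combines cts:search with XPath. It is faster than using XPath alone, because the unfiltered search uses the indexes to narrow down the candidate documents before the XPath count is applied: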

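xquery version "1.0-ml";
(: Use the indexes to find documents with a <document> element inside <documents> :)
let $query1 := cts:element-query(xs:QName("document"), cts:and-query(()))
let $query2 := cts:element-query(xs:QName("documents"), $query1)
(: Then keep only those with more than one <document> child :)
for $doc in cts:search(fn:collection(), $query2, "unfiltered")
where fn:count($doc/root/documents/document) > 1
return $doc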
mg_kedzie