The MarkLogic documentation describes a fast pagination technique based on unfiltered searching, somewhat similar to this:
let $uris := cts:uris((), (),
    cts:collection-query('fish')
  ) [1 to 10]
for $uri in $uris
let $fish := fn:doc($uri)
return <fish>
  { $fish/fish/variety }
  { $fish/fish/colour }
</fish>
In reality, the cts:uris() call would use a much more complex query.
The [1 to 10] predicate controls the range of "rows" returned, and the FLWOR expression that follows only selects the data to return for each row.
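To fetch pages other than the first, the same pattern can be generalised by computing the range from a page number. A minimal sketch, assuming hypothetical $page and $page-size variables that are not part of the original query (they could equally be declared external); fn:subsequence($seq, $start, $len) selects the same items as the predicate [$start to $start + $len - 1]:

```xquery
xquery version "1.0-ml";

(: Hypothetical paging parameters, not part of the original query. :)
declare variable $page as xs:integer := 2;
declare variable $page-size as xs:integer := 10;

(: Position of the first row on the requested page (1-based). :)
let $start := ($page - 1) * $page-size + 1
(: Only the URIs for this page are kept; documents outside it are never read. :)
let $uris := fn:subsequence(
    cts:uris((), (), cts:collection-query('fish')),
    $start, $page-size)
for $uri in $uris
let $fish := fn:doc($uri)
return <fish>
  { $fish/fish/variety }
  { $fish/fish/colour }
</fish>
```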
But what if the results of that first search have to be joined with some other data and/or filtered, and only then should the selected rows be returned?
let $uris := cts:uris((), (),
    cts:collection-query('fish')
  )
for $uri in $uris
let $fish := fn:doc($uri)
let $pond := fn:doc($fish/fish/pond-uri/text())
where $fish/fish/variety = ('koi', 'goldfish')
  and $pond/pond/type/text() = ('lilypond', 'gardenpond')
return <fishandpond>
  { $fish/fish/variety }
  { $pond/pond/type }
</fishandpond>
Again, I want the first 10 results. Clearly I can't constrain the let $uris := step, because there is no way of knowing how many URIs must be examined to be sure of getting at least 10 results out of the FLWOR expression that follows.
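One way around this, offered as a sketch rather than a definitive answer, is to resolve the join in the indexes instead of in the FLWOR: first collect the URIs of matching ponds, then add both the variety test and a pond-uri value test to the fish query, so that [1 to 10] can again be applied directly to cts:uris(). The sketch assumes the pond documents sit in a hypothetical 'pond' collection, that all elements are in no namespace, and that each pond-uri value is exactly the pond document's URI. Because the queries are unfiltered, how precisely the value queries match (for example, case sensitivity) depends on the database's index settings and the query options passed:

```xquery
xquery version "1.0-ml";

(: Step 1: URIs of ponds with a wanted type, resolved from the indexes.
   The 'pond' collection is a hypothetical assumption. :)
let $pond-uris := cts:uris((), (),
  cts:and-query((
    cts:collection-query('pond'),
    cts:element-value-query(xs:QName('type'), ('lilypond', 'gardenpond'))
  )))
return
  if (fn:empty($pond-uris)) then ()  (: no matching ponds, so no results :)
  else
    (: Step 2: the variety test and the join both live in the fish query,
       so the [1 to 10] range can be applied straight to cts:uris(). :)
    for $uri in cts:uris((), (),
        cts:and-query((
          cts:collection-query('fish'),
          cts:element-value-query(xs:QName('variety'), ('koi', 'goldfish')),
          cts:element-value-query(xs:QName('pond-uri'), $pond-uris)
        ))) [1 to 10]
    let $fish := fn:doc($uri)
    let $pond := fn:doc($fish/fish/pond-uri/fn:string())
    return <fishandpond>
      { $fish/fish/variety }
      { $pond/pond/type }
    </fishandpond>
```

If very many ponds match, the pond-uri value query becomes a large OR over all of their URIs; whether that beats the FLWOR approach would need measuring on real data.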
Refactoring it like this:
let $uris := cts:uris((), (),
    cts:collection-query('fish')
  )
let $urisFiltered :=
  for $uri in $uris
  let $fish := fn:doc($uri)
  let $pond := fn:doc($fish/fish/pond-uri/text())
  where $fish/fish/variety = ('koi', 'goldfish')
    and $pond/pond/type/text() = ('lilypond', 'gardenpond')
  return <fishandpond>
    { $fish/fish/variety }
    { $pond/pond/type }
  </fishandpond>
return $urisFiltered[1 to 10]
This does produce 10 results, but MarkLogic appears to evaluate the FLWOR over the full set of URIs and only then take the first 10, rather than evaluating lazily and stopping once it has 10 results, even when that would mean examining only the first 15 or so elements of $uris.
I say this because if I add xdmp:sleep(1) inside the loop, the query's run time grows with the total number of fish in the database, not with the number required in the final result set.
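If the filter genuinely cannot be expressed as a cts query, one workaround is to stop explicitly with a recursive function that returns as soon as N matches have been collected, instead of relying on lazy evaluation of [1 to 10]. A minimal sketch, assuming the same document shapes as above; local:first-n is a hypothetical helper. As far as I know, cts:uris() still resolves the full URI list here; what this avoids is reading and testing every fish and pond document. Recursion depth grows with the number of URIs examined, so a very sparse filter could run into stack limits:

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: tests URIs one at a time and returns as soon as
   $n matches have been collected, so no document after the Nth match is read. :)
declare function local:first-n(
  $uris as xs:string*,
  $n as xs:integer,
  $found as element(fishandpond)*
) as element(fishandpond)*
{
  if (fn:count($found) ge $n or fn:empty($uris)) then $found
  else
    let $fish := fn:doc($uris[1])
    let $pond := fn:doc($fish/fish/pond-uri/fn:string())
    (: Either one joined element or the empty sequence. :)
    let $match :=
      if ($fish/fish/variety = ('koi', 'goldfish')
          and $pond/pond/type = ('lilypond', 'gardenpond'))
      then <fishandpond>{ $fish/fish/variety }{ $pond/pond/type }</fishandpond>
      else ()
    (: Recurse on the remaining URIs with the match (if any) appended. :)
    return local:first-n(fn:subsequence($uris, 2), $n, ($found, $match))
};

local:first-n(cts:uris((), (), cts:collection-query('fish')), 10, ())
```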
For my next attempt, I used the XCC/J interface with Request.setCount(10) to indicate that I only care about the first 10 results. Again, I get 10 results, but all indications are that the query is not executed lazily and is still finding and filtering every fish.
So, my question is:
Is there a known coding pattern that can achieve efficient paginated (or even just first N results) searches, when documents need to be joined and/or filtered, after an initial cts:uris() or cts:search() step?
And as a supplementary question: is there a good summary of when MarkLogic does behave in a lazy fashion, and when it doesn't?
Andy