I'm using the query below to get distinct values from XML files stored in a MarkLogic collection. The collection contains more than 40k files.

The query takes a long time to return results. Is there a way to optimize it, or an alternative that gets the same values without XPath?

XQuery:

fn:distinct-values(fn:collection($collectionName)//caseml/case[@jur eq "in"]/@year)

Input XML Example:

<?xml version="1.0" encoding="UTF-8"?>
<caseml>
  <case jur="in" series="mlj" volume="1" year="2016" startpage="129">
    <p num="y" pnum="22">
      <text>
        In view of the aforesaid discussion, we find the writ petition completely devoid
        of any merit and accordingly, we dismiss the same, leaving the parties to bear their
        own costs.
      </text>
    </p>
  </case>
</caseml>

The XQuery above works, but I need the results faster.

Dave Cassel
Sankar
  • First thing to try would be to remove the // from the XPath as it results in the whole document being searched. Absolute paths are always more efficient. – chrisis Jul 11 '16 at 07:57
  • Thanks for your comment. I'll remove the // from the XPath. – Sankar Jul 11 '16 at 08:42
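
The change suggested in the comments amounts to anchoring the path at the document root instead of searching every descendant. A minimal sketch, assuming caseml is always the root element and using "collectionName" as a placeholder collection URI (neither is confirmed by the question):

```xquery
xquery version "1.0-ml";

(: "collectionName" is a placeholder collection URI, not a value from the question. :)
(: The path starts at the document root, so no descendant (//) step is needed. :)
fn:distinct-values(
  fn:collection("collectionName")/caseml/case[@jur eq "in"]/@year
)
```

This form probably still has to read each matching document to pull out the year attribute, so on 40k files the gain may be modest.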

1 Answer

For fast atomic value retrieval across a large set of documents you want to configure a range index, which instructs MarkLogic to extract the values at index time and keep them in a memory-resident data structure so they can be accessed without touching the disk. Since you want the values at a specific path you'll want to configure a path range index. After reindexing you can use cts:values to retrieve the values. You can optionally pass a cts:query to the call to restrict things to documents matching some criteria.
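
As a rough sketch of the approach described above, the query below reads the distinct years from a path range index rather than from the documents themselves. It assumes a path range index of scalar type int has already been configured on /caseml/case/@year and the database has finished reindexing. The int type is a guess; a string index would need "type=string" plus a collation option instead. "collectionName" is a placeholder collection URI. The jur restriction is expressed with cts:element-attribute-value-query, which should not need a range index of its own.

```xquery
xquery version "1.0-ml";

(: Assumption: a path range index of type int exists on /caseml/case/@year. :)
(: "collectionName" is a placeholder collection URI. :)
cts:values(
  cts:path-reference("/caseml/case/@year", "type=int"),
  (),   (: no start value: begin at the first value in the index :)
  (),   (: no options :)
  cts:and-query((
    cts:collection-query("collectionName"),
    (: keep only documents that contain a case element with jur="in" :)
    cts:element-attribute-value-query(xs:QName("case"), xs:QName("jur"), "in")
  ))
)
```

The query argument filters at the document (fragment) level. If a single document held several case elements with different jur values, years from the non-matching cases would come back as well. With one case per document, as in the sample XML, the result should match the original XPath.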

hunterhacker
  • Bumped up the answer from hunterhacker as it is correct for all cases. But I want to note that in some cases, such as this example where the value in question is a year and contains no punctuation, a lexicon may be enough as well. – David Ennis -CleverLlamas.com Jul 11 '16 at 10:56