
Given a JSON property in a document, I need to fetch all the distinct values of that property across all the documents in a collection. Is this possible using the MarkLogic Java Client API?

For example, I have 3 documents of type "MyDocument", each with a property "myProperty":

MyDocument1.json

{
    "myProperty":"val1"
}

MyDocument2.json

{
    "myProperty":"val1"
}

MyDocument3.json

{
    "myProperty":"val2"
}

I want to retrieve all the distinct values for "myProperty", i.e., the result should be "val1" and "val2".

I would also like to know whether I can group the documents by those distinct values, e.g.:

{
    "val1": [
       {
           "myProperty":"val1"
       },
       {
           "myProperty":"val1"
       }
    ],
    "val2": [
       {
           "myProperty":"val2"
       }
    ]
}

I'd appreciate any help or nudge in the right direction. Thanks in advance!

Update:

I was able to get the result I was looking for in Query Console, but using XQuery. Here's what I did:

    xquery version "1.0-ml";
    import module namespace search = "http://marklogic.com/appservices/search"
        at "/MarkLogic/appservices/search/search.xqy";

    let $options :=
        <options xmlns="http://marklogic.com/appservices/search">
            <values name="myProperty">
                <range type="string" facet="true">
                    <json-property>myProperty</json-property>
                </range>
            </values>
        </options>

    return search:values("myProperty", $options)

And the result I got was -

<search:values-response name="myProperty" type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:search="http://marklogic.com/appservices/search">
    <search:distinct-value frequency="2">val1</search:distinct-value>
    <search:distinct-value frequency="1">val2</search:distinct-value>
</search:values-response>

Now I need to achieve this using the Java Client API.

Adee J

1 Answer


The easiest solution is to create a range index on that property, so that you can select the distinct values directly from the lexicon with a function such as cts:values: https://docs.marklogic.com/cts:values

Otherwise, you could attempt a brute-force method: iterate over each of the docs, select the value of the property and put it into a map, then report the keys from the map.
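
As a rough illustration of that brute-force idea, here is a minimal Java sketch. It makes no MarkLogic calls at all: it assumes the documents have already been fetched and parsed into simple `Map<String, String>` objects (in a real application you would read them through the Java Client API and parse them with a JSON library). The class name `DistinctValuesSketch` and the helpers `groupByProperty` and `sampleDocs` are hypothetical names made up for this example. Grouping the documents by value also covers the second part of the question, since the keys of the resulting map are the distinct values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DistinctValuesSketch {

    // Hypothetical helper: groups already-parsed documents by the value of one property.
    // The keys of the returned map are the distinct values, in first-seen order.
    static Map<String, List<Map<String, String>>> groupByProperty(
            List<Map<String, String>> docs, String property) {
        Map<String, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(property);
            if (value == null) {
                continue; // skip documents that lack the property
            }
            groups.computeIfAbsent(value, k -> new ArrayList<>()).add(doc);
        }
        return groups;
    }

    // Stand-in for the three documents from the question (assumed to be parsed already).
    static List<Map<String, String>> sampleDocs() {
        return List.of(
                Map.of("myProperty", "val1"),
                Map.of("myProperty", "val1"),
                Map.of("myProperty", "val2"));
    }

    public static void main(String[] args) {
        Map<String, List<Map<String, String>>> groups =
                groupByProperty(sampleDocs(), "myProperty");
        System.out.println(groups.keySet()); // prints [val1, val2]
        System.out.println(groups.get("val1").size()); // prints 2
    }
}
```

Note that this holds every document in memory on the client, so it is only reasonable for small collections.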

If you have a really large collection of documents with that JSON property, then it may not be possible to collect the data in a single query without hitting timeouts or Expanded Tree Cache errors. In that case, you could run a CoRB job that writes the values of that property to an output file, and apply the EXPORT-FILE-SORT option with the value ascending|distinct.

Mads Hansen