I am using the MarkLogic Java API to search JSON documents stored in a MarkLogic 9 collection. My JSON is structured as follows:

{
  "time": "2021-02-09T11:09:53",
  "payload": {
    "a": "v1",
    "b": "v2",
    "c": [
      {
        "d": {
          "a": "v1",
          "b": "v2"
        }
      }
    ]
  }
}

I am trying to search for documents where /payload/a=v1 and /payload/b=v2, but the search also returns every document where /payload/c/d/a=v1 and /payload/c/d/b=v2.

Here is my Java code:

StructuredQueryBuilder sqb = queryManager.newStructuredQueryBuilder();

List<StructuredQueryDefinition> list = new ArrayList<>();

list.add(sqb.collection("collectionName"));

StructuredQueryDefinition a = sqb.value(sqb.jsonProperty("a"), "v1");
StructuredQueryDefinition b = sqb.value(sqb.jsonProperty("b"), "v2");

list.add(sqb.and(sqb.containerQuery(sqb.jsonProperty("payload"), sqb.and(a, b))));

StructuredQueryDefinition definition = sqb.and(list.toArray(new StructuredQueryDefinition[list.size()]));
DocumentPage page = docManager.search(definition, 1L);

Any help would be much appreciated.

Thanks, AK

user2459396

1 Answer

One approach would be to use a TDE to project payload/(a|b) into a two-column view.

In the Java API, the RowManager can then match documents by criteria for those columns. Use the joinDoc() operation to join and return the full content of the documents.

An alternative would be to define a path range index on payload/a (and likewise on payload/b), which eliminates the false positives under payload/c/d/a. The concern with range indexes is that they are heavyweight (because they are memory mapped, they consume resources whether or not they are in use) and provide a less general solution.
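
As a rough illustration of the path range index alternative, the sketch below assembles a structured query in JSON that combines the collection constraint with two path range queries. It assumes that xs:string path range indexes already exist on /payload/a and /payload/b (an administrator would have to create them). The class and method names are hypothetical, and the JSON shape should be checked against the structured query syntax in the Search Developer's Guide before relying on it.

```java
import java.util.List;

public class PathRangeQuerySketch {

    // Builds one range-query clause against a path range index.
    // Assumption: an xs:string path range index exists for the given path.
    // Values are not JSON-escaped, which is acceptable only for this sketch.
    public static String pathRangeClause(String path, String value) {
        return "{\"range-query\":{"
            + "\"type\":\"xs:string\","
            + "\"path-index\":{\"text\":\"" + path + "\",\"namespaces\":{}},"
            + "\"range-operator\":\"EQ\","
            + "\"value\":[\"" + value + "\"]}}";
    }

    // Wraps the collection constraint and the given clauses in a single and-query.
    public static String buildQuery(String collection, List<String> clauses) {
        return "{\"query\":{\"queries\":[{\"and-query\":{\"queries\":["
            + "{\"collection-query\":{\"uri\":[\"" + collection + "\"]}},"
            + String.join(",", clauses)
            + "]}}]}}";
    }

    public static void main(String[] args) {
        String json = buildQuery("collectionName", List.of(
            pathRangeClause("/payload/a", "v1"),
            pathRangeClause("/payload/b", "v2")));
        System.out.println(json);
    }
}
```

If the indexes exist, a string like this can be handed to the Java API as a raw structured query, for example with queryManager.newRawStructuredQueryDefinition(new StringHandle(json).withFormat(Format.JSON)). StructuredQueryBuilder also offers pathIndex(...) and range(...) for building the same query without hand-written JSON. A collation may need to be specified if the index does not use the default one.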

Hoping that helps,

Mads Hansen
ehennum
  • This XQuery works: fn:collection('collection')[payload/a = "v1"][payload/b = "v2"]. Is there a way to convert this to the Java API? – user2459396 Feb 09 '21 at 17:26
  • Using these XPath predicates in a searchable expression requires filtering and thus won't scale. The search engine has to retrieve every document in the collection to see if the XPath matches the document. For an indexed solution, use either TDE or a path range index. – ehennum Feb 09 '21 at 19:27
  • I am new to MarkLogic, so I don't know much about this. Do you have a sample I can refer to? I am getting lost going through the documentation. Thanks – user2459396 Feb 09 '21 at 21:59
  • Good starting points for TDE might be https://docs.marklogic.com/guide/sql/creating-template-views and https://docs.marklogic.com/guide/app-dev/TDE#id_54035 ; a good starting point for path range indexes might be https://docs.marklogic.com/guide/admin/range_index#id_40666 – ehennum Feb 09 '21 at 23:18
  • Unfortunately, I have read-only privileges, so I can't create a TDE. Is there any other way? – user2459396 Feb 10 '21 at 10:39
  • Adding and modifying indexes in support of application queries is a necessary part of any high-performance data store. If the organization isn't willing to adopt an agile process for evolving the indexes, then there will be continual problems with application performance. Trying to work around a fundamental issue like that isn't likely to be successful. – ehennum Feb 10 '21 at 16:40
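
To make the distinction discussed in this thread concrete, here is a plain-Java sketch (not MarkLogic code; all names are hypothetical) of the two matching behaviours. It assumes, as the question's results suggest, that the value queries inside the container query match a and b at any depth below payload, whereas an XPath predicate or a path range index is anchored to payload/a and payload/b. Filtering amounts to running the anchored check against every candidate document, which is why it does not scale.

```java
import java.util.List;
import java.util.Map;

public class MatchSemanticsSketch {

    // True if a property named key with the given value occurs at any depth below node.
    public static boolean matchesAnywhere(Object node, String key, String value) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                if (key.equals(e.getKey()) && value.equals(e.getValue())) return true;
                if (matchesAnywhere(e.getValue(), key, value)) return true;
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                if (matchesAnywhere(item, key, value)) return true;
            }
        }
        return false;
    }

    // True only if the value sits exactly at the given path of object properties.
    public static boolean matchesAtPath(Map<String, Object> doc, String value, String... path) {
        Object node = doc;
        for (String step : path) {
            if (!(node instanceof Map)) return false;
            node = ((Map<?, ?>) node).get(step);
        }
        return value.equals(node);
    }

    // A document that should NOT match: v1/v2 appear only under payload/c/d.
    public static Map<String, Object> falsePositiveDoc() {
        Map<String, Object> d = Map.of("a", "v1", "b", "v2");
        Map<String, Object> item = Map.of("d", d);
        Map<String, Object> payload = Map.of("a", "other", "b", "other", "c", List.of(item));
        return Map.of("payload", payload);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = falsePositiveDoc();
        // Descendant-style match: finds a=v1 under payload/c/d, so the document is returned.
        System.out.println(matchesAnywhere(doc.get("payload"), "a", "v1"));
        // Anchored match: payload/a is "other", so the document is rejected.
        System.out.println(matchesAtPath(doc, "v1", "payload", "a"));
    }
}
```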