4

I have a use case with a parent document and several child documents that belong to it. When I search, I always want to return the parent documents. For example, if a search hits two child documents with the same parent, the results should be grouped into a single search result, with the snippets carried over from the child documents. I also want to apply paging, but the pagination should operate on the transformed (grouped) search results. Is this possible? The relation between parent and child is the property <document-parent-location> on the child documents.

Parent Document Properties

<?xml version="1.0" encoding="UTF-8"?>
<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
  <id xmlns="http://ir.abbivenet.com/content-repo/metadata">1e900d1a7210350c0b68973fb0d6dc96f83e161a</id>
  <cpf:processing-status xmlns:cpf="http://marklogic.com/cpf">done</cpf:processing-status>
  <cpf:property-hash xmlns:cpf="http://marklogic.com/cpf">34d0a49cf8835387f6bd213a31732ad4</cpf:property-hash>
  <cpf:last-updated xmlns:cpf="http://marklogic.com/cpf">2016-03-15T21:18:20.521372Z</cpf:last-updated>
  <cpf:state xmlns:cpf="http://marklogic.com/cpf">http://marklogic.com/states/done</cpf:state>
  <cpf:self xmlns:cpf="http://marklogic.com/cpf">/documents/BioEln/1e900d1a7210350c0b68973fb0d6dc96f83e161a.xml</cpf:self>
  <prop:last-modified>2016-03-15T21:50:38Z</prop:last-modified>
</prop:properties>

Child Document 1

<?xml version="1.0" encoding="UTF-8"?>
<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
  <document-parent-location xmlns="http://ir.abbivenet.com/content-repo/metadata">/documents/BioEln/1e900d1a7210350c0b68973fb0d6dc96f83e161a.xml</document-parent-location>
  <context xmlns="http://ir.abbivenet.com/content-repo/metadata">BioEln</context>
  <id xmlns="http://ir.abbivenet.com/content-repo/metadata">1e900d1a7210350c0b68973fb0d6dc96f83e161a</id>
  <prop:last-modified>2016-03-15T21:50:34Z</prop:last-modified>
</prop:properties>

Child Document 2

<?xml version="1.0" encoding="UTF-8"?>
<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
  <document-parent-location xmlns="http://ir.abbivenet.com/content-repo/metadata">/documents/BioEln/1e900d1a7210350c0b68973fb0d6dc96f83e161a.xml</document-parent-location>
  <context xmlns="http://ir.abbivenet.com/content-repo/metadata">BioEln</context>
  <id xmlns="http://ir.abbivenet.com/content-repo/metadata">1e900d1a7210350c0b68973fb0d6dc96f83e161a</id>
  <prop:last-modified>2016-03-15T21:50:34Z</prop:last-modified>
</prop:properties>
Ravi

2 Answers

3

If you want to search across the child document and only return a result/snippet for each parent document, then probably the ideal solution is to combine the parent and child documents into a single document during ingestion. Modeling your data this way, you can write queries to search the child document data, and then you can transform the parent document result during snippet generation.

Any solution that keeps these data in separate documents will require "joining" data at runtime and selecting more results per page to facilitate deduplication of parent documents. Each of those will incur a performance penalty compared to a "denormalized" single document, and it will probably make the implementation more complex.
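As a rough illustration of the runtime "join" and deduplication described above, here is a minimal Python sketch. It is language-neutral logic, not MarkLogic API: the `Hit` tuple, `group_by_parent`, and `paginate` are hypothetical names, and ranking a parent by its best-scoring child is an assumption that the question does not specify.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One child-document match (hypothetical shape)."""
    parent_uri: str
    score: float
    snippet: str


def group_by_parent(hits):
    """Collapse child hits into one result per parent, keeping every snippet."""
    grouped = {}
    for hit in hits:
        entry = grouped.setdefault(
            hit.parent_uri,
            {"uri": hit.parent_uri, "score": hit.score, "snippets": []},
        )
        # Assumption: a parent ranks as high as its best-scoring child.
        entry["score"] = max(entry["score"], hit.score)
        entry["snippets"].append(hit.snippet)
    # Highest score first; Python's sort is stable, so ties keep first-seen order.
    return sorted(grouped.values(), key=lambda e: e["score"], reverse=True)


def paginate(results, page, page_size):
    """Return the 1-based `page` of the already-grouped results."""
    start = (page - 1) * page_size
    return results[start:start + page_size]
```

Because paging happens after grouping, every child hit that could affect the requested page has to be fetched first, which is where the extra cost mentioned above comes from.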

wst
  • Thank you. Unfortunately, in my case I cannot combine them into one single document, and some of my documents are binary. – Ravi Mar 18 '16 at 17:47
2
  • If document-parent-location holds an id that also exists in the parent document, use range indexes and build a shotgun query.
  • If document-parent-location is a URI, add a range index on it, use cts:values on that index, and pipe the resulting URIs into a cts:document-query.
  • Otherwise, another non-intrusive option is to use collections on each group of documents, piping the output of cts:collections into cts:collection-query.

All of the above take a bit of work to get up and running (an index or collections), but every one of these options runs off range indexes or lexicons. I suggest none of them would require de-duplication, because the query itself would isolate the parent documents.
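Assuming "shotgun query" here means handing the whole set of values collected from the matching children to a single query against the parents, the two steps can be sketched in plain Python. This is not MarkLogic API: dictionaries stand in for documents, and `parent_ids_from_children` and `select_parents` are hypothetical names.

```python
def parent_ids_from_children(children, matches):
    """Step 1: distinct parent ids of the matching children, in first-seen order."""
    return list(dict.fromkeys(c["parent_id"] for c in children if matches(c)))


def select_parents(parents, parent_ids):
    """Step 2: one pass over the parents with the whole id set at once."""
    wanted = set(parent_ids)
    return [p for p in parents if p["id"] in wanted]
```

Each parent is selected at most once in the second step, however many of its children matched, which is why no separate de-duplication pass is needed.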

If any of the above are interesting to you, post some sample documents and URIs, and we can work from there to refine an answer that meets your needs more specifically.

  • All the child documents and the master document have a property `<id>`, which has an element range index. The master document is in its own collection, and the child documents are in their own collections ... What is a shotgun query? I will attach some sample files to the main question – Ravi Mar 18 '16 at 18:10
  • So the document-parent-location element in the children holds the id found in the parent document? – David Ennis -CleverLlamas.com Mar 18 '16 at 18:21
  • Yeah, all the documents have the property `<id>`, and its value is the same for both parent and child documents, but the parent document is in a different collection – Ravi Mar 18 '16 at 18:22
  • The problem with the shotgun query approach is how to maintain the score, the pagination, and the snippets. – Ravi Mar 18 '16 at 18:30
  • This is what I did, but I want the score to be taken into account as well `let $vals := cts:values((cts:element-reference(xs:QName('meta:id'))),(), ('properties'), $query-original) let $results := cts:search(fn:doc(), cts:and-query((cts:collection-query(("http://ir.abbvie.com/content-repo/type/master")), cts:document-fragment-query( cts:element-range-query(xs:QName('meta:id'), "=",$vals)))))[1 to 10] return $results` – Ravi Mar 18 '16 at 18:49
  • Ohh, sort order based on the score from the children is an issue. As for pagination and snippets: you paginate on the main documents and then join the children and their snippets onto the paginated results. – David Ennis -CleverLlamas.com Mar 21 '16 at 20:20
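The "paginate first, then join" order from the last comment can be sketched as follows. This is plain Python rather than MarkLogic API; `page_with_snippets` and the `fetch_snippets` callback are hypothetical, and the parent URIs are assumed to arrive already in the desired order.

```python
def page_with_snippets(parent_uris, fetch_snippets, page, page_size):
    """Slice the ordered parents first, then join snippets for that page only."""
    start = (page - 1) * page_size
    page_uris = parent_uris[start:start + page_size]
    # fetch_snippets runs once per parent on the page, never for the full result set.
    return [{"uri": uri, "snippets": fetch_snippets(uri)} for uri in page_uris]
```

This keeps the snippet work proportional to the page size, but it leaves open how the parents get ordered by child scores, which is the issue raised in the comment.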