I created a query to find parent documents in SOLR by filtering on both child and parent properties. I have simplified it for this example to:

{!parent which='content_type:"parent" AND field_a:"value" AND field_b:"value"'}((child_field_x:("VALUE") AND child_field_y:value))

Only parent documents have 'content_type:parent'. SOLR only returns parent documents, so that works.

Now I'm creating crossings between two other fields, let's say field_c and field_d. For every possible combination of values of C and D I want to calculate the number of parent documents. For each combination of values I now do this:

{!parent which='content_type:"parent" AND field_a:"value" AND field_b:"value" AND field_c:"value" AND field_d:"value"'}((child_field_x:("value") AND child_field_y:value))
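
To make the procedure concrete, here is a minimal sketch of how one query string per combination could be generated. The value lists `c_values` and `d_values` and the helper `crossing_query` are hypothetical names used only for illustration, and the sketch assumes the standard `field:"value"` syntax for every clause.

```python
from itertools import product


def crossing_query(c_value, d_value):
    """Build the block-join query string for one (field_c, field_d) combination."""
    parent_filter = (
        'content_type:"parent" AND field_a:"value" AND field_b:"value" '
        f'AND field_c:"{c_value}" AND field_d:"{d_value}"'
    )
    child_query = '((child_field_x:("value") AND child_field_y:value))'
    return "{!parent which='" + parent_filter + "'}" + child_query


# Hypothetical value lists; the real ones would come from the index.
c_values = ["c1", "c2"]
d_values = ["d1", "d2", "d3"]

# One query per cell of the crossing, keyed by the (C, D) pair.
queries = {(c, d): crossing_query(c, d) for c, d in product(c_values, d_values)}

for pair, query in queries.items():
    print(pair, query)
```

Each query would then be sent separately and its result count stored in the matching cell of the crossing.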

When I add up the results of all these queries, however, I get a much larger number than with the original query above. The original query gives me 15k results; if I add up all the rows I get 80k results.

I did some testing and noticed that, for one specific value of C and one specific value of D, these were the results:

Filtering only on C: 12,522 documents
Filtering only on D: 15,205 documents
Filtering on both (AND): 12,349 documents
Filtering on C and negating D: 3,265 documents -> expected the difference between C and D, which would be 2,683

Both field_c and field_d are single-valued.
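
As a sanity check on these numbers, the arithmetic can be sketched as below. It assumes each query behaves as a plain filter over the same set of parent documents, so that "C and not D" should equal "C" minus "C and D". Under that assumption the expected count would be 173 rather than the difference between D and C quoted above, and the observed count matches neither figure.

```python
# Counts reported in the test above.
count_c = 12522          # filtering only on C
count_d = 15205          # filtering only on D
count_c_and_d = 12349    # filtering on both C and D
observed_c_not_d = 3265  # filtering on C and negating D

# With single-valued fields and plain set semantics:
# |C and not D| = |C| - |C and D|
expected_c_not_d = count_c - count_c_and_d

# The figure used in the question: the difference between D and C.
difference_d_minus_c = count_d - count_c

print(expected_c_not_d)                     # 173
print(difference_d_minus_c)                 # 2683
print(observed_c_not_d - expected_c_not_d)  # 3092 documents unaccounted for
```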

If I remove the child query (everything after the closing }) but keep the {!parent which='(..)'} part, I do get the correct sum. It's only when I add the child document query that the numbers no longer add up.

I just don't get it, why does this happen? I have a feeling I'm not getting something from the concept of child documents, but can't seem to find anything looking at examples and documentation. It does seem to correctly filter on the parent properties, but probably the child documents are not queried correctly, or so it seems.

UPDATE: I did some extra testing by looking at the results generated. There are no duplicates in the result set, and the parent documents returned are correct for the parent filters. I wasn't able to check the child documents that belong to those companies yet, but the problem seems to be there.

One thing I noticed: if I change the default query operator to 'AND' instead of 'OR' I get 0 results in every crossing. Since my query already contained 'AND' only, I didn't get why this would be the case.

Frank

1 Answer

I finally managed to find a solution. It's best to work with the join query parser. If you want to filter parent documents that have child documents matching a specific condition, then do this:

Query: myparentfield:"value" AND myotherparentfield:"othervalue"
FQ: {!join from=_root_ to=_root_}mychildfield:"childvalue" AND myotherchildfield:"otherchildvalue"

Feel free to replace the ANDs with ORs there.
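
As an illustration, the request parameters for this query could be assembled as in the sketch below. The helper name `join_params` is hypothetical, and the snippet only builds and encodes the parameters; the endpoint they are sent to (for example a core's `/select` handler) depends on your setup and is not shown.

```python
from urllib.parse import urlencode


def join_params(parent_query, child_query):
    """Return Solr parameters: parent conditions in q, child conditions in a join fq."""
    return {
        "q": parent_query,
        # Self-join on _root_: a matching child selects every document in its block.
        "fq": "{!join from=_root_ to=_root_}" + child_query,
    }


params = join_params(
    'myparentfield:"value" AND myotherparentfield:"othervalue"',
    'mychildfield:"childvalue" AND myotherchildfield:"otherchildvalue"',
)

# URL-encoded form, ready to append to a select URL.
query_string = urlencode(params)
print(query_string)
```

Keeping the parent conditions in the main query may also explain why the original approach miscounted: as far as the Solr Reference Guide describes it, the `which` parameter of `{!parent}` is meant to match all parent documents rather than a filtered subset, so parent filters placed there can give unexpected results.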

Now, if you want to combine two child conditions with AND (so the parent should have a child matching condition A, and also a possibly different child matching condition B), then use this:

Query: myparentfield:"value" AND myotherparentfield:"othervalue"
FQ: {!join from=_root_ to=_root_}mychildfield:"childvalueA" AND myotherchildfield:"otherchildvalueA"
FQ2: {!join from=_root_ to=_root_}mychildfield:"childvalueB" AND myotherchildfield:"otherchildvalueB"

If you want to get the parents that have either a child with condition A or a child with condition B use this:

Query: myparentfield:"value" AND myotherparentfield:"othervalue"
FQ: {!join from=_root_ to=_root_}(mychildfield:"childvalueA" AND myotherchildfield:"otherchildvalueA") OR (mychildfield:"childvalueB" AND myotherchildfield:"otherchildvalueB")
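
The two variants above differ only in how the child conditions are distributed over filter queries, which can be sketched as follows. The helper `child_filters` and its `require_all` flag are hypothetical names; the sketch relies on Solr accepting the `fq` parameter more than once, which is what "FQ" and "FQ2" stand for above.

```python
from urllib.parse import urlencode

JOIN_PREFIX = "{!join from=_root_ to=_root_}"


def child_filters(conditions, require_all=True):
    """Return the fq values for a list of child conditions.

    require_all=True:  one join fq per condition (each may match a different child).
    require_all=False: a single join fq with the conditions OR-ed together.
    """
    if require_all:
        return [JOIN_PREFIX + condition for condition in conditions]
    return [JOIN_PREFIX + " OR ".join(f"({condition})" for condition in conditions)]


cond_a = 'mychildfield:"childvalueA" AND myotherchildfield:"otherchildvalueA"'
cond_b = 'mychildfield:"childvalueB" AND myotherchildfield:"otherchildvalueB"'

and_filters = child_filters([cond_a, cond_b], require_all=True)
or_filters = child_filters([cond_a, cond_b], require_all=False)

# A list of pairs lets the fq parameter repeat in the encoded query string.
parent_query = 'myparentfield:"value" AND myotherparentfield:"othervalue"'
and_params = [("q", parent_query)] + [("fq", fq) for fq in and_filters]
or_params = [("q", parent_query)] + [("fq", fq) for fq in or_filters]

print(urlencode(and_params))
print(urlencode(or_params))
```

Separate filter queries are intersected, so each one can be satisfied by a different child of the same parent, while the single OR-ed filter only needs one matching child.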

It is important to have a _root_ field in your schema according to the following field definition:

 <field name="_root_" type="string" indexed="true" stored="false" docValues="true" />

You may also add something like content_type with either a value of 'parent' or 'child'. Use it to filter on content_type:parent in the main query if you want to only return parent documents.

I hope this helps someone, since I feel the SOLR documentation is a bit limited. The documentation is there, but not very extensive on the subject of child documents/embedded documents.

Frank