
I am trying to write a Solr query that matches documents on a multi-valued field. I'd like Solr to retrieve the documents that contain at least one of the query terms and at most a specified number of additional terms that are not in the query. For example, consider the following documents:

<doc1>
    <multival_field>A</multival_field>
    <multival_field>B</multival_field>
    <multival_field>C</multival_field>
    <multival_field>D</multival_field>
    <multival_field>E</multival_field>
</doc1>
<doc2>
    <multival_field>A</multival_field>
    <multival_field>B</multival_field>
    <multival_field>D</multival_field>
</doc2>
<doc3>
    <multival_field>A</multival_field>
    <multival_field>B</multival_field>
    <multival_field>D</multival_field>
    <multival_field>F</multival_field>
</doc3>
<doc4>
    <multival_field>A</multival_field>
    <multival_field>B</multival_field>
    <multival_field>C</multival_field>
</doc4>

I would like to write a query that specifies the terms 'A', 'B' and 'C' and a maximum count of 1 for terms that are not in the query. This query would fetch doc4 (since it has all the query terms and nothing else) and doc2 (since it has 2 of the query terms and only 1 additional term that doesn't exist in the query). doc1 and doc3 would be excluded, because each has 2 terms that are not in the query.
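To make the intended rule concrete, here is a minimal Python sketch of the matching logic applied to the four sample documents. It is not Solr code; the names `DOCS`, `matches`, `search` and `max_extra` are made up for illustration, and it assumes each value appears at most once per document.

```python
# Sample documents from the question: id -> values of multival_field.
DOCS = {
    "doc1": {"A", "B", "C", "D", "E"},
    "doc2": {"A", "B", "D"},
    "doc3": {"A", "B", "D", "F"},
    "doc4": {"A", "B", "C"},
}


def matches(values, query_terms, max_extra):
    """True if the document has at least one query term and at most
    max_extra values that are not query terms."""
    hits = values & query_terms    # values that are also query terms
    extras = values - query_terms  # values that are not in the query
    return len(hits) >= 1 and len(extras) <= max_extra


def search(docs, query_terms, max_extra):
    """Return the sorted ids of all documents that satisfy the rule."""
    query = set(query_terms)
    return sorted(
        doc_id for doc_id, values in docs.items()
        if matches(values, query, max_extra)
    )


result = search(DOCS, ["A", "B", "C"], max_extra=1)
print(result)  # ['doc2', 'doc4']
```

doc1 (extras D and E) and doc3 (extras D and F) are rejected because they each carry two values outside the query.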

Thanks in advance.

bfaskiplar
  • Would `q=multival_field:A multival_field:B multival_field:C&q.op=AND` give you what you're looking for? I'm not sure if the field length will be affected by those that have more terms, but the difference might be too small to affect the score between just four and five terms (appending `debug=all` to your query will show exactly how the scores are calculated) – MatsLindh Nov 02 '20 at 12:05
  • Thanks for the comment. I've updated the question; it now asks a slightly different thing, which is more aligned with what I'd like to accomplish. – bfaskiplar Nov 04 '20 at 11:24
  • Wouldn't doc4 also be included in your description since it matches A, B and C? Would using minimum match, `mm=-1` work? (i.e. all terms given must match, except for one) The query would be `q=A B C&mm=-1&defType=edismax` – MatsLindh Nov 04 '20 at 11:36
  • You are right! I made a typo in the description. doc4 and doc2 would have to be fetched. But wouldn't your query fetch doc1 as well? (doc1 contains A, B and C, so it satisfies the minimum-match condition.) I don't want it to fetch doc1, as it has 2 terms that don't exist in the query. – bfaskiplar Nov 04 '20 at 14:15
  • It would, so that won't work. I'm not sure if there's a decent way to do this without implementing additional code, such as a postfilter: (this is old and for 4.9, but the idea is the same) http://qaware.blogspot.com/2014/11/how-to-write-postfilter-for-solr-49.html – MatsLindh Nov 04 '20 at 20:35
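The over-matching discussed in the comments above can be checked with a small Python simulation of minimum-should-match on the sample documents. This is only a sketch: `mm_search` and `min_should_match` are hypothetical helpers, not Solr APIs, and they model only the negative-integer form of `mm` (that many optional clauses may be missing), ignoring scoring, percentages and conditional expressions.

```python
# Sample documents from the question: id -> values of multival_field.
DOCS = {
    "doc1": {"A", "B", "C", "D", "E"},
    "doc2": {"A", "B", "D"},
    "doc3": {"A", "B", "D", "F"},
    "doc4": {"A", "B", "C"},
}


def min_should_match(num_terms, mm):
    """Number of query terms that must match. A negative mm means
    that many terms are allowed to be missing."""
    return num_terms + mm if mm < 0 else mm


def mm_search(docs, query_terms, mm):
    """Return sorted ids of documents matching at least the required
    number of query terms, regardless of any extra values they hold."""
    query = set(query_terms)
    required = min_should_match(len(query), mm)
    return sorted(
        doc_id for doc_id, values in docs.items()
        if len(values & query) >= required
    )


mm_result = mm_search(DOCS, ["A", "B", "C"], mm=-1)
print(mm_result)  # ['doc1', 'doc2', 'doc3', 'doc4']
```

Minimum match only constrains how many query terms are present, so all four documents come back, including doc1 and doc3. Limiting the number of values outside the query needs a separate check, which is where the postfilter idea comes in.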

0 Answers