
I am using Solr 7.6 with the following document structure:

{
    "source_ln":"en",
    "source_text":"the sky is blue",
    "target_ln":"hi",
    "target_text":"आसमान नीला है"
},
{
    "source_ln":"en",
    "source_text":"the sky is also called the celestial sphere",
    "target_ln":"hi",
    "target_text":"आकाश को आकाशीय क्षेत्र भी कहा जाता है"
}

All the fields are defined with the StandardTokenizerFactory tokenizer.

When I query "source_text":"the sky",

The result set should contain the first document only.

The first document's "source_text" has 4 terms, so the 2 query terms cover 50% of it. In the second document, the field "source_text":"the sky is also called the celestial sphere" contains 8 terms, while the query "source_text":"the sky" contains only 2, i.e. 25%. The at-least-50% match criterion is therefore not fulfilled, and the second document should not be in the result set.
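To make the arithmetic concrete, here is a minimal sketch of the ratio I have in mind. It is plain Python, not Solr code: the function name `coverage` is made up for illustration, and splitting on whitespace is only a rough stand-in for what StandardTokenizerFactory does.

```python
def coverage(query, field_text):
    """Fraction of the field's tokens accounted for by matching query tokens.

    Whitespace splitting is an assumption; it only approximates
    Solr's StandardTokenizerFactory for simple English text.
    """
    query_tokens = query.lower().split()
    field_tokens = field_text.lower().split()
    if not field_tokens:
        return 0.0
    # Count the query tokens that occur somewhere in the field.
    matched = sum(1 for token in query_tokens if token in field_tokens)
    return matched / len(field_tokens)


doc1 = "the sky is blue"
doc2 = "the sky is also called the celestial sphere"

print(coverage("the sky", doc1))  # 2 of 4 terms
print(coverage("the sky", doc2))  # 2 of 8 terms
```

With a threshold of 0.5, only the first document would be kept.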

Is there any way to get the documents matching at least 50% of the query field terms/tokens?

Thanks in advance.

Vaibhav Raut

2 Answers


You can set your request handler to use a (e)dismax query parser, for example with the defType parameter, e.g. ?q=...&defType=dismax.

With a dismax parser, you can then use the mm (Minimum Should Match) parameter, which controls how many of the query's optional clauses must match, and adjust it to your needs, for example by setting mm=50%.
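As a rough sketch of what such a request could look like, the snippet below builds the select URL with Python's standard library. The host, port, and core name (`translations`) are assumptions for illustration, as is the choice of `qf=source_text`; adjust them to your setup.

```python
from urllib.parse import urlencode


def build_select_url(base_url, text, field="source_text", mm="50%"):
    """Build a Solr /select URL that uses edismax with a minimum-should-match value."""
    params = {
        "q": text,             # user query, e.g. "the sky"
        "defType": "edismax",  # switch the query parser
        "qf": field,           # field(s) to search
        "mm": mm,              # Minimum Should Match
    }
    # urlencode percent-encodes the values ("50%" becomes "50%25").
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local core; not taken from the question.
url = build_select_url("http://localhost:8983/solr/translations", "the sky")
print(url)
```

Note that mm is evaluated against the clauses of the query, so it is worth checking the results against the documents in the question before relying on it.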

EricLavault

You can achieve this with the following steps.

  • Create a separate field in your schema named "source_text_fifty" (indexed=true, stored=false). Do not apply StandardTokenizerFactory to it; better, create a separate field type that uses solr.KeywordTokenizerFactory.
  • At index time, compute the first 50% of each document's source_text and store that value in the "source_text_fifty" field.
  • Re-index all existing data with the above logic.
  • Run the query source_text_fifty:"the sky". This returns only the one document whose 50% value matches.
  • What do you mean by "calculate 50% of the input during indexing the doc"? Actually I am using Solr to build a language translation. So I am looking for the functionality to get the most relevant partial match results. – Vaibhav Raut Jan 10 '20 at 10:05
  • For partial match results, this 50% logic does not work. It only handles an exact 50% match from the start of the string, as in the example in your question. – Ashutosh Tiwari Jan 13 '20 at 11:48
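The index-time step from the second answer, read together with the last comment (an exact match on the first half of the string), could be sketched as follows. This is only an illustration: the helper name `first_half_prefix`, the whitespace tokenization, and rounding down for odd token counts are all assumptions, not something the answer specifies.

```python
def first_half_prefix(text):
    """Return the first 50% of the whitespace-separated tokens of text.

    Odd token counts are rounded down (an assumption).
    """
    tokens = text.split()
    keep = len(tokens) // 2
    return " ".join(tokens[:keep])


docs = [
    {"source_text": "the sky is blue"},
    {"source_text": "the sky is also called the celestial sphere"},
]

# Add the precomputed value before sending the documents to Solr.
for doc in docs:
    doc["source_text_fifty"] = first_half_prefix(doc["source_text"])

for doc in docs:
    print(doc["source_text_fifty"])
```

With a keyword-tokenized "source_text_fifty" field, the query source_text_fifty:"the sky" would then match only the first document, since the second document's value is "the sky is also".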