
I have the following query to run in Solr 4.7:

query=yellow tree house

The documents doc1 and doc2 are as follows:

doc1=house house house house house 
doc2=yellow tree

With Solr's default similarity, doc1 is ranked first because the term house is repeated many times, which makes its TF-IDF score higher.
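To see why repeated terms favor doc1, here is a minimal sketch of a textbook TF-IDF sum (raw term count times ln(N/df)) over this two-document corpus. It is a simplification, not Lucene's exact formula: Lucene's classic similarity also takes the square root of the term frequency, applies length normalization and, in 4.x, a coord factor, so actual Solr scores will differ. The helper names are hypothetical.

```python
import math

# Toy corpus from the question, tokenized on whitespace.
docs = {
    "doc1": "house house house house house".split(),
    "doc2": "yellow tree".split(),
}
query = "yellow tree house".split()

def idf(term):
    """Textbook idf = ln(N / df); not Lucene's exact idf formula."""
    df = sum(1 for tokens in docs.values() if term in tokens)
    return math.log(len(docs) / df) if df else 0.0

def naive_tfidf(tokens):
    """Sum of raw term count * idf over the query terms."""
    return sum(tokens.count(t) * idf(t) for t in query)

scores = {name: naive_tfidf(tokens) for name, tokens in docs.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")
```

Every query term occurs in exactly one document, so all three share the same idf, and doc1 wins purely because its single term is counted five times.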

However, I need doc2 to be ranked first instead of doc1, because for my use case a document in which at least two of the three query terms co-occur is better than one in which a single term appears many times.

How can I tune Solr to do this? Could BM25 solve the problem?
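On the BM25 question, the key difference is how a term's frequency contributes to the score. The sketch below assumes Lucene's default BM25 parameters k1 = 1.2 and b = 0.75 and an average-length document (so length normalization is neutral). It compares the classic sqrt(freq) factor, the standard BM25 term-frequency component freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl)), and the constant contribution you get when term frequencies are omitted from the index. In Solr 4.x, BM25 can be enabled in schema.xml with `<similarity class="solr.BM25SimilarityFactory"/>`. The function names are hypothetical.

```python
import math

K1 = 1.2  # Lucene's default BM25 k1
B = 0.75  # Lucene's default BM25 b

def classic_tf(freq):
    """Classic TF-IDF similarity: tf = sqrt(freq), grows without bound."""
    return math.sqrt(freq)

def bm25_tf(freq, doc_len, avg_len, k1=K1, b=B):
    """BM25 term-frequency component; saturates toward k1 + 1 as freq grows."""
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return freq * (k1 + 1) / (freq + norm)

def omitted_tf(freq):
    """With term frequencies omitted, any match counts as if freq were 1."""
    return 1.0 if freq > 0 else 0.0

# doc_len == avg_len, so the length-normalization term reduces to k1.
for freq in (1, 2, 5, 20, 100):
    print(f"freq={freq:>3}  classic={classic_tf(freq):6.3f}  "
          f"bm25={bm25_tf(freq, 10, 10):5.3f}  omitted={omitted_tf(freq):.1f}")
```

The classic factor keeps growing (sqrt(100) = 10), while the BM25 component stays below k1 + 1 = 2.2, so one heavily repeated term cannot dominate as much; omitting term frequencies removes the effect entirely.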

  • Regarding Solr score customization, this Stack Overflow answer should help: https://stackoverflow.com/a/22027652/9480229 – Nicolas Cami Apr 10 '18 at 13:03
  • The BM25 should reduce the impact of a large quantity of the same term in a document, but I'm not sure this is what you need, since you want to nullify the `TF` in a document. – Nicolas Cami Apr 10 '18 at 13:09
  • Thank you. Customizing the score function seems complicated. I tried simpler solutions, such as setting the mm parameter, but it didn't work. – Ivo Kurtanovic Apr 10 '18 at 14:30
  • I could try subsets of the query, such as `"yellow tree house"^10 OR "yellow tree"^5 OR "tree house"^5`, but there should be something simpler, since my requirement is very basic. I don't necessarily want to nullify the TF, just reduce its importance relative to the co-occurrence of multiple terms. – Ivo Kurtanovic Apr 10 '18 at 14:31
  • The issue with such a query is that it is very specific. But if you really want yellow to score higher than house, for instance, you can do something like this without implementing any score customization: `yellow^10 OR house^5` – Nicolas Cami Apr 10 '18 at 15:03
  • Try BM25? It should be available in 4.7. If you don't need phrase search (or can use a second field for that), you can set `omitTermFreqAndPositions="true"` for the field. – MatsLindh Apr 10 '18 at 18:25
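The sub-phrase approach from the comments can be generated instead of written by hand. A minimal sketch follows; `subphrase_query` is a hypothetical helper, and the boosts 10 and 5 are the arbitrary values from the comment. Phrase clauses need term positions, so this approach cannot be combined with `omitTermFreqAndPositions="true"` on the same field.

```python
def subphrase_query(terms, full_boost=10, pair_boost=5):
    """Boost the full phrase most, then each adjacent word pair (hypothetical helper)."""
    clauses = ['"{}"^{}'.format(" ".join(terms), full_boost)]
    # zip(terms, terms[1:]) yields each adjacent pair, e.g. (yellow, tree), (tree, house).
    clauses += ['"{} {}"^{}'.format(a, b, pair_boost) for a, b in zip(terms, terms[1:])]
    return " OR ".join(clauses)

print(subphrase_query(["yellow", "tree", "house"]))
```

A query made only of phrase clauses no longer matches doc1 at all, since doc1 contains none of the phrases; OR-ing in the individual terms would keep single-term matches in the results at a lower score.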

1 Answer


You are using Solr 4.7. I tried the same thing on Solr 7.0, and it works exactly as you want:

http://localhost:8983/solr/burrp/select?fl=*,score&q=name:yellow%20tree%20house
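If you want to script this request, a minimal sketch using only Python's standard library follows. The core name `burrp` and the field `name` come from the URL above; `select_url` is a hypothetical helper. In the standard query parser a field prefix binds only to the next term, so `name:yellow tree house` searches tree and house in the default field; the sketch groups the terms as `name:(yellow tree house)` so the prefix applies to all three.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/burrp/select"  # core name from the answer

def select_url(field, text, **extra):
    """Build a /select URL whose query applies `field` to every term in `text`."""
    params = {"q": f"{field}:({text})", "fl": "*,score", **extra}
    # urlencode percent-encodes ':', '(', ')', '*', ',' and turns spaces into '+'.
    return f"{SOLR_SELECT}?{urlencode(params)}"

print(select_url("name", "yellow tree house"))
```

Extra keyword arguments are passed through as additional request parameters, for example `select_url("name", "yellow tree house", rows="10")`.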

The response is:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "q": "name:yellow tree house",
      "fl": "*,score"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "maxScore": 1.6810184,
    "docs": [
      {
        "id": "2",
        "name": "yellow tree",
        "sname": "yellow tree",
        "_version_": 1597543998903287800,
        "score": 1.6810184
      },
      {
        "id": "1",
        "name": " house house house house house  ",
        "sname": " house house house house house  ",
        "_version_": 1597543972785356800,
        "score": 1.1577512
      }
    ]
  }
}

You can verify the same behavior on Solr 7.
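Solr 6.0 and later use BM25 as the default similarity, which explains why Solr 7 ranks doc 2 first here. The scores in the response above can be reproduced with a minimal BM25 sketch, assuming Lucene's defaults k1 = 1.2 and b = 0.75, whitespace tokenization of the name field, and the idf ln(1 + (N - n + 0.5) / (n + 0.5)). Under those assumptions it gives about 1.1578 for doc 1 and 1.6810 for doc 2, matching the scores Solr returned.

```python
import math

K1, B = 1.2, 0.75  # Lucene's default BM25 parameters

# The two documents from the question, tokenized on whitespace.
docs = {"1": "house house house house house".split(), "2": "yellow tree".split()}
query = "yellow tree house".split()

N = len(docs)
avgdl = sum(len(tokens) for tokens in docs.values()) / N  # average document length

def idf(term):
    """BM25 idf: ln(1 + (N - n + 0.5) / (n + 0.5)), n = docs containing term."""
    n = sum(1 for tokens in docs.values() if term in tokens)
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def bm25(tokens):
    """Sum the BM25 contribution of every query term present in the document."""
    norm = K1 * (1 - B + B * len(tokens) / avgdl)
    score = 0.0
    for term in query:
        freq = tokens.count(term)
        if freq:
            score += idf(term) * freq * (K1 + 1) / (freq + norm)
    return score

for doc_id, tokens in docs.items():
    print(doc_id, round(bm25(tokens), 4))
```

Repeating house five times lifts doc 1's single-term contribution only to about 1.67 times the idf, while doc 2 collects two separate contributions of about 1.21 times the idf each, so the document matching more query terms wins.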