
We have integrated Solr search into a .NET project, but we are facing some issues with Solr's document boosting/scoring.

Problem: Solr does not return scores that follow the term frequency in the document.

For example, we created four documents whose Title contains the term "Link", and Solr returned the following scores:

1) Link ==> 6.037953
2) Link Link Link Link Link ==> 5.9249415
3) Link Link ==> 5.374235
4) Link Link Link ==> 5.2746024

Can anyone help me understand this Solr scoring/boosting behavior?

Abhi

1 Answer

Solr's scoring calculation is fairly complex. Start with the basic equation of Lucene's classic TF-IDF similarity (the default in Solr 5.x):

$$\text{score}(q,d) = \text{coord}(q,d) \cdot \text{queryNorm}(q) \cdot \sum_{t \in q} \left( \text{tf}(t \in d) \cdot \text{idf}(t)^2 \cdot t.\text{getBoost}() \cdot \text{norm}(t,d) \right)$$
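For a single-term query (assumed here to be something like `Title:link`) with no query-time or index-time boosts, this equation collapses to the `fieldWeight` that appears in the debug output: only one query term exists and it matches, so coord is 1, and queryNorm cancels one of the two idf factors. A sketch of that simplification:

```latex
\begin{aligned}
\text{coord}(q,d) &= 1, \qquad t.\text{getBoost}() = 1,\\
\text{queryNorm}(q) &= \frac{1}{\sqrt{\text{idf}(t)^2}} = \frac{1}{\text{idf}(t)},\\
\text{score}(q,d) &= \frac{1}{\text{idf}(t)} \cdot \text{tf}(t \in d) \cdot \text{idf}(t)^2 \cdot \text{norm}(t,d)
  = \text{tf}(t \in d) \cdot \text{idf}(t) \cdot \text{norm}(t,d).
\end{aligned}
```

Since idf is identical for every document here, the ranking is decided by the product tf · norm alone.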

The tf factor represents term frequency; its value is the square root of the number of times the term occurs in the field.

There is also norm (aka fieldNorm), which is used in the fieldWeight calculation. Take your example:

Link Link Link Link Link

Its score is calculated as follows (you can see this by adding the debugQuery parameter to the request):

5.9249415 = fieldWeight, product of:
  2.236068 = tf(freq=5.0), with freq of:
    5.0 = termFreq=5.0
  idf (which is the same for all your documents)
  0.4375 = fieldNorm(doc=177)

And for `Link`:

6.037953 = fieldWeight, product of:
  1.0 = tf(freq=1.0), with freq of:
    1.0 = termFreq=1.0
  idf (which is the same for all your documents)
  1.0 = fieldNorm

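Here, `Link` scores higher than `Link Link Link Link Link` because fieldWeight is the product of tf, idf and fieldNorm. The five-term document has the higher tf (2.236 versus 1.0), but the `Link` document has a much higher fieldNorm (1.0 versus 0.4375) because it contains only one term, and 2.236 × 0.4375 ≈ 0.978 is still less than 1.0 × 1.0.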

As the Lucene TFIDFSimilarity documentation says:

lengthNorm - computed when the document is added to the index in accordance with the number of tokens of this field in the document, so that shorter fields contribute more to the score.

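The more terms a field contains, the lower its fieldNorm. Be careful with its exact value, too: the norm is stored in a single byte, so it is rounded (for five terms, 1/√5 ≈ 0.447 is stored as 0.4375).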
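To see how tf and the byte-encoded fieldNorm combine for the four titles in the question, here is a minimal Python sketch. It assumes Lucene's DefaultSimilarity as used by Solr 5.x, where lengthNorm = 1/√numTerms (no index-time boosts) is stored with `SmallFloat.floatToByte315`; the two encoding functions below are illustrative Python ports of that Lucene code, and all function names are hypothetical. It also assumes each Title field contains only the token `link`, so freq equals numTerms.

```python
import math
import struct


def float_to_raw_bits(f: float) -> int:
    """Signed 32-bit IEEE-754 bits of f rounded to single precision (like Java's Float.floatToRawIntBits)."""
    return struct.unpack(">i", struct.pack(">f", f))[0]


def raw_bits_to_float(bits: int) -> float:
    """Inverse of float_to_raw_bits (like Java's Float.intBitsToFloat)."""
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def float_to_byte315(f: float) -> int:
    """Port of Lucene's SmallFloat.floatToByte315 (3 mantissa bits, zero exponent 15).
    Returns the stored byte as an unsigned value 0..255."""
    fzero = (63 - 15) << 3
    bits = float_to_raw_bits(f)
    small = bits >> (24 - 3)  # arithmetic shift, same as Java's >>
    if small <= fzero:
        return 0 if bits <= 0 else 1
    if small >= fzero + 0x100:
        return 255  # Java returns (byte) -1, i.e. 0xFF
    return small - fzero


def byte315_to_float(b: int) -> float:
    """Port of Lucene's SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - 3)
    bits += (63 - 15) << 24
    return raw_bits_to_float(bits)


def field_norm(num_terms: int) -> float:
    """lengthNorm = 1/sqrt(numTerms), after the lossy one-byte round trip."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


def field_weight(freq: int, num_terms: int, idf: float = 1.0) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf * field_norm(num_terms)


if __name__ == "__main__":
    # idf is the same for every document, so 1.0 is used and only tf * fieldNorm matters.
    for title in ["Link", "Link Link Link Link Link", "Link Link", "Link Link Link"]:
        n = len(title.split())
        print(f"{title!r}: tf={math.sqrt(n):.4f} fieldNorm={field_norm(n)} "
              f"tf*fieldNorm={field_weight(n, n):.4f}")
```

This prints tf*fieldNorm values of 1.0000, 0.9783, 0.8839 and 0.8660, the same order as the scores in the question. Note that without the one-byte rounding, √n · 1/√n would be exactly 1 for every title and all four documents would tie; the ordering comes from how 1/√n is rounded when stored (0.4375, 0.625 and 0.5 for n = 5, 2 and 3). The sketch reproduces the ordering, not the exact absolute scores.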

To conclude, this example shows that the score depends not only on term frequency but also on the number of terms in the field.
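To get explain output like the one above for your own documents, a request with `debugQuery=true` can be sent to the select handler. The Python sketch below only builds such a URL; the host, the core name `mycore` and the field name `Title` are hypothetical placeholders, while `q`, `fl`, `debugQuery` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_debug_query_url(base_url: str, field: str, term: str) -> str:
    """Build a Solr select URL that returns scores plus per-document explain output."""
    params = {
        "q": f"{field}:{term}",  # single-term query against the given field
        "fl": "*,score",         # return stored fields and the computed score
        "debugQuery": "true",    # adds the scoring explanation to the response
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local core; adjust host, core and field to your setup.
    print(build_debug_query_url("http://localhost:8983/solr/mycore", "Title", "link"))
```

In the JSON response, the per-document breakdown appears under `debug` → `explain`.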

alexf
  • Do you have any idea how to retrieve the frequency of multiple words in Solr 5.2.1? E.g., using the termfreq(Field, 'searchterm') function I get the frequency for the exact match only, not for each word inside the single quotes. – Santosh Balid Nov 04 '15 at 12:26
  • Thanks for the help, but my concern is that Solr returns confusing scores: for 2) Link Link Link Link Link ==> 5.9249415, the score should be the lowest compared to the others. – Santosh Balid Nov 04 '15 at 12:32
  • As I said, `Link Link Link Link Link` has a lower `fieldNorm` but a higher `tf`, so to get the ordering you have to combine these two coefficients! – alexf Nov 04 '15 at 13:11
  • For the other question, I don't think it is possible without creating a custom function yourself. – alexf Nov 04 '15 at 13:12