
I have run into a problem with how the fieldLength value is calculated in Solr 6. I am using BM25 as the similarity. When I index a set of documents, the fieldLength values reported for them are wildly wrong. For a title field containing only 9 words, fieldLength is reported as "5.6493154E19", which cannot be right. When I re-index an individual document, the score is corrected and fieldLength becomes "10.24". When I then re-index the whole corpus, the values are corrupted again and fieldLength goes back to "5.6493154E19".

Original fieldLength value stored:

    4.641637E-19 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
      1.0 = termFreq=1.0
      1.2 = parameter k1
      0.75 = parameter b
      10.727212 = avgFieldLength
      5.6493154E19 = fieldLength

After re-indexing an individual document:

    1.0189644 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
      1.0 = termFreq=1.0
      1.2 = parameter k1
      0.75 = parameter b
      10.72807 = avgFieldLength
      10.24 = fieldLength

After re-indexing the whole corpus:

    4.641637E-19 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
      1.0 = termFreq=1.0
      1.2 = parameter k1
      0.75 = parameter b
      10.727212 = avgFieldLength
      5.6493154E19 = fieldLength
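
As a sanity check, the tfNorm formula printed in the explain output can be evaluated by hand. The sketch below is plain Python, not Solr code, and the function name `tf_norm` is made up for illustration. It plugs in the values listed above and reproduces both reported tfNorm numbers, which suggests the formula itself is applied correctly and only the fieldLength input is off.

```python
def tf_norm(freq, k1, b, field_length, avg_field_length):
    """BM25 term-frequency normalization, as printed in the explain output."""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )

# Values copied from the explain output after re-indexing one document.
good = tf_norm(1.0, 1.2, 0.75, 10.24, 10.72807)

# Values copied from the explain output after indexing the whole corpus.
bad = tf_norm(1.0, 1.2, 0.75, 5.6493154e19, 10.727212)

print(good)  # close to the reported 1.0189644
print(bad)   # close to the reported 4.641637E-19
```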

Any ideas on where the problem is?

K.Ali
  • Have you seen [How is field length defined in Lucene](https://stackoverflow.com/questions/22194920/how-is-field-length-defined-in-solr-lucene)? The value is not meant to be exact, as it's a 32-bit float encoded into an 8-bit format. – MatsLindh Mar 26 '18 at 18:40
  • Yes, I know about that, and that is why I expect a 9-word field to have a field length of 10.24. But that is not what happens here: a 9-word field is stored with a field length of '5.6493154E19'. – K.Ali Mar 27 '18 at 03:27
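
The 8-bit encoding mentioned in the comments can be sketched to see where both numbers might come from. The Python below is a re-implementation written for illustration, not Lucene code. It assumes that Lucene 6's `BM25Similarity` stores `floatToByte315(boost / sqrt(length))` as the norm byte and decodes it through a 256-entry table whose entry 0 is set to `1 / entry 255`; the function names are hypothetical. Under those assumptions a 9-term field decodes to 10.24, and a norm byte of 0 decodes to roughly 5.6493154E19. That would be consistent with the norm being read as 0 for the affected documents, although this is an inference and not something the explain output confirms.

```python
import math
import struct

def float_to_byte315(f):
    """Pack a float into 8 bits: 3 mantissa bits, zero exponent 15 (assumed layout)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw float32 bits, signed
    small = bits >> 21
    base = (63 - 15) << 3
    if small <= base:
        return 0 if bits <= 0 else 1   # zero/negative -> 0, underflow -> 1
    if small >= base + 0x100:
        return 255                     # overflow -> largest byte
    return small - base

def byte315_to_float(b):
    """Inverse of float_to_byte315 for a byte value in 0..255."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Decode table: norm byte -> field length used in the BM25 formula.
NORM_TABLE = [0.0] * 256
for i in range(1, 256):
    f = byte315_to_float(i)
    NORM_TABLE[i] = 1.0 / (f * f)
NORM_TABLE[0] = 1.0 / NORM_TABLE[255]  # avoids infinity for a zero norm byte

def decoded_field_length(num_terms, boost=1.0):
    """Field length as it would come back after the lossy round trip."""
    return NORM_TABLE[float_to_byte315(boost / math.sqrt(num_terms))]

print(decoded_field_length(9))  # the lossy value for a 9-term field
print(NORM_TABLE[0])            # the value a norm byte of 0 decodes to
```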

0 Answers