I am new to language modeling and have built a 3-gram language model using kenlm (or this) from a large text file (~7 GB). I made a binary file from my language model and call it in Python like this:

import kenlm
model = kenlm.LanguageModel('<my .klm file>')  # path to the binary model
model.score('<my sentence>')

I get a negative number as the result. When I change the sentence being scored, the result changes but stays negative. When I give it a sentence copied exactly from the large text file, it returns a worse (more negative) number than a sentence that does not appear in the text file. I don't know what the negative result means, or how I can convert it to a positive, normal value so that I can select the most correct sentence among several candidates.

Emad Helmi
  • I have a question: I looked at the kenlm documentation, but training a model from Python is not mentioned anywhere. How can we train our model? – Riken Shah Feb 23 '17 at 04:03
  • Same problem. I think the respondents miss the point of @Emad Helmi's question: why does a sentence drawn verbatim from the corpus return a bad negative number? – Lcat May 01 '18 at 05:37

2 Answers

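The final negative number, say -9.585592, is the log probability of the sentence (KenLM reports base-10 logarithms). A probability is at most 1, so its logarithm is at most 0, which is why the score is always negative. Since it is a logarithm, you need to compute 10 to the power of that number, which is around 2.60 × 10^-10. Maybe this is the positive number you are looking for.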
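As a small worked check of that conversion, here is a minimal sketch in plain Python. It assumes the score is a base-10 log probability, as KenLM's `score()` returns; the helper name `log10_to_probability` is hypothetical and not part of the kenlm API.

```python
import math

def log10_to_probability(log10_score: float) -> float:
    """Hypothetical helper: turn a base-10 log probability into a probability."""
    return 10.0 ** log10_score

score = -9.585592  # example log10 score quoted in the answer
prob = log10_to_probability(score)
print(f"{prob:.2e}")  # 2.60e-10

# Taking log10 of the probability recovers the original score.
print(math.isclose(math.log10(prob), score))  # True
```

For long sentences the score can drop below about -308, at which point `10 ** score` underflows to `0.0` in double precision, so when ranking candidates it is usually safer to compare the log scores directly.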

More info here

To get the corresponding probability, which is between 0 and 1:

import math
# model.score() returns a log10 probability, so 10 ** score is the probability
print(math.pow(10, model.score('<my sentence>')))
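A raw score also depends on sentence length: each extra word multiplies in another probability below 1, so a long sentence copied from the corpus can score lower than a short sentence that never appears in it. That is one plausible explanation for the observation in the question. A common way to compare candidates of different lengths is per-word perplexity. The sketch below is an illustration under assumptions, not kenlm's own code: it counts whitespace-separated words plus one end-of-sentence token `</s>` (matching the default `eos=True` scoring), and `fake_scores` is made-up data standing in for `model.score(...)`.

```python
from typing import Callable, Iterable

def perplexity(log10_score: float, sentence: str) -> float:
    # Number of predicted tokens: every word plus the end-of-sentence token </s>.
    n_tokens = len(sentence.split()) + 1
    # Perplexity = 10 ** (-average log10 probability per token); lower is better.
    return 10.0 ** (-log10_score / n_tokens)

def best_sentence(candidates: Iterable[str], score_fn: Callable[[str], float]) -> str:
    # Pick the candidate with the lowest per-word perplexity.
    return min(candidates, key=lambda s: perplexity(score_fn(s), s))

# Made-up log10 scores standing in for model.score(...) output.
fake_scores = {"the cat sat on the mat": -9.0, "cat mat": -5.0}

# Ranking by the raw total score favours the short candidate...
print(max(fake_scores, key=fake_scores.get))  # cat mat
# ...while per-word perplexity favours the longer, more fluent one.
print(best_sentence(fake_scores, fake_scores.__getitem__))  # the cat sat on the mat
```

With a real model you would pass `model.score` as `score_fn`. The kenlm Python module also provides a `perplexity()` method on the model object, which should give the same ranking without the helper.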
Wei JIANG