Negative results using kenlm

Question

I am new to language modeling. I built a 3-gram language model using kenlm (or this) from a large text file (~7 GB). I made a binary file from my language model and call it in Python like this:

import kenlm
# Placeholders: substitute the path to your binary model and your own sentence.
model = kenlm.LanguageModel("<my .klm file>")
model.score("<my sentence>")

and I get a negative number as the result. When I change the sentence being scored, the result changes but remains negative. When I give it a sentence taken verbatim from the large text file, it still gives me a bad negative number (in comparison with a sentence that is not in the text file). I don't know what a negative result means, or how I can convert it to a positive, normal result so that I can select the most correct sentence among several candidates.

I have a question: I looked at the kenlm documentation, but training from Python is not mentioned anywhere. How can we train our model? — Riken Shah, Feb 23 '17 at 04:03
Same problem. I think the respondents miss the point of @Emad Helmi's question: why does a sentence drawn verbatim from the corpus return a bad negative number? — Lcat, May 01 '18 at 05:37

score 3 · Answer 1 · answered Feb 28 '17 at 08:25

The final negative number, say -9.585592, is the log probability of the sentence. Since it is a base-10 logarithm, you need to raise 10 to the power of that number, which gives around 2.60 × 10^-10. Maybe this is the positive number you are looking for.

More info here
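
As a minimal sketch of the conversion described in this answer (the helper name `log10_to_probability` is made up for illustration and is not part of kenlm), the example score can be turned into a probability and back like this:

```python
import math

def log10_to_probability(log10_score: float) -> float:
    """Convert a base-10 log probability into a plain probability in (0, 1]."""
    return 10.0 ** log10_score

# The example value quoted in the answer, not the output of a real model.
example_score = -9.585592
probability = log10_to_probability(example_score)

# Prints a tiny positive number, roughly 2.6 x 10^-10.
print(f"{probability:.3e}")

# Taking log10 again recovers the original score.
assert math.isclose(math.log10(probability), example_score)
```

The probability is tiny because it is a product of many per-word probabilities, each below 1, so a strongly negative score is normal even for a perfectly good sentence.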

score 3 · Answer 2 · answered Apr 02 '18 at 03:25

To get the corresponding score that is between 0 and 1:

import math
# Assumes `model` is the kenlm.LanguageModel loaded in the question.
print(math.pow(10, model.score("<my sentence>")))
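
For the goal stated in the question (choosing the most plausible sentence among several), the conversion is not strictly needed: raising 10 to a power preserves order, so the raw scores can be compared directly, and the one closest to zero wins. The sketch below is only an illustration. `pick_best` and `fake_scores` are hypothetical names, the scores are invented, and `score_fn` stands in for `model.score` so the example runs without a model file. The optional per-token normalization is an assumption added here (the `+ 1` is meant to account for the end-of-sentence token), since longer sentences otherwise tend to receive lower scores.

```python
from typing import Callable, Iterable

def pick_best(
    sentences: Iterable[str],
    score_fn: Callable[[str], float],
    normalize: bool = False,
) -> str:
    """Return the sentence with the highest (least negative) log10 score."""
    def key(sentence: str) -> float:
        score = score_fn(sentence)
        if normalize:
            # Average per token; +1 is an assumed allowance for the
            # end-of-sentence token.
            score /= len(sentence.split()) + 1
        return score
    return max(sentences, key=key)

# Invented scores standing in for model.score(...) results.
fake_scores = {
    "the cat sat on the mat": -9.6,
    "cat the mat on sat the": -21.3,
    "hello": -3.0,
}

best_raw = pick_best(fake_scores, fake_scores.__getitem__)
best_normalized = pick_best(fake_scores, fake_scores.__getitem__, normalize=True)
print(best_raw)
print(best_normalized)
```

With a real model, passing `model.score` as `score_fn` should work the same way. Raw scores are appropriate when the candidates have similar lengths; normalization is one possible adjustment when they do not.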

Wei JIANG
