I'm using SRILM's ngram-count command line utility in an attempt to calculate a trigram model for a subset of the Gutenberg corpus. The command line is:
-order 3 -kndiscount -text {$text} -lm {$lm} -gt2min 10 -gt3min 5 -vocab {$vocab} -unk
However, some of the lines in the resulting model appear to be discounted to 0 (or at least that's what I believe is happening):
-5.018952 roaming
-4.189117 roar -0.2053203
-4.30369 roared 0 <-- discounted to zero?
This also occurs with -gt1min 0, or with any other value once the minimum threshold is reached. How do I prevent this from happening? It causes problems when I convert the model to an n-gram-based FST and the input sentence contains one of these words.
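For reference, here is a minimal sketch of how the affected entries could be picked out of the model file. It assumes the output follows the standard ARPA layout, where each entry is a log10 probability, the n-gram words, and an optional trailing log10 backoff weight; under that reading the trailing 0 is in the backoff column, not the probability column. The function name `parse_arpa_line` and the `sample` list are hypothetical and only illustrate the three lines quoted above.

```python
from typing import Optional, Tuple


def parse_arpa_line(
    line: str, order: int
) -> Tuple[float, Tuple[str, ...], Optional[float]]:
    """Split one ARPA n-gram entry into (log10 prob, words, log10 backoff).

    Assumes whitespace-separated fields: the probability, `order` words,
    and an optional backoff weight at the end.
    """
    fields = line.split()
    if len(fields) not in (order + 1, order + 2):
        raise ValueError(f"unexpected field count for order {order}: {line!r}")
    logprob = float(fields[0])
    words = tuple(fields[1 : 1 + order])
    # The backoff weight is only present when the line has one extra field.
    backoff = float(fields[1 + order]) if len(fields) == order + 2 else None
    return logprob, words, backoff


# The three unigram lines from the question, without the annotation.
sample = [
    "-5.018952 roaming",
    "-4.189117 roar -0.2053203",
    "-4.30369 roared 0",
]

# Collect the entries whose trailing column is exactly 0.
zero_backoff = [
    words
    for _, words, backoff in (parse_arpa_line(s, 1) for s in sample)
    if backoff == 0.0
]
```

Running the same filter over the whole unigram section of the file would show how many entries carry a trailing 0, which may help narrow down whether the FST conversion is misreading that column.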