I'm using SRILM's ngram-count command line utility in an attempt to calculate a trigram model for a subset of the Gutenberg corpus. The command line is:

 -order 3 -kndiscount -text {$text} -lm {$lm} -gt2min 10 -gt3min 5 -vocab {$vocab} -unk

However, some of my lines are coming up as discounted to 0 (or at least that's what I believe is happening).

-5.018952   roaming
-4.189117   roar    -0.2053203
-4.30369    roared  0    <-- discounted to zero?

This also occurs with -gt1min 0 or any other value, whenever the minimum threshold is reached. How do I prevent this from happening? It's causing problems when I convert the model to an n-gram based FST and the input sentence contains one of these words.

saigafreak

1 Answer

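According to this tutorial (page 17), the number you are referring to is the backoff weight, not the probability. Each line of the ARPA-format output holds the log10 probability, the n-gram, and optionally the log10 backoff weight, so a trailing 0 means a backoff weight of 1 (10^0) rather than a probability discounted to zero. For more information, you could read this.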
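To make the column layout concrete, here is a minimal Python sketch that splits one ARPA line into its parts. The function name `parse_arpa_line` is hypothetical, and the sketch assumes whitespace-separated fields and that the caller passes the n-gram order of the section the line comes from. A missing backoff column is treated as log10 weight 0, which is the usual ARPA convention.

```python
def parse_arpa_line(line, order):
    """Split one ARPA n-gram line into (log10 prob, words, log10 backoff).

    Hypothetical helper: assumes whitespace-separated fields and that
    `order` is the n-gram order of the section the line belongs to.
    """
    fields = line.split()
    log_prob = float(fields[0])
    words = tuple(fields[1:1 + order])
    # The backoff column is optional; an absent value is read as 0.0,
    # i.e. a backoff weight of 10**0 = 1 (assumed ARPA convention).
    log_backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
    return log_prob, words, log_backoff


log_prob, words, log_backoff = parse_arpa_line("-4.30369 roared 0", order=1)
print(words, 10 ** log_prob)   # unigram probability, small but non-zero
print(10 ** log_backoff)       # backoff weight as a linear value
```

Read this way, the `roared` entry keeps its probability from the first column; only the backoff weight in the last column is 0 in log space.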

Bjerva