I'd like to find a package or module (preferably Python or Perl, but others would do) that automatically generates n-gram probabilities from an input text and can automatically apply one or more smoothing algorithms as well.
That is, I am looking for something like the NLTK NgramModel class. I can't use it for my purposes because its smoothing functions have bugs that make it choke when you ask for the probability of a word it hasn't seen before.
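To make the requirement concrete, here is a minimal sketch of the behaviour I'm after. It is my own illustration, not NLTK's API. It uses add-one (Laplace) smoothing on bigrams, the simplest scheme, and reserves one extra vocabulary slot for unknown words, so asking about an unseen word returns a small nonzero probability instead of failing. I'd want a package that does this properly and also offers better smoothing methods.

```python
from collections import Counter


def train_bigrams(tokens):
    """Count bigrams and their left contexts from a list of tokens."""
    vocab = set(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # How often each token appears as the *first* word of a bigram.
    contexts = Counter(tokens[:-1])
    return vocab, bigrams, contexts


def laplace_prob(word, prev, vocab, bigrams, contexts):
    """P(word | prev) with add-one (Laplace) smoothing.

    One extra vocabulary slot is reserved for unseen words, so an
    unknown `word` or `prev` gets a small nonzero probability instead
    of raising an error or returning zero.
    """
    v = len(vocab) + 1  # +1 for the unknown-word slot
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + v)


tokens = "the cat sat on the mat".split()
model = train_bigrams(tokens)
print(laplace_prob("cat", "the", *model))  # 0.25
print(laplace_prob("dog", "the", *model))  # 0.125 ("dog" never seen)
```

Because `Counter` returns 0 for missing keys, unseen words and unseen contexts fall through to the smoothed estimate rather than raising a `KeyError`.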
I've read through the NLTK dev forums, and as of now there seems to be no progress on this.
Any alternatives out there?