Package to generate n-gram language models with smoothing? (Alternatives to NLTK)

Question

I'd like to find a package or module (preferably Python or Perl, but others would do) that automatically generates n-gram probabilities from an input text and can also apply one or more smoothing algorithms.

That is, I am looking for something like the NLTK NgramModel class. I can't use this for my purposes because there are some bugs with the smoothing functions which make it choke when you ask for the probability of a word it hasn't seen before.

I've read through the dev forums for NLTK and as of now there seems to be no progress on this.

Any alternatives out there?

Hi there! How did you calculate the perplexity? Which toolkit or package was useful for you? I am stuck with the same problem now :( Not able to use nltk to calculate the perplexity. — Ana_Sam, Oct 20 '15 at 22:40

score 6 · Answer 1 · edited May 01 '19 at 03:05

Looks like I answered my own question, so I'll mention what I've found here in case others are looking for it.

There are two toolkits that I've found:

They appear to have very similar functionality. Both include a variety of smoothing functions.
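For readers unsure what a smoothing function buys you here, the following is a minimal sketch of add-one (Laplace) smoothing on a bigram model in plain Python. It is not code from either toolkit, and the names `train_bigram_laplace`, `prob` and the `<unk>` token are made up for illustration. The point is only that an unseen word gets a small nonzero probability instead of causing a failure.

```python
from collections import Counter


def train_bigram_laplace(tokens):
    """Return prob(word, prev): an add-one smoothed bigram estimate."""
    # Reserve one vocabulary slot for words never seen in training.
    vocab = set(tokens) | {"<unk>"}
    # Count each token as a context (every token except the last one).
    contexts = Counter(tokens[:-1])
    # Count adjacent token pairs.
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(vocab)

    def prob(word, prev):
        # Map out-of-vocabulary items to <unk> so they still get mass.
        word = word if word in vocab else "<unk>"
        prev = prev if prev in vocab else "<unk>"
        # Add-one smoothing: (count + 1) / (context count + |V|).
        return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)

    return prob


tokens = "the cat sat on the mat".split()
prob = train_bigram_laplace(tokens)
print(prob("cat", "the"))    # bigram seen in training
print(prob("zebra", "the"))  # unseen word, still nonzero
```

Add-one smoothing is the simplest option and tends to over-smooth on real data. The toolkits above offer better-behaved methods, so treat this as an illustration of the idea rather than something to use as-is.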

edited May 01 '19 at 03:05

Laurel

answered Jul 14 '11 at 18:30

Alan H.

score 0 · Answer 2 · answered Apr 06 '15 at 15:16

NLTK also provides an ngram model package, which has smoothing, backoff, etc.

answered Apr 06 '15 at 15:16

Adam_G

score -2 · Answer 3 · edited May 01 '19 at 03:04

Another option would be to download the datasets Google provides, if that data is suitable for your application, or to use their online viewer.

edited May 01 '19 at 03:04

Laurel

answered Aug 28 '11 at 12:28

snim2