Language Modelling toolkit

Question

I would like to build a language model for a text corpus. Are there good out-of-the-box toolkits that would make this task easier? The only toolkit I know of is the Statistical Language Modelling (SLM) Toolkit by CMU.

Regards,

score 2 · Answer 1 · answered Jul 21 '10 at 13:55

NLTK is very powerful, though I've never used it.

Ned Batchelder

+1 The Natural Language Toolkit (NLTK) is your best choice. Download it from nltk.org or buy the book from the O'Reilly site. It is close to a must-have, IMO. – jim mcnamara Jul 21 '10 at 14:06
I have used NLTK in the past, but I never knew it could be used to build language models. – Dexter Jul 22 '10 at 12:10
http://nltk.googlecode.com/svn/trunk/doc/api/nltk.model.ngram.NgramModel-class.html I finally got hold of the class, but there seems to be no documentation for it! – Dexter Jul 22 '10 at 19:12
I can safely say NLTK is not really powerful after all. Reason: http://code.google.com/p/nltk/issues/detail?id=232 To be honest, it is disappointing that such a "basic" machine learning model is not only missing from NLTK but is also available in very few toolkits for popular languages like Java/Python. – Dexter Jul 23 '10 at 18:03
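
For readers unsure what the "basic" model in these comments refers to: it is an n-gram language model. The following is only a minimal sketch in plain Python, assuming a bigram model with add-one (Laplace) smoothing. It is not how NLTK, SRILM, or KenLM implement things, and the function names (train_bigram, bigram_prob, sentence_logprob) and the toy corpus are made up for illustration.

```python
from collections import Counter
import math


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences.

    Each sentence is padded with <s> and </s> markers (an assumption of
    this sketch, not a requirement of any particular toolkit).
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word):
    """Estimate P(word | prev) with add-one smoothing.

    The vocabulary is every type seen in training, including the
    padding markers.
    """
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def sentence_logprob(unigrams, bigrams, tokens):
    """Natural-log probability of a tokenized sentence under the model."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return sum(
        math.log(bigram_prob(unigrams, bigrams, prev, word))
        for prev, word in zip(padded, padded[1:])
    )


# Hypothetical toy corpus, already tokenized.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(corpus)

seen_score = sentence_logprob(unigrams, bigrams, ["the", "cat", "sat"])
scrambled_score = sentence_logprob(unigrams, bigrams, ["sat", "cat", "the"])
print(seen_score > scrambled_score)
```

Add-one smoothing is used here only because it is the shortest to write down; dedicated toolkits typically offer stronger smoothing methods such as Kneser-Ney, which is one reason to prefer them for real corpora.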

score 1 · Answer 2 · answered Apr 18 '16 at 16:19

The SRILM toolkit is very useful.

http://www.speech.sri.com/projects/srilm/

Aaron

score 0 · Answer 3 · answered Jul 21 '16 at 06:54

KenLM is also worth trying. It is fast and uses good default settings. In contrast to SRILM, it offers fewer configuration options.

Stefanus
