What is the fastest and most accurate POS tagger in Python (with a commercial license)?

Question

Which POS tagger is fast, accurate, and licensed for commercial use? For testing I used the Stanford POS tagger, which works well, but it is slow and its license is a problem for me.

score 2 · Answer 1 · edited May 23 '17 at 11:53

You can use nltk.

>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]

Explanation:

word_tokenize first splits a sentence into word tokens. A sentence tokenizer (sent_tokenize) is also available.

Then pos_tag takes that list of words and labels each one with its part of speech.
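For text with more than one sentence, the two steps above can be chained per sentence. The following is only a minimal sketch: `tag_text` and its parameters are hypothetical names, and the three steps are passed in as callables so that a different tokenizer or tagger can be swapped in when comparing libraries. If a step is left out, it falls back to the corresponding NLTK function, which assumes NLTK and its data are installed.

```python
def tag_text(text, sent_split=None, word_split=None, tagger=None):
    """Split text into sentences, tokenize each sentence, tag its tokens.

    Returns one list of (word, tag) pairs per sentence.
    """
    if sent_split is None or word_split is None or tagger is None:
        # Imported lazily: only needed when a step falls back to NLTK.
        import nltk
        if sent_split is None:
            sent_split = nltk.sent_tokenize
        if word_split is None:
            word_split = nltk.word_tokenize
        if tagger is None:
            tagger = nltk.pos_tag
    return [tagger(word_split(sentence)) for sentence in sent_split(text)]
```

Calling `tag_text("And now for something completely different. It works.")` with no other arguments uses the NLTK defaults throughout.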

More information available here and here.

See this answer for a long and detailed list of POS Taggers in Python.

NLTK is not perfect. In fact, no model is perfect.

You may first need to run

>>> import nltk; nltk.download()

to download the tokenizer and tagger data (called with no arguments, it opens NLTK's interactive downloader).
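On a server or in a script it may be more convenient to fetch only the packages this example needs instead of using the interactive downloader. This is a sketch under assumptions: `fetch_nltk_data` is a hypothetical helper, and the package names depend on the installed NLTK version (older releases use `punkt` and `averaged_perceptron_tagger`; newer ones may ask for `punkt_tab` and `averaged_perceptron_tagger_eng`, and the `LookupError` raised by NLTK names the missing package).

```python
# Assumed package names; adjust them to whatever the LookupError reports.
NEEDED_PACKAGES = ("punkt", "averaged_perceptron_tagger")


def fetch_nltk_data(packages=NEEDED_PACKAGES):
    """Download each named NLTK data package; map name -> success flag."""
    import nltk  # imported here so that defining the helper has no side effects

    # nltk.download returns True when the package is installed or up to date.
    return {name: nltk.download(name, quiet=True) for name in packages}
```

Run `fetch_nltk_data()` once before calling `word_tokenize` or `pos_tag`; it needs network access the first time.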

score 1 · Answer 2 · answered Dec 27 '16 at 14:36

I've had good results combining NLTK's part-of-speech tagging with TextBlob's. Both are publicly available (or at least have a decent public version).

http://textanalysisonline.com/nltk-pos-tagging

https://textblob.readthedocs.io/en/dev/
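The answer does not say how the two taggers were combined. One possible reading is to run both and look at the words they disagree on. The sketch below assumes each tagger's output is a list of (word, tag) pairs, such as `nltk.pos_tag(...)` returns and TextBlob exposes as `TextBlob(text).tags`; `tag_disagreements` is a hypothetical helper, not part of either library. Because the two libraries may not produce identical token sequences, it aligns on the words rather than on positions.

```python
from difflib import SequenceMatcher


def tag_disagreements(tags_a, tags_b):
    """Return (word, tag_a, tag_b) for aligned words that got different tags."""
    words_a = [word for word, _ in tags_a]
    words_b = [word for word, _ in tags_b]
    # Align the two word sequences so that extra or missing tokens on one
    # side do not shift every later comparison.
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    differences = []
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            word, tag_a = tags_a[block.a + offset]
            _, tag_b = tags_b[block.b + offset]
            if tag_a != tag_b:
                differences.append((word, tag_a, tag_b))
    return differences
```

An empty result means the two taggers agree on every word they both saw; the disagreements are the places worth checking by hand.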
