Which POS tagger is fast, accurate, and licensed for commercial use? For testing, I used the Stanford POS tagger, which works well, but it is slow and its license is a problem for me.
Viewed 5,480 times
2 Answers
2
You can use nltk.
>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]
Explanation:
word_tokenize first splits a sentence into word tokens. A sentence tokenizer is also available.
Then, pos_tag labels each word in that list with its part of speech.
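For text with more than one sentence, the two steps above can be chained: split into sentences, tokenize each sentence, then tag it. The following is only a sketch. The helper names tag_document and tag_with_nltk are made up for illustration; nltk.sent_tokenize, nltk.word_tokenize and nltk.pos_tag are the NLTK functions, and they need the tokenizer and tagger data to be installed.

```python
from typing import Callable, List, Tuple

TaggedSentence = List[Tuple[str, str]]


def tag_document(
    text: str,
    split_sentences: Callable[[str], List[str]],
    split_words: Callable[[str], List[str]],
    tag_words: Callable[[List[str]], TaggedSentence],
) -> List[TaggedSentence]:
    """Tag a text sentence by sentence (hypothetical helper, not part of NLTK)."""
    # Each sentence is tokenized and tagged on its own, so the tagger
    # never sees words from a neighbouring sentence as context.
    return [tag_words(split_words(sentence)) for sentence in split_sentences(text)]


def tag_with_nltk(text: str) -> List[TaggedSentence]:
    """Same pipeline wired to NLTK's own tokenizers and default tagger."""
    import nltk  # imported lazily so tag_document itself has no hard dependency

    return tag_document(text, nltk.sent_tokenize, nltk.word_tokenize, nltk.pos_tag)
```

Calling tag_with_nltk("And now for something completely different. It works.") should return one list of (word, tag) pairs per sentence. Passing the three steps in as functions also makes it easy to swap in a different tokenizer or tagger when comparing speed.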
More information available here and here.
See this answer for a long and detailed list of POS Taggers in Python.
NLTK is not perfect. In fact, no model is perfect.
You may need to first run
>>> import nltk; nltk.download()
in order to load the tokenizer data.
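If you would rather not open the interactive downloader, one option is to fetch data only when NLTK reports it missing. NLTK raises LookupError when a required data package is not installed, and the error message names the package. This is a minimal sketch: run_with_nltk_data is a made-up helper, and the resource names you pass in are assumptions that differ between NLTK releases (for example "punkt" and "averaged_perceptron_tagger" in older versions).

```python
def run_with_nltk_data(func, *args, resources=(), download=None):
    """Call func(*args); if NLTK data is missing, fetch `resources` and retry once.

    Hypothetical helper, not part of NLTK. `download` defaults to nltk.download.
    """
    try:
        return func(*args)
    except LookupError:
        # NLTK signals a missing data package with LookupError.
        if download is None:
            import nltk  # imported lazily; only needed for the real downloader

            download = nltk.download
        for name in resources:
            download(name)
        # Retry once; a second LookupError propagates to the caller.
        return func(*args)


# Example call (resource name is an assumption, check the error message
# printed by your NLTK version for the exact one):
# import nltk
# tags = run_with_nltk_data(
#     nltk.pos_tag, ["Hello", "world"],
#     resources=("averaged_perceptron_tagger",),
# )
```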

noɥʇʎԀʎzɐɹƆ
1
I've had good results combining NLTK's part-of-speech tagger with TextBlob's. Both are publicly available (or at least have a decent public version).

Laughing Horse