Python NLTK PoS Tag inaccurate

Question

I've been trying to improve the POS tagger in NLTK for a few days, but I cannot figure it out. Right now, the default tagger is really inaccurate and tags most words as 'NN'. How can I make it more accurate? I've already looked into training the tagger, but I can't get it to work.

Does anybody have a simple method for this? Thanks a lot.

score 1 · Answer 1 · answered Feb 03 '17 at 21:36

Are you tagging one word at a time or whole sentences from a larger text? POS tagging algorithms usually use the probability that a word takes a given tag (e.g. "NN"), but they also use the surrounding sentence context to make the prediction, so the more context you give the tagger, the more accurate it is likely to be.

You can also try unigram, bigram, trigram, etc. taggers to get higher accuracy at the cost of performance. You can read about doing that here: http://www.nltk.org/book/ch05.html
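As a rough sketch of what that chapter describes, here is a chain of n-gram taggers that fall back on each other. The three training sentences are made up purely for illustration (in practice you would train on a tagged corpus such as the ones shipped with NLTK), and the variable names are my own. It also shows why context matters: "dog" on its own gets the most frequent tag, but after a pronoun the bigram tagger picks the verb reading.

```python
from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger

# Tiny hand-made training set (hypothetical data, for illustration only).
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
    [("they", "PRP"), ("dog", "VBP"), ("him", "PRP")],
]

# Backoff chain: bigram -> unigram -> "tag everything as NN".
default_tagger = DefaultTagger("NN")
unigram_tagger = UnigramTagger(train_sents, backoff=default_tagger)
bigram_tagger = BigramTagger(train_sents, backoff=unigram_tagger)

# Without context, "dog" gets its most frequent tag from the unigram model.
single_word = bigram_tagger.tag(["dog"])
print(single_word)    # [('dog', 'NN')]

# Inside a sentence, the previous tag (PRP) lets the bigram model pick VBP.
in_sentence = bigram_tagger.tag(["they", "dog", "him"])
print(in_sentence)    # [('they', 'PRP'), ('dog', 'VBP'), ('him', 'PRP')]

# Words never seen in training fall through to the default tagger.
unknown_word = bigram_tagger.tag(["the", "unicorn"])
print(unknown_word)   # [('the', 'DT'), ('unicorn', 'NN')]
```

If most of your words come back as 'NN', it may be that they are falling all the way through to a default like this, which usually means the tagger has not seen enough training data or is being fed isolated words.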
