In the following code, why does nltk think 'select' is an adjective and not a verb?

>>> import nltk
>>> t = nltk.tokenize.word_tokenize("select icon from icon")
>>> nltk.tag.pos_tag(t)
[('select', 'JJ'), ('icon', 'NN'), ('from', 'IN'), ('icon', 'NN')]
Kaleidophon
s.dutta

1 Answer

I don't think there is an easy answer, because the tagger is a statistical model (I found it described as a back-off trigram Markov model trained on the Penn Treebank here).

I could imagine "select icon from icon" being a very rare kind of sentence in the training corpus, if it occurs there at all. With no context for the first word beyond the fact that it starts a sentence, the tagger assigned JJ as the most likely tag.

If this is a big problem for you, you can consider training your own tagger on a corpus where more of these kinds of sentences occur, or enriching an existing one using something like this.
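To give an idea of what training your own tagger could look like, here is a minimal sketch using NLTK's n-gram taggers chained with backoff. The three hand-tagged sentences in `train_sents` are made up for illustration and are far too few for real use; in practice you would need a properly annotated corpus of the kind of text you want to tag. This is also not how the default `pos_tag` model is built, just one simple way to get a tagger that has seen imperative "select ... from ..." sentences.

```python
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger

# Hypothetical, hand-tagged training data (Penn Treebank tag names).
# A real tagger needs far more sentences than this.
train_sents = [
    [("select", "VB"), ("name", "NN"), ("from", "IN"), ("users", "NNS")],
    [("select", "VB"), ("icon", "NN"), ("from", "IN"), ("table", "NN")],
    [("delete", "VB"), ("row", "NN"), ("from", "IN"), ("table", "NN")],
]

# Backoff chain: bigram -> unigram -> "everything else is a noun".
default_tagger = DefaultTagger("NN")
unigram_tagger = UnigramTagger(train_sents, backoff=default_tagger)
bigram_tagger = BigramTagger(train_sents, backoff=unigram_tagger)

# Plain split() is used here to avoid needing the tokenizer models.
tokens = "select icon from icon".split()
tagged = bigram_tagger.tag(tokens)
print(tagged)
```

Because every training sentence tags "select" as VB, this tagger should label it as a verb. Words it has never seen fall through to the default tagger and come out as NN, which is crude but shows how the backoff chain works.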

Kaleidophon