POS tagging - NLTK thinks noun is adjective

Question

In the following code, why does NLTK think 'fish' is an adjective and not a noun?

>>> import nltk
>>> s = "a woman needs a man like a fish needs a bicycle"
>>> nltk.pos_tag(s.split())
[('a', 'DT'), ('woman', 'NN'), ('needs', 'VBZ'), ('a', 'DT'), ('man', 'NN'), ('like', 'IN'), ('a', 'DT'), ('fish', 'JJ'), ('needs', 'NNS'), ('a', 'DT'), ('bicycle', 'NN')]

see http://stackoverflow.com/questions/30821188/python-ntlk-pos-tag-not-returnig-the-correct-pos — alvas, Jun 13 '15 at 22:35

score 4 · Answer 1 · answered Dec 12 '12 at 09:27

I am not sure what the workaround is, but you can check the source here: https://nltk.googlecode.com/svn/trunk/nltk/nltk/tag/

Meanwhile, I tried your sentence with a slightly different approach.

>>> s = "a woman needs a man. A fish needs a bicycle"
>>> nltk.pos_tag(s.split())
[('a', 'DT'), ('woman', 'NN'), ('needs', 'VBZ'), ('a', 'DT'), ('man.', 'NP'), ('A', 'NNP'), ('fish', 'NN'), ('needs', 'VBZ'), ('a', 'DT'), ('bicycle', 'NN')]

which resulted in fish being tagged as "NN".

I think it's because short sentences generally get higher accuracy. — alvas, Jan 17 '13 at 10:15

score 4 · Answer 2 · answered Jan 23 '13 at 17:43

If you used a Lookup Tagger as described in the NLTK book, chapter 5 (for example using WordNet as lookup reference) first, your tagger would already "know" that fish cannot be an adjective. For all words with several possible POS Tags you could then use a statistical tagger as a backoff tagger.
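
The lookup-then-backoff idea can be sketched in plain Python. This is only an illustration under stated assumptions: the `LOOKUP` table is hand-written for this sentence (it is not derived from WordNet), and `backoff_tagger` is a hypothetical stand-in that replays the tags from the question instead of running a real statistical model.

```python
# Hypothetical lookup table: words we treat as having one known tag.
# Hand-written for this example, not taken from WordNet.
LOOKUP = {"woman": "NN", "man": "NN", "fish": "NN", "bicycle": "NN"}

# Tags copied from the question's output, in token order.
QUESTION_TAGS = ["DT", "NN", "VBZ", "DT", "NN", "IN",
                 "DT", "JJ", "NNS", "DT", "NN"]


def backoff_tagger(tokens):
    """Stand-in for a statistical tagger (assumption for illustration).

    It replays the question's tags, so it is only valid for that sentence.
    """
    return list(zip(tokens, QUESTION_TAGS))


def tag_with_lookup(tokens, lookup, backoff):
    """Use the lookup tag when a word is known, else keep the backoff tag."""
    return [(word, lookup.get(word, tag)) for word, tag in backoff(tokens)]


tokens = "a woman needs a man like a fish needs a bicycle".split()
tagged = tag_with_lookup(tokens, LOOKUP, backoff_tagger)
print(tagged)
```

In this sketch only the tag for 'fish' is overridden; the second 'needs' keeps the stand-in's 'NNS' because nothing is re-tagged with the corrected context. In NLTK itself, the book's chapter 5 builds the same kind of chain with `nltk.UnigramTagger(model=..., backoff=...)`.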

Suzana


Can you give an example of the statistical tagger you refer to at the end of your answer? – Private Jul 06 '15 at 11:20
Most POS taggers in the NLTK make use of statistics of word / feature combinations. For example, [TNT](http://www.nltk.org/api/nltk.tag.html#nltk.tag.tnt.TnT) and [Naive Bayes](http://www.nltk.org/api/nltk.classify.html#nltk.classify.naivebayes.NaiveBayesClassifier). – Suzana Jul 08 '15 at 14:17

score 3 · Answer 3 · answered Jan 17 '13 at 10:28

It's because you expect "a woman needs a man like a fish needs a bicycle" to get POS tags that match this "parse":

[ [[a woman] needs [a man]] like [[a fish] needs [a bicycle]] ]

but the default NLTK POS tagger isn't smart enough, and instead it gave you POS tags that match this parse:

[ [[a woman] needs [a man]] like [a fish needs] [a bicycle] ]

Aravind Asok · Answer 4 · 2014-04-17T09:11:06.517

It depends on how the input is given to the POS tagger. For example, take the sentence: "a woman needs a man like a fish needs a bicycle"

If you tokenize it with the default NLTK word tokenizer and with a regex tokenizer, the resulting tags will be different.

import nltk
from nltk.tokenize import RegexpTokenizer

# Raw string so the regex backslashes are not treated as string escapes.
TOKENIZER = RegexpTokenizer(r'(?u)\W+|\$[\d\.]+|\S+')

s = "a woman needs a man like a fish needs a bicycle"

# Tokenize the same sentence in two different ways.
regex_tokenize = TOKENIZER.tokenize(s)
default_tokenize = nltk.word_tokenize(s)

# Tag each token sequence with the default NLTK tagger.
regex_tag = nltk.pos_tag(regex_tokenize)
default_tag = nltk.pos_tag(default_tokenize)

print(regex_tag)
print("\n")
print(default_tag)

The output is as follows:

Regex Tokenizer:

[('a', 'DT'), (' ', 'NN'), ('woman', 'NN'), (' ', ':'), ('needs', 'NNS'), (' ', 'VBP'), ('a', 'DT'), (' ', 'NN'), ('man', 'NN'), (' ', ':'), ('like', 'IN'), (' ', 'NN'), ('a', 'DT'), (' ', 'NN'), ('fish', 'NN'), (' ', ':'), ('needs', 'VBZ'), (' ', ':'), ('a', 'DT'), (' ', 'NN'), ('bicycle', 'NN')]

Default Tokenizer:

[('a', 'DT'), ('woman', 'NN'), ('needs', 'VBZ'), ('a', 'DT'), ('man', 'NN'), ('like', 'IN'), ('a', 'DT'), ('fish', 'JJ'), ('needs', 'NNS'), ('a', 'DT'), ('bicycle', 'NN')]

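With the regex tokenizer, fish is tagged as a noun, while with the default tokenizer it is tagged as an adjective. Note that the regex tokenizer also emits the whitespace between words as tokens, so the tagger sees a different token sequence. The tags you get therefore depend on the tokenizer used.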
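
The difference in token sequences can be checked without running a tagger at all. The following is a minimal sketch, assuming that RegexpTokenizer in its default (non-gap) mode returns the pattern's matches much as `re.findall` does; it uses only the standard `re` module and the pattern from this answer.

```python
import re

# Pattern copied from the answer, written as a raw string.
PATTERN = r'(?u)\W+|\$[\d\.]+|\S+'

s = "a woman needs a man like a fish needs a bicycle"

# Matching with the pattern keeps the runs of whitespace as tokens,
# because the first alternative, \W+, matches them.
regex_tokens = re.findall(PATTERN, s)

# Splitting on whitespace yields only the words.
split_tokens = s.split()

print(len(regex_tokens), len(split_tokens))
print(regex_tokens[:5])

# Dropping whitespace-only tokens recovers the plain word sequence.
words_only = [t for t in regex_tokens if t.strip()]
print(words_only == split_tokens)
```

If the whitespace-only tokens are filtered out before tagging, the tagger receives the same word sequence as `s.split()` produces.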

score 2 · Answer 5 · answered Mar 16 '15 at 23:37

If you use the Stanford POS tagger (3.5.1) then the phrase is correctly tagged:

from nltk.tag.stanford import POSTagger
st = POSTagger("/.../stanford-postagger-full-2015-01-30/models/english-left3words-distsim.tagger",
               "/.../stanford-postagger-full-2015-01-30/stanford-postagger.jar")
st.tag("a woman needs a man like a fish needs a bicycle".split())

yields:

[('a', 'DT'),
 ('woman', 'NN'),
 ('needs', 'VBZ'),
 ('a', 'DT'),
 ('man', 'NN'),
 ('like', 'IN'),
 ('a', 'DT'),
 ('fish', 'NN'),
 ('needs', 'VBZ'),
 ('a', 'DT'),
 ('bicycle', 'NN')]
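
To see exactly where the two taggers disagree, the two outputs quoted in this thread can be compared position by position. This is a small sketch in plain Python; `tag_differences` is a hypothetical helper written for this comparison, and both tag lists are copied from the outputs shown above rather than produced by running the taggers.

```python
# Output of nltk.pos_tag from the question.
default_tags = [('a', 'DT'), ('woman', 'NN'), ('needs', 'VBZ'), ('a', 'DT'),
                ('man', 'NN'), ('like', 'IN'), ('a', 'DT'), ('fish', 'JJ'),
                ('needs', 'NNS'), ('a', 'DT'), ('bicycle', 'NN')]

# Output of the Stanford POS tagger from this answer.
stanford_tags = [('a', 'DT'), ('woman', 'NN'), ('needs', 'VBZ'), ('a', 'DT'),
                 ('man', 'NN'), ('like', 'IN'), ('a', 'DT'), ('fish', 'NN'),
                 ('needs', 'VBZ'), ('a', 'DT'), ('bicycle', 'NN')]


def tag_differences(first, second):
    """Return (index, word, first_tag, second_tag) where the tags differ."""
    return [(i, word, tag_a, tag_b)
            for i, ((word, tag_a), (_, tag_b)) in enumerate(zip(first, second))
            if tag_a != tag_b]


for index, word, tag_a, tag_b in tag_differences(default_tags, stanford_tags):
    print(index, word, tag_a, tag_b)
```

The two outputs differ only on 'fish' and on the 'needs' that follows it.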
