I ran the pos_tag function on the text below, and the output tags battery as 'RB'. Since battery is a noun here, it should be tagged 'NN'.

nltk.pos_tag(nltk.word_tokenize('Camera picture quality was fair but speed was an issue and also battery life was not that good'))

Output:

[('Camera', 'NNP'), ('picture', 'NN'), ('quality', 'NN'), ('was', 'VBD'), ('fair', 'JJ'), ('but', 'CC'), ('speed', 'NN'), ('was', 'VBD'), ('an', 'DT'), ('issue', 'NN'), ('and', 'CC'), ('also', 'RB'), ('battery', 'RB'), ('life', 'NN'), ('was', 'VBD'), ('not', 'RB'), ('that', 'IN'), ('good', 'JJ')]

When I tag the same sentence with this tagger, http://cst.dk/online/pos_tagger/uk/, it tags battery as 'NN' and gives the following output:

Camera/NNP picture/NN quality/NN was/VBD fair/JJ but/CC speed/NN was/VBD an/DT issue/NN and/CC also/RB battery/NN life/NN was/VBD not/RB that/IN good/JJ

Edit:

With the sentence changed to:

"Camera picture quality was fair but speed was an issue but battery life was not that good"

the NLTK tagger gives the following output:

[('Camera', 'NNP'), ('picture', 'NN'), ('quality', 'NN'), ('was', 'VBD'), ('fair', 'JJ'), ('but', 'CC'), ('speed', 'NN'), ('was', 'VBD'), ('an', 'DT'), ('issue', 'NN'), ('but', 'CC'), ('battery', 'NN'), ('life', 'NN'), ('was', 'VBD'), ('not', 'RB'), ('that', 'IN'), ('good', 'JJ')]

Please explain!

chammu
  • It's obviously using a different POS tagging model. These things are machine-learned, they make mistakes sometimes. – Fred Foo Feb 14 '14 at 17:56
  • so is there some kind of accuracy level guaranteed? – chammu Feb 14 '14 at 18:19
  • @larsmans can you please provide more insight into this? – Rajat Feb 17 '14 at 18:06
  • possible duplicate of [POS tagging - NLTK thinks noun is adjective](http://stackoverflow.com/questions/13529945/pos-tagging-nltk-thinks-noun-is-adjective) – alvas Feb 19 '14 at 10:50
  • @r20rock You'd have to inspect the model and potentially the training set to be sure. I'm not really familiar with the implementation of `pos_tag`; check the NLTK source code. – Fred Foo Feb 19 '14 at 11:31
  • see http://stackoverflow.com/questions/30821188/python-ntlk-pos-tag-not-returnig-the-correct-pos – alvas Jun 13 '15 at 22:36

1 Answer

It seems the only difference is that cst.dk tagged battery as NN while NLTK tagged it as RB (adverb).

>>> cstdk_output = "Camera/NNP picture/NN quality/NN was/VBD fair/JJ but/CC speed/NN was/VBD an/DT issue/NN and/CC also/RB battery/NN life/NN was/VBD not/RB that/IN good/JJ"
>>> cstdk_postags = [tuple(i.split('/')) for i in cstdk_output.split()]
>>> from nltk import pos_tag
>>> sent = [i for i,j in cstdk_postags]
>>> nltk_postags = pos_tag(sent)
>>> diff = [(i[0],i[1],j[1]) for i,j in zip(cstdk_postags, nltk_postags) if i[1] != j[1]]
>>> diff
[('battery', 'NN', 'RB')]
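
The same comparison can be wrapped in a small reusable function. This is only a sketch: `parse_slash_tags` and `tag_diff` are hypothetical helper names, not part of NLTK, and the NLTK tags are hard-coded from the output posted in the question rather than produced by calling `pos_tag`, so it runs without any model installed.

```python
def parse_slash_tags(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) tuples."""
    return [tuple(item.rsplit('/', 1)) for item in text.split()]


def tag_diff(tags_a, tags_b):
    """Return (token, tag_a, tag_b) for each position where the taggers disagree."""
    if [tok for tok, _ in tags_a] != [tok for tok, _ in tags_b]:
        raise ValueError("both outputs must cover the same token sequence")
    return [(tok, a, b) for (tok, a), (_, b) in zip(tags_a, tags_b) if a != b]


# cst.dk output, copied from the question.
cstdk_tags = parse_slash_tags(
    "Camera/NNP picture/NN quality/NN was/VBD fair/JJ but/CC speed/NN "
    "was/VBD an/DT issue/NN and/CC also/RB battery/NN life/NN was/VBD "
    "not/RB that/IN good/JJ"
)

# NLTK output, copied from the question (not recomputed here).
nltk_tags = parse_slash_tags(
    "Camera/NNP picture/NN quality/NN was/VBD fair/JJ but/CC speed/NN "
    "was/VBD an/DT issue/NN and/CC also/RB battery/RB life/NN was/VBD "
    "not/RB that/IN good/JJ"
)

print(tag_diff(cstdk_tags, nltk_tags))  # [('battery', 'NN', 'RB')]
```

Checking that both token sequences match before zipping avoids silently misaligned comparisons when the two taggers tokenize differently.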

There is not much to explain. pos_tag is a statistically trained system based on Maximum Entropy (see _POS_TAGGER in http://www.nltk.org/_modules/nltk/tag.html#pos_tag), so it is bound to make mistakes. For other mistakes it makes, see "POS tagging - NLTK thinks noun is adjective" (http://stackoverflow.com/questions/13529945/pos-tagging-nltk-thinks-noun-is-adjective).
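
The two NLTK outputs in the question also show that the tag for battery changes with the words before it. The sketch below only lays the two contexts side by side; it does not prove why the model picked RB, which would need inspecting the model itself. `parse_slash_tags` and `tag_with_context` are hypothetical helpers, and both tag sequences are copied from the question rather than recomputed.

```python
def parse_slash_tags(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) tuples."""
    return [tuple(item.rsplit('/', 1)) for item in text.split()]


def tag_with_context(tagged, word, window=2):
    """Return (preceding (token, tag) pairs, tag) for the first occurrence of word."""
    for idx, (tok, tag) in enumerate(tagged):
        if tok == word:
            return tagged[max(0, idx - window):idx], tag
    raise ValueError("%r not found in tagged sentence" % word)


# NLTK output for the original sentence ("... and also battery life ...").
original = parse_slash_tags(
    "Camera/NNP picture/NN quality/NN was/VBD fair/JJ but/CC speed/NN "
    "was/VBD an/DT issue/NN and/CC also/RB battery/RB life/NN was/VBD "
    "not/RB that/IN good/JJ"
)

# NLTK output for the edited sentence ("... but battery life ...").
edited = parse_slash_tags(
    "Camera/NNP picture/NN quality/NN was/VBD fair/JJ but/CC speed/NN "
    "was/VBD an/DT issue/NN but/CC battery/NN life/NN was/VBD "
    "not/RB that/IN good/JJ"
)

for name, tagged in (("original", original), ("edited", edited)):
    context, tag = tag_with_context(tagged, "battery")
    print(name, context, "->", tag)
```

In the original sentence battery follows the adverb also and is itself tagged RB; after but it is tagged NN.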

alvas