
NLTK's POS tagger tags the word 'red' as a verb (VBN). I believe this is because the tagger follows a pattern in which a word with an '-ed' suffix is a verb, or something like that.

How can I make exceptions or otherwise fix this issue? It might occur with other words later.

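import nltk
from nltk import word_tokenize

def LanguageTokenize(read):
    # Split the text into tokens, then tag each token with its part of speech.
    tokens = word_tokenize(read)
    return nltk.pos_tag(tokens)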

>>> LanguageTokenize('the red cat')
[('the', 'DT'), ('red', 'VBN'), ('cat', 'NN')]
alexis
deepadmax
  • Welcome to Natural Language Processing, where things never work 100% the way you think they should! I wouldn't spend too much time trying to fix artificial sentences (or phrases) like that. Test the tools on real-world texts and see how they perform there. If you're not happy, you probably need to retrain. If you start defining exceptions for what you think is a corner case, you will never get to an end... – lenz Sep 14 '15 at 20:27
  • SpaCy.io seems to be able to recognise it, I can't use that though because I don't have Linux. If it's not too obvious, you're not being much help. – deepadmax Sep 14 '15 at 22:13
  • Make sure to have a look at this [question](https://stackoverflow.com/questions/30821188/python-nltk-pos-tag-not-returning-the-correct-part-of-speech-tag) – b3000 Sep 15 '15 at 08:48
  • @DeePad On StackOverflow, comments are not answers--they do not have to be that much help, and can just be a person's observations. So relax...if we jumped on people for everything, we might jump on them for tagging their posts "cat" and "red", eh? – HostileFork says dont trust SE Sep 15 '15 at 20:35
  • Okay. And I couldn't come up with any other tags that did not require me to have a high Reputation Score to create new ones. Sorry... – deepadmax Sep 16 '15 at 18:52
  • What @Lenz said: Don't worry about piecemeal "improvements", you have more important things to do and language is endless. That said, the NLTK's POS tagger has about [twice the error rate](http://spacy.io/blog/part-of-speech-POS-tagger-in-python/) of some other free tools. So if you _need_ accuracy it's worth your trouble to look for something else-- and still you'll get errors every other sentence. – alexis Sep 16 '15 at 22:08
  • SpaCy.io is much better but errors occur when installing it or installing things I need to install it. Perhaps you might have a link to somewhere for how to set it up properly and make sure I encounter no errors during installation? – deepadmax Sep 17 '15 at 23:35
  • @DeePad, you should start a new question for that, providing more details on the errors you get. If it's about installation only, you should probably post it on [Super User](http://superuser.com/) rather than here. – lenz Sep 18 '15 at 20:07
  • The `red -> VBN` should disappear once this is complete: https://github.com/nltk/nltk/issues/1122 – alvas Sep 19 '15 at 10:10
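
For the "how can I make exceptions" part of the question, one possible approach is to post-process the tagger output with a hand-maintained lookup table. The sketch below is only an illustration: `apply_tag_overrides` and `TAG_OVERRIDES` are hypothetical names, not part of NLTK, and the input list is the tagger output copied from the question, so the snippet runs without NLTK installed.

```python
# Hypothetical helper: neither name below comes from NLTK itself.
def apply_tag_overrides(tagged, overrides):
    """Return a new list in which words found in `overrides` get the given tag."""
    return [(word, overrides.get(word.lower(), tag)) for word, tag in tagged]

# Hand-maintained exception list (an assumption; extend it as needed).
TAG_OVERRIDES = {"red": "JJ"}

# Tagger output copied from the question.
tagged = [("the", "DT"), ("red", "VBN"), ("cat", "NN")]
fixed = apply_tag_overrides(tagged, TAG_OVERRIDES)
print(fixed)  # [('the', 'DT'), ('red', 'JJ'), ('cat', 'NN')]
```

In practice the list returned by `LanguageTokenize` would be passed in place of `tagged`. Note that this override ignores context, so 'red' would be tagged JJ even where it is used as a noun, and, as the comments above caution, a list of exceptions like this can grow without end.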

0 Answers