2

I am using the NLTK POS tagger as below:

from nltk import pos_tag, word_tokenize

sent1 = 'get me now'
sent2 = 'run fast'
tags = pos_tag(word_tokenize(sent2))
print(tags)
# Output I get:
# [('run', 'NN'), ('fast', 'VBD')]

I found a similar post, "NLTK Thinks that Imperatives are Nouns", which suggests adding the word to a dictionary as a verb. The problem is that I have too many such unknown words. I do have one clue, though: they always appear at the start of a phrase.

E.g. 'Download now', 'Book it now', 'Sign up'

How can I help NLTK produce the correct result?

aman
  • possible duplicate of [NLTK Thinks that Imperatives are Nouns](http://stackoverflow.com/questions/9406093/nltk-thinks-that-imperatives-are-nouns) – alvas Aug 26 '15 at 11:48

2 Answers

3

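There are other third-party models that you can load in NLTK. Take a look at "Python NLTK pos_tag not returning the correct part-of-speech tag" (http://stackoverflow.com/questions/30821188/python-nltk-pos-tag-not-returning-the-correct-part-of-speech-tag).

To answer the question with a hack: you can trick the POS tagger by prepending a pronoun so that the verb gets a subject, then dropping the pronoun from the result, e.g.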

>>> from nltk import pos_tag
>>> sent1 = 'get me now'.split()
>>> sent2 = 'run fast'.split()
>>> pos_tag(['He'] + sent1)
[('He', 'PRP'), ('get', 'VBD'), ('me', 'PRP'), ('now', 'RB')]
>>> pos_tag(['He'] + sent1)[1:]
[('get', 'VBD'), ('me', 'PRP'), ('now', 'RB')]

Wrapped in a function:

>>> from nltk import pos_tag
>>> sent1 = 'get me now'.split()
>>> sent2 = 'run fast'.split()
>>> def imperative_pos_tag(sent):
...     return pos_tag(['He']+sent)[1:]
... 
>>> imperative_pos_tag(sent1)
[('get', 'VBD'), ('me', 'PRP'), ('now', 'RB')]
>>> imperative_pos_tag(sent2)
[('run', 'VBP'), ('fast', 'RB')]

If you want every verb in the imperative to receive the base-form VB tag, collapse any VB* tag to VB:

>>> from nltk import pos_tag
>>> sent1 = 'get me now'.split()
>>> sent2 = 'run fast'.split()
>>> def imperative_pos_tag(sent):
...     tagged = pos_tag(['He'] + sent)[1:]
...     return [(word, 'VB') if tag.startswith('VB') else (word, tag) for word, tag in tagged]
... 
>>> imperative_pos_tag(sent1)
[('get', 'VB'), ('me', 'PRP'), ('now', 'RB')]
>>> imperative_pos_tag(sent2)
[('run', 'VB'), ('fast', 'RB')]
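
For use outside an interactive session, the same trick can be written as a small standalone function. This is only a sketch under a few assumptions: the `tagger` and `subject` parameters are my own additions (not part of NLTK), so that the dummy subject can be swapped, e.g. for 'You', and so that any callable with the same input/output shape as `pos_tag` can be plugged in.

```python
def imperative_pos_tag(tokens, tagger, subject="He"):
    """Tag an imperative phrase using the dummy-subject trick.

    tokens:  list of word strings, e.g. ['run', 'fast']
    tagger:  callable mapping a token list to [(word, tag), ...];
             nltk.pos_tag has this shape (assumption: you pass it in yourself)
    subject: dummy pronoun prepended before tagging (hypothetical parameter)
    """
    if not tokens:
        return []
    # Prepend the dummy subject, tag, then drop the subject's own entry.
    tagged = tagger([subject] + list(tokens))[1:]
    # Collapse every VB* tag (VBD, VBP, VBZ, ...) to the base-form tag VB.
    return [(word, "VB") if tag.startswith("VB") else (word, tag)
            for word, tag in tagged]


# Possible usage with NLTK (requires nltk and its tagger model to be installed):
#   from nltk import pos_tag
#   imperative_pos_tag('get me now'.split(), pos_tag)
#   imperative_pos_tag('run fast'.split(), pos_tag, subject='You')
```

Note that collapsing every VB* tag is deliberately crude: it also rewrites verbs later in the phrase, which is fine for short imperatives like the ones in the question but may be too aggressive for longer sentences.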
alvas
  • This gives me the wrong result: 'run' should be tagged VB (verb, base form), not VBP, and likewise 'get' should be VB, not VBD. I tried the Stanford POS tagger, but it is terribly slow through NLTK. Can you suggest another third-party tagger, or some way to modify and use pos_tag from NLTK? – aman Aug 27 '15 at 07:27
  • For third party taggers, see http://stackoverflow.com/questions/30821188/python-nltk-pos-tag-not-returning-the-correct-part-of-speech-tag. And since you're only interested in VB instead of the other tags when it comes to imperative, you can cut out the last character in the VB* tags. – alvas Aug 27 '15 at 10:56
  • Wouldn't it be better to use `'you'` instead of `'he'`? – Bill Aug 18 '23 at 06:00
2

I found a library called spaCy (https://spacy.io/usage/linguistic-features#pos-tagging), and it works well for this case:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("run fast")
# token.text is the word as a string; token.tag_ is the fine-grained POS tag
tags = [(token.text, token.tag_) for token in doc]
print(tags)

Output:

[('run', 'VB'), ('fast', 'RB')]

Installation guides: https://spacy.io/usage
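
To try this on several phrases at once, such as the examples in the question, a small helper like the one below could be used. It is a sketch only: `tag_phrases` is a name I made up, and it takes the loaded pipeline as an argument rather than loading a model itself.

```python
def tag_phrases(nlp, phrases):
    """Return {phrase: [(token text, fine-grained tag), ...]} for each phrase.

    nlp:     a loaded spaCy pipeline (or any callable that yields objects
             with .text and .tag_ attributes)
    phrases: iterable of strings to tag
    """
    return {
        phrase: [(token.text, token.tag_) for token in nlp(phrase)]
        for phrase in phrases
    }


# Possible usage (requires spaCy and the en_core_web_sm model to be installed):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   tag_phrases(nlp, ['Download now', 'Book it now', 'Sign up'])
```

The tags you get back depend on the model and its version, so it is worth checking the output on your own phrases before relying on it.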