25

Following several other posts [e.g. Detect English verb tenses using NLTK, Identifying verb tenses in python, Python NLTK figure out tense], I wrote the following code to determine the tense of a sentence in Python using POS tagging:

from nltk import word_tokenize, pos_tag

def determine_tense_input(sentence):
    text = word_tokenize(sentence)
    tagged = pos_tag(text)

    # Count the verb tags associated with each tense.
    tense = {}
    tense["future"] = len([word for word in tagged if word[1] == "MD"])
    tense["present"] = len([word for word in tagged if word[1] in ["VBP", "VBZ", "VBG"]])
    tense["past"] = len([word for word in tagged if word[1] in ["VBD", "VBN"]])
    return tense

This returns a value for the usage of past/present/future verbs, which I typically then take the max value of as the tense of the sentence. The accuracy is moderately decent, but I am wondering if there is a better way of doing this.
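
For reference, the max step is roughly the sketch below. `pick_tense` is just a name made up for this example, and returning `None` when no verb tags are counted, as well as breaking ties by dict order, are assumptions rather than anything principled.

```python
def pick_tense(counts):
    """Return the tense with the highest count, or None if nothing was counted."""
    if not counts or max(counts.values()) == 0:
        return None
    # max() returns the first key with the largest value, so ties are
    # broken by the dict's insertion order (future, present, past above).
    return max(counts, key=counts.get)


# Counts in the shape returned by determine_tense_input().
example = {"future": 0, "present": 1, "past": 2}
print(pick_tense(example))
```

Ties are common in short sentences, so the tie-breaking order matters more than it might seem.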

For example, is there by chance a package that is more dedicated to extracting the tense of a sentence? [Note: 2 of the 3 Stack Overflow posts are 4 years old, so things may have changed since.] Alternatively, should I be using a different parser from within nltk to increase accuracy? If not, I hope the above code helps someone else!

serv-inc
kyrenia
  • Maybe you can try to find a more fine-grained tagger, either by training your own on a tagged corpus or by using something from Stanford, for example. I find that, for some purposes (such as this one), the coarse tagging from nltk.pos_tag (or the available corpora in nltk_data) doesn't really help much. Using a tagger with more distinct classes has helped me before in similar scenarios. This all depends on the availability of annotated corpora, which are usually quite domain-specific. – Igor May 04 '15 at 12:24
  • For a more accurate approach, you need to distinguish between primary and secondary tense. My answer to a similar question might help: http://stackoverflow.com/a/22146151/1011791 – Chthonic Project May 04 '15 at 16:35
  • @ChthonicProject - Thank you - I had not seen that post, and it does help point me in right direction – kyrenia May 05 '15 at 15:19
  • For your first condition, what if I say "I/you could have called you/me"? Then this logic will produce incorrect output. – jiwitesh Apr 17 '19 at 16:00
  • Is there anything new on this after so many years? Or for finding the tense do we have to use POS tagging? – vaibhav jain Jan 21 '23 at 06:56

5 Answers

5

You can strengthen your approach in various ways. You could think more about the grammar of English and add some more rules based on whatever you observe; or you could push the statistical approach, extract some more (relevant) features and throw the whole lot at a classifier. The NLTK gives you plenty of classifiers to play with, and they're well documented in the NLTK book.

You can have the best of both worlds: Hand-written rules can be in the form of features that are fed to the classifier, which will decide when it can rely on them.
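
As a rough illustration of that last point, here is a minimal sketch of a feature extractor that mixes plain tag counts with one hand-written rule. The function name `tense_features` and the particular features are made up for this example; the tags are the Penn Treebank ones used in the question, and the sample sentence is tagged by hand rather than by a tagger.

```python
def tense_features(tagged):
    """Turn a POS-tagged sentence into a feature dict for a classifier."""
    words = [word.lower() for word, _ in tagged]
    tags = [tag for _, tag in tagged]
    # Hand-written rule: a form of "have" directly followed by a past participle.
    have_vbn = any(
        word in {"have", "has", "had"} and next_tag == "VBN"
        for word, next_tag in zip(words, tags[1:])
    )
    return {
        "n_modal": tags.count("MD"),
        "n_past": tags.count("VBD"),
        "n_present": tags.count("VBP") + tags.count("VBZ"),
        "has_will": "will" in words,
        "have_plus_participle": have_vbn,
    }


# Hand-tagged example sentence (not actual tagger output).
tagged = [("My", "PRP$"), ("dog", "NN"), ("has", "VBZ"), ("eaten", "VBN"),
          ("my", "PRP$"), ("homework", "NN"), (".", ".")]
features = tense_features(tagged)
print(features)
```

A list of `(tense_features(sentence), label)` pairs is the input shape that `nltk.NaiveBayesClassifier.train` expects, so the classifier can learn how much weight each rule deserves.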

alexis
3

According to http://dev.lexalytics.com/wiki/pmwiki.php?n=Main.POSTags, the tags mean

MD  Modal verb (can, could, may, must)
VB  Base verb (take)
VBC Future tense, conditional
VBD Past tense (took)
VBF Future tense
VBG Gerund, present participle (taking)
VBN Past participle (taken)
VBP Present tense (take)
VBZ Present 3rd person singular (takes)

so that your code would be

tense["future"] = len([word for word in tagged if word[1] in ["VBC", "VBF"]])
bryant1410
serv-inc
2

You could use the Stanford Parser to get a dependency parse of the sentence. The root of the dependency parse will be the 'primary' verb that defines the sentence (I'm not too sure what the specific linguistic term is). You can then use the POS tag on this verb to find its tense, and use that.
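
A minimal sketch of that idea, assuming the parse has already been converted into plain tuples. The tuple layout and the function name `root_and_auxiliaries` are made up for this example, and the parse below is written by hand, not produced by the Stanford Parser. It also collects the auxiliaries attached to the root, since the root's tag alone may not be enough.

```python
# Hand-written stand-in for a dependency parse:
# (index, word, POS tag, head index, relation); head index 0 marks the root.
parse = [
    (1, "My", "PRP$", 2, "poss"),
    (2, "dog", "NN", 4, "nsubj"),
    (3, "has", "VBZ", 4, "aux"),
    (4, "eaten", "VBN", 0, "root"),
    (5, "my", "PRP$", 6, "poss"),
    (6, "homework", "NN", 4, "dobj"),
]


def root_and_auxiliaries(parse):
    """Return the root token and the auxiliary tokens that depend on it."""
    root = next(tok for tok in parse if tok[3] == 0)
    auxes = [tok for tok in parse
             if tok[3] == root[0] and tok[4] in ("aux", "auxpass")]
    return root, auxes


root, auxes = root_and_auxiliaries(parse)
print(root[1], root[2])
print([(word, tag) for _, word, tag, _, _ in auxes])
```

Here the root is tagged VBN and its auxiliary VBZ, so the two tags have to be read together to decide on a tense.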

viswajithiii
  • You think? For "My dog has eaten my homework" you'll get the main verb `(VBZ has)` i.e. "present tense (with 3rd person inflection)". But perfect tense is in the past. Using the structure is a good idea, but it needs more analysis than you suggest. – alexis Nov 12 '18 at 12:19
1

This worked for me:

from nltk import word_tokenize, pos_tag, RegexpParser

text = "He will have been doing his homework."

tokenized = word_tokenize(text)
tagged = pos_tag(tokenized)

# Stages are applied top to bottom, so longer patterns are listed first.
grammar = r"""
Future_Perfect_Continuous: {<MD><VB><VBN><VBG>}
Future_Continuous:         {<MD><VB><VBG>}
Future_Perfect:            {<MD><VB><VBN>}
Past_Perfect_Continuous:   {<VBD><VBN><VBG>}
Present_Perfect_Continuous:{<VBP|VBZ><VBN><VBG>}
Future_Indefinite:         {<MD><VB>}
Past_Continuous:           {<VBD><VBG>}
Past_Perfect:              {<VBD><VBN>}
Present_Continuous:        {<VBZ|VBP><VBG>}
Present_Perfect:           {<VBZ|VBP><VBN>}
Past_Indefinite:           {<VBD>}
Present_Indefinite:        {<VBZ>|<VBP>}
"""

# Chunk the tagged sentence with the grammar and print the resulting tree.
cp = RegexpParser(grammar)
print(cp.parse(tagged))

The only catch is that you have to deal with modal verbs, because "could" or "may", for example, are treated like "will" here and end up in the future group.
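
One way to handle that, as a rough sketch: look at the word behind each MD tag instead of the tag alone. The names `FUTURE_MODALS` and `split_modals` are made up for this example, and the word list assumes the tokenizer splits "I'll" into "I" + "'ll" and "won't" into "wo" + "n't".

```python
# Modal word forms treated as future markers (an assumption, adjust as needed).
FUTURE_MODALS = {"will", "shall", "'ll", "wo"}


def split_modals(tagged):
    """Separate MD-tagged words into future markers and other modals."""
    future, other = [], []
    for word, tag in tagged:
        if tag == "MD":
            if word.lower() in FUTURE_MODALS:
                future.append(word)
            else:
                other.append(word)
    return future, other


# Hand-tagged examples (not actual tagger output).
print(split_modals([("He", "PRP"), ("will", "MD"), ("call", "VB")]))
print(split_modals([("He", "PRP"), ("could", "MD"), ("have", "VB"), ("called", "VBN")]))
```

Only sentences with a non-empty first list would then be sent through the future rules; the rest need their own handling.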

1

No, of course not. This is what I've got so far (you might want to read the grammar-parsing section of the NLTK book, too): I kept only the verb tags to simplify the task a little, then used nltk's RegexpParser.

import nltk

def tense_detect(tagged_sentence):
    # These look like Brown-corpus-style tags (BE*, HV*, DO*), not the
    # Penn Treebank tags that nltk.pos_tag returns by default.
    verb_tags = ['MD','MDF',
                 'BE','BEG','BEN','BED','BEDZ','BEZ','BEM','BER',
                 'DO','DOD','DOZ',
                 'HV','HVG','HVN','HVD','HVZ',
                 'VB','VBG','VBN','VBD','VBZ',
                 'SH',
                 'TO',

                 'JJ' # maybe?
                 ]

    # Keep only the verb-related tokens.
    verb_phrase = []
    for item in tagged_sentence:
        if item[1] in verb_tags:
            verb_phrase.append(item)

    grammar = r'''
        future perfect continuous passive:     {<MDF><HV><BEN><BEG><VBN|VBD>+}
        conditional perfect continuous passive:{<MD><HV><BEN><BEG><VBN|VBD>+}
        future continuous passive:             {<MDF><BE><BEG><VBN|VBD>+}
        conditional continuous passive:        {<MD><BE><BEG><VBN|VBD>+}
        future perfect continuous:             {<MDF><HV><BEN><VBG|HVG|BEG>+}
        conditional perfect continuous:        {<MD><HV><BEN><VBG|HVG|BEG>+}
        past perfect continuous passive:       {<HVD><BEN><BEG><VBN|VBD>+}
        present perfect continuous passive:    {<HV|HVZ><BEN><BEG><VBN|VBD>+}
        future perfect passive:                {<MDF><HV><BEN><VBN|VBD>+}
        conditional perfect passive:           {<MD><HV><BEN><VBN|VBD>+}
        future continuous:                     {<MDF><BE><VBG|HVG|BEG>+ }
        conditional continuous:                {<MD><BE><VBG|HVG|BEG>+  }
        future indefinite passive:             {<MDF><BE><VBN|VBD>+ }
        conditional indefinite passive:        {<MD><BE><VBN|VBD>+  }
        future perfect:                        {<MDF><HV><HVN|BEN|VBN|VBD>+ }
        conditional perfect:                   {<MD><HV><HVN|BEN|VBN|VBD>+  }
        past continuous passive:               {<BED|BEDZ><BEG><VBN|VBD>+}
        past perfect continuous:               {<HVD><BEN><HVG|BEG|VBG>+}
        past perfect passive:                  {<HVD><BEN><VBN|VBD>+}
        present continuous passive:            {<BEM|BER|BEZ><BEG><VBN|VBD>+}
        present perfect continuous:            {<HV|HVZ><BEN><VBG|BEG|HVG>+}
        present perfect passive:               {<HV|HVZ><BEN><VBN|VBD>+}
        future indefinite:                     {<MDF><BE|DO|VB|HV>+ }
        conditional indefinite:                {<MD><BE|DO|VB|HV>+  }
        past continuous:                       {<BED|BEDZ><VBG|HVG|BEG>+}
        past perfect:                          {<HVD><BEN|VBN|HVD|HVN>+}
        past indefinite passive:               {<BED|BEDZ><VBN|VBD>+}
        present indefinite passive:            {<BEM|BER|BEZ><VBN|VBD>+}
        present continuous:                    {<BEM|BER|BEZ><BEG|VBG|HVG>+}
        present perfect:                       {<HV|HVZ><BEN|HVD|VBN|VBD>+  }
        past indefinite:                       {<DOD><VB|HV|DO>|<BEDZ|BED|HVD|VBN|VBD>+}
        infinitive:                            {<TO><BE|HV|VB>+}
        present indefinite:                    {<DO|DOZ><DO|HV|VB>+|<DO|HV|VB|BEZ|DOZ|BER|HVZ|BEM|VBZ>+}
        '''

    cp = nltk.RegexpParser(grammar)
    result = cp.parse(verb_phrase)
    print(result)  # or display(result) in a Jupyter notebook

    # Collect the label of every chunk the grammar matched.
    tenses_set = set()
    for node in result:
        if isinstance(node, nltk.Tree):
            tenses_set.add(node.label())
    return result, tenses_set

This works just OK, even with odd complex sentences. The big problem is causatives, like "I have my car washed every day": removing everything but the verbs leaves "have washed", which gets labelled present perfect. You gotta tweak it anyway. I've just fixed the computer and don't have nltk installed yet to show the outcome. Will try to do it tomorrow.

  • Hi. This is a great answer, but the first sentence "No, of course not." is completely out of context. – Stef Jan 21 '23 at 09:55