I am looking for a way, given an English text, to count the verb phrases in it in past, present and future tenses. For now I am using NLTK: I do POS (Part-Of-Speech) tagging and then count, say, 'VBD' tags to get past tenses. This is not accurate enough, though, so I guess I need to go further and use chunking, then analyze the VP-chunks for specific tense patterns. Is there anything existing that does that? Any further reading that might be helpful? The NLTK book focuses mostly on NP-chunks, and I can find very little information on VP-chunks.
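Roughly, this is what I am doing now (a simplified sketch; the sample text is just for illustration):

import nltk

# Naive tag counting: treat every token tagged VBD as a past-tense verb.
text = "She walked home. She walks home. She will walk home."
tagged = nltk.pos_tag(nltk.word_tokenize(text))
past = sum(1 for word, tag in tagged if tag == 'VBD')
print(past)  # counts only simple past forms; compound tenses slip through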
- There's a flaw in your logic. If a chunker can detect NP, then it must be able to detect VP. – Tim McNamara Aug 09 '10 at 05:21
- Of course, but I am mostly interested in further VP analysis - how to tell the difference between the tenses. – Michael Pliskin Aug 09 '10 at 10:54
2 Answers
The exact answer depends on which chunker you intend to use, but list comprehensions will take you a long way. This gets you the number of verb phrases, using a non-existent chunker as a placeholder:
len([phrase for phrase in nltk.Chunker(sentence) if phrase[1] == 'VP'])
You can take a more fine-grained approach to count the individual tenses.
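For example, a minimal sketch using NLTK's RegexpParser with a toy VP grammar; the grammar and the past/present/future heuristics here are illustrative assumptions, not a complete treatment of English tense:

import nltk

# Toy chunk grammar: an optional modal followed by one or more verb forms.
grammar = r"VP: {<MD>?<VB.*>+}"
chunker = nltk.RegexpParser(grammar)

def tense_counts(text):
    counts = {'past': 0, 'present': 0, 'future': 0}
    for sentence in nltk.sent_tokenize(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        tree = chunker.parse(tagged)
        for vp in tree.subtrees(filter=lambda t: t.label() == 'VP'):
            words = [w.lower() for w, t in vp.leaves()]
            tags = [t for w, t in vp.leaves()]
            # Crude heuristics: 'will'/'shall' => future, VBD => past,
            # VBP/VBZ => present; anything else is left uncounted.
            if 'will' in words or 'shall' in words:
                counts['future'] += 1
            elif 'VBD' in tags:
                counts['past'] += 1
            elif 'VBP' in tags or 'VBZ' in tags:
                counts['present'] += 1
    return counts

print(tense_counts("She walked home. She walks home. She will walk home."))
# e.g. {'past': 1, 'present': 1, 'future': 1}

The total number of verb phrases is then just the number of VP subtrees, whichever grammar you settle on.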

– Tim McNamara
- Thanks for the pointer, that's what I am going to use - my next question is whether there is something existing to detect tense patterns. For each VP I'd like to know what tense it is in. – Michael Pliskin Aug 09 '10 at 10:55
- I actually managed to solve my problem with this approach, so I am marking this as the accepted answer. The following article is really helpful: http://streamhacker.com/2009/02/23/chunk-extraction-with-nltk/ – Michael Pliskin Aug 16 '10 at 12:46
You can do this with either the Berkeley Parser or Stanford Parser. But I don't know if there's a Python interface available for either.

– ars
- Thanks a lot, this might be an option - however, as I am already heavily using NLTK, it might be quite a lot of work to switch. Will look, though. – Michael Pliskin Aug 09 '10 at 10:59
- There is an interface for the Stanford Parser in NLTK. You can use it as follows: `tagger = nltk.tag.stanford.POSTagger('models/german-fast.tagger', 'stanford-postagger.jar')` You may have to encode the strings to UTF-8 first (at least for the German model). – Suzana Mar 21 '13 at 16:44
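A hedged sketch of combining that interface with the tag-counting idea from the question; the model and jar paths below are placeholders for your own Stanford POS Tagger installation, and in later NLTK releases the class is named StanfordPOSTagger instead:

import nltk

# Placeholder paths - point these at your local Stanford POS Tagger files.
tagger = nltk.tag.stanford.POSTagger(
    'models/english-bidirectional-distsim.tagger',
    'stanford-postagger.jar')

tokens = nltk.word_tokenize("She walked home.")
tagged = tagger.tag(tokens)  # list of (word, tag) pairs
past = sum(1 for word, tag in tagged if tag == 'VBD')
print(past)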