I have the following sentence:
She usually walks three miles a day

There are several grammar elements here, like present simple, personal pronouns, numerals, etc. Is there any software that can detect these?

- This would be better suited to Cross Validated, asking for algorithms rather than software... Then you can search for implementations of a particular algorithm. – hally9k Oct 22 '15 at 20:23
- Simple grammatical analysis is typically performed by POS taggers. You can get fairly detailed subcategorization (not just *verb* but tagged for number, tense, etc.) with most systems. For languages with real inflection systems, you probably want a morphological analyzer instead, or as well. – tripleee Oct 22 '15 at 20:40
- @hally9k Why Cross Validated? This isn't statistics. – tripleee Oct 22 '15 at 20:42
- @tripleee You are more than likely correct; where would you place this? – hally9k Oct 22 '15 at 20:45
1 Answer

You want an English parser. There are many different techniques and tools for parsing. NLTK includes extensive parsing capabilities (here). If you want something easy and ready-made, you can use the Stanford Parser, which is also available as a Java program. Your example sentence is parsed as follows:
    She/PRP usually/RB walks/VBZ three/CD miles/NNS a/DT day/NN

    (ROOT
      (S
        (NP (PRP She))
        (VP
          (ADVP (RB usually))
          (VBZ walks)
          (NP
            (NP (CD three) (NNS miles))
            (NP (DT a) (NN day))))))

Universal dependencies

    nsubj(walks-3, She-1)
    advmod(walks-3, usually-2)
    root(ROOT-0, walks-3)
    nummod(miles-5, three-4)
    dobj(walks-3, miles-5)
    det(day-7, a-6)
    dep(miles-5, day-7)

Universal dependencies, enhanced

    nsubj(walks-3, She-1)
    advmod(walks-3, usually-2)
    root(ROOT-0, walks-3)
    nummod(miles-5, three-4)
    dobj(walks-3, miles-5)
    det(day-7, a-6)
    dep(miles-5, day-7)
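If you want to use this output in a program rather than read it, note that the bracketed tree is just a nested S-expression. Below is a minimal, standard-library-only Python sketch that reads the tree into nested `(label, children)` tuples and recovers the `word/TAG` pairs from the preterminal nodes. The function names `read_tree` and `leaves` are hypothetical helpers, not part of NLTK or the Stanford tools, and the sketch assumes every `(` is immediately followed by a label, as in the output above. NLTK's `Tree.fromstring` is the ready-made alternative.

```python
import re

# A bracketed tree has three kinds of tokens: "(", ")" and anything else
# (a node label or a word).
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def read_tree(text):
    """Parse '(LABEL child ...)' into nested (label, children) tuples.

    Words are kept as plain strings. Assumes every '(' is followed by a label.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return parse_node()


def leaves(tree):
    """Yield (word, tag) pairs from preterminals, i.e. nodes whose only child is a word."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield (children[0], label)
    else:
        for child in children:
            yield from leaves(child)


TREE = """
(ROOT
  (S
    (NP (PRP She))
    (VP
      (ADVP (RB usually))
      (VBZ walks)
      (NP
        (NP (CD three) (NNS miles))
        (NP (DT a) (NN day))))))
"""

print(list(leaves(read_tree(TREE))))
```

For this tree it prints the same seven `(word, tag)` pairs as the tagged line, from `('She', 'PRP')` to `('day', 'NN')`.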
This may look rather cryptic, but it actually contains all the information you are looking for. For example, in walks/VBZ, VBZ means verb, 3rd person singular present. PRP means personal pronoun, and CD means cardinal number. These are the part-of-speech tags used in the Penn Treebank. You can find most of these here.
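Decoding the tagged line mechanically only needs a lookup table. This sketch covers just the seven tags that appear in the example sentence (descriptions follow the standard Penn Treebank tag list); `PTB_TAGS` and `describe` are illustrative names, and any tag not in the table is reported as unknown.

```python
# Penn Treebank tags that occur in the example sentence (not the full tag set).
PTB_TAGS = {
    "PRP": "personal pronoun",
    "RB": "adverb",
    "VBZ": "verb, 3rd person singular present",
    "CD": "cardinal number",
    "NNS": "noun, plural",
    "DT": "determiner",
    "NN": "noun, singular or mass",
}


def describe(tagged):
    """Split 'word/TAG' tokens and attach a readable description of each tag."""
    result = []
    for token in tagged.split():
        # rpartition splits at the last "/", so words that contain "/" stay intact.
        word, _, tag = token.rpartition("/")
        result.append((word, tag, PTB_TAGS.get(tag, "unknown tag")))
    return result


for word, tag, meaning in describe(
    "She/PRP usually/RB walks/VBZ three/CD miles/NNS a/DT day/NN"
):
    print(f"{word:<8} {tag:<4} {meaning}")
```

Extending the dictionary to the rest of the tag set is all that is needed to handle other sentences.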
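The last part deals with dependencies. For instance, advmod(walks-3, usually-2) means that the adverb usually modifies the verb walks. The number after each word is its position in the sentence, and ROOT-0 is an artificial root node, so root(ROOT-0, walks-3) marks walks as the head of the whole sentence.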
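Because every dependency line follows the same `relation(head-index, dependent-index)` pattern, a regular expression is enough to read them. Below is a hedged sketch, in which the `Dependency` tuple and `parse_dependencies` are hypothetical helpers, that parses the lines above and lists the words attached directly to the verb walks.

```python
import re
from typing import NamedTuple

# relation(head-index, dependent-index); indices are 1-based word positions
# and index 0 is the artificial ROOT node. Relation names may contain ":".
DEP_LINE = re.compile(r"^([\w:]+)\((.+)-(\d+), (.+)-(\d+)\)$")


class Dependency(NamedTuple):
    relation: str
    head: str
    head_index: int
    dependent: str
    dependent_index: int


def parse_dependencies(text):
    """Turn one dependency per line into a list of Dependency tuples."""
    deps = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # tolerate blank lines between entries
        match = DEP_LINE.match(line.strip())
        if match is None:
            raise ValueError(f"unrecognized dependency line: {line!r}")
        rel, head, head_idx, dep, dep_idx = match.groups()
        deps.append(Dependency(rel, head, int(head_idx), dep, int(dep_idx)))
    return deps


OUTPUT = """
nsubj(walks-3, She-1)
advmod(walks-3, usually-2)
root(ROOT-0, walks-3)
nummod(miles-5, three-4)
dobj(walks-3, miles-5)
det(day-7, a-6)
dep(miles-5, day-7)
"""

# Print every word that depends directly on the verb "walks".
for d in parse_dependencies(OUTPUT):
    if d.head == "walks":
        print(f"{d.head} -{d.relation}-> {d.dependent}")
```

This prints three arcs: `walks -nsubj-> She`, `walks -advmod-> usually` and `walks -dobj-> miles`. Because the index is matched as the trailing `-digits`, words that themselves contain a hyphen are still split correctly.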
