First, always build a grammar in bite-sized pieces.
Let's start with an easy sentence: Peace is rising.
We want the structure S -> NP VP, where:
VP is an intransitive verb phrase; in this particular case, "is rising" consists of the auxiliary "is" and the verb "rise" with the -ing progressive inflection.
NP is simply a single noun.
[code]:
import nltk
your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V
V -> "rising"
AUX -> "is"
NP -> N
N -> "peace"
""")
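parser = nltk.RecursiveDescentParser(your_grammar)
# Terminals in the grammar are lowercase, so the input must be lowercase too.
sentence = "peace is rising".split()
for tree in parser.parse(sentence):
    # list(tree) shows the children of the root S node.
    print(list(tree))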
[out]:
[Tree('NP', [Tree('N', ['peace'])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]
Now add a determiner to the NP with NP -> N | DT NP:
[code]:
import nltk
your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V
V -> "rising"
AUX -> "is"
NP -> N | DT NP
N -> "peace" | "price"
DT -> "the"
""")
parser = nltk.RecursiveDescentParser(your_grammar)
sentence = "the price is rising".split()
for tree in parser.parse(sentence):
    print(list(tree))
[out]:
[Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('N', ['price'])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]
Finally, we can add the PP construction within the NP, with NP -> NP PP and PP -> P NP:
[code]:
import nltk
your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V
V -> "rising"
AUX -> "is"
NP -> N | DT NP | NP PP
N -> "peace" | "price"
DT -> "the"
PP -> P NP
P -> "of"
""")
parser = nltk.RecursiveDescentParser(your_grammar)
sentence = "the price of peace is rising".split()
for tree in parser.parse(sentence):
    print(list(tree))
This prints the parse we want as the first result:
[out]:
[Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('NP', [Tree('N', ['price'])]), Tree('PP', [Tree('P', ['of']), Tree('NP', [Tree('N', ['peace'])])])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]
But it also comes with some nasty recursion errors that look something like this:
  File "/usr/local/lib/python3.5/site-packages/nltk/tree.py", line 158, in __getitem__
    return self[index[0]][index[1:]]
  File "/usr/local/lib/python3.5/site-packages/nltk/tree.py", line 156, in __getitem__
    return self[index[0]]
  File "/usr/local/lib/python3.5/site-packages/nltk/tree.py", line 150, in __getitem__
    if isinstance(index, (int, slice)):
RecursionError: maximum recursion depth exceeded in __instancecheck__
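This happens because nltk.RecursiveDescentParser expands rules top-down and recursively, and the rule NP -> NP PP is left-recursive: to find an NP, the parser first has to look for an NP. Together with PP -> P NP, that expansion can recur without end. If you would like to know more about why, try asking that as a separate question on StackOverflow ;P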
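To see which rule is the troublemaker, here is a minimal plain-Python sketch (not an NLTK feature) that flags directly left-recursive rules, i.e. rules whose right-hand side starts with their own left-hand side. The `GRAMMAR` dict and the `direct_left_recursion` helper are names made up for this illustration, and the check only covers direct left recursion, not recursion that goes through several rules.

```python
# The grammar from above as a plain dict: nonterminal -> list of right-hand sides.
# This representation is an assumption for illustration; it is not NLTK's CFG class.
GRAMMAR = {
    "S": [("NP", "VP")],
    "VP": [("AUX", "V")],
    "V": [("rising",)],
    "AUX": [("is",)],
    "NP": [("N",), ("DT", "NP"), ("NP", "PP")],
    "N": [("peace",), ("price",)],
    "DT": [("the",)],
    "PP": [("P", "NP")],
    "P": [("of",)],
}


def direct_left_recursion(grammar):
    """Return (lhs, rhs) pairs where the rhs starts with the lhs itself."""
    return [
        (lhs, rhs)
        for lhs, alternatives in grammar.items()
        for rhs in alternatives
        if rhs and rhs[0] == lhs
    ]


for lhs, rhs in direct_left_recursion(GRAMMAR):
    print(lhs, "->", " ".join(rhs))
```

For this grammar the only rule reported is NP -> NP PP. The rule NP -> DT NP is recursive too, but it consumes "the" before recursing, so a top-down parser makes progress on it.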
One easy solution is to use try-except:
[code]:
try:
    for tree in parser.parse(sentence):
        print(list(tree))
except RecursionError:
    # Stop once the parser runs into the left-recursive loop.
    exit()
But that's ugly! Instead, you could use a ChartParser:
[code]:
import nltk
your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V
V -> "rising"
AUX -> "is"
NP -> N | DT NP | NP PP
N -> "peace" | "price"
DT -> "the"
PP -> P NP
P -> "of"
""")
parser = nltk.ChartParser(your_grammar)
sentence = "the price of peace is rising".split()
for tree in parser.parse(sentence):
    print(list(tree))
[out]:
[Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('N', ['price'])])]), Tree('PP', [Tree('P', ['of']), Tree('NP', [Tree('N', ['peace'])])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]
[Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('NP', [Tree('N', ['price'])]), Tree('PP', [Tree('P', ['of']), Tree('NP', [Tree('N', ['peace'])])])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]
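The ChartParser returns two trees because the NP is ambiguous: "the" can attach to "price" alone or to "price of peace" as a whole. As a rough illustration of why working on spans copes with NP -> NP PP, here is a plain-Python sketch that counts parses per span of the sentence with memoization. It is not NLTK's actual chart algorithm; `GRAMMAR` and `count_parses` are hypothetical names, and the sketch assumes a grammar with no empty rules and no cycles of single-symbol rules.

```python
from functools import lru_cache

# Same grammar as above, as a plain dict (an assumption for illustration).
# Any symbol that is not a key of the dict is treated as a terminal word.
GRAMMAR = {
    "S": [("NP", "VP")],
    "VP": [("AUX", "V")],
    "V": [("rising",)],
    "AUX": [("is",)],
    "NP": [("N",), ("DT", "NP"), ("NP", "PP")],
    "N": [("peace",), ("price",)],
    "DT": [("the",)],
    "PP": [("P", "NP")],
    "P": [("of",)],
}


def count_parses(grammar, tokens, start="S"):
    """Count the parse trees of `tokens` rooted in `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways `symbol` can cover tokens[i:j].
        if symbol not in grammar:
            # A terminal covers exactly one matching token.
            return int(j == i + 1 and tokens[i] == symbol)
        return sum(count_seq(rhs, i, j) for rhs in grammar[symbol])

    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` can cover tokens[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        # Every symbol covers at least one token, so each split point k
        # leaves strictly shorter spans. That is why NP -> NP PP terminates.
        return sum(
            count(rhs[0], i, k) * count_seq(rhs[1:], k, j)
            for k in range(i + 1, j)
        )

    return count(start, 0, len(tokens))


print(count_parses(GRAMMAR, "the price of peace is rising".split()))  # 2
print(count_parses(GRAMMAR, "peace is rising".split()))  # 1
```

The count of 2 for the full sentence matches the two trees printed above.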