Is something wrong with the way that I'm generating this NLTK grammar?

Question

I wrote this simple program with NLTK that's just supposed to print out the syntax tree. However, it prints nothing out even though the RecursiveDescentParser is being created. What's my problem? Am I defining the grammar incorrectly? Is something wrong with the way that I'm trying to iterate through the parser? Thank you in advance.

import nltk

# Target sentence: "The price of peace is rising."

grammar = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "is" | "rising"
  NP -> Det N | Det N PP
  Det -> "the" | "of"
  N -> "price" | "peace"
  P -> "in" | "on" | "by" | "with"
  """)

sentence = "the price of peace is rising"
wordArray = sentence.split()

print(wordArray)

parser = nltk.RecursiveDescentParser(grammar)

for tree in parser.parse(wordArray):
    print(tree)

You don't have a valid sentence: `the price of peace is rising` is tagged `Det N Det N V V`, which the grammar does not generate, whereas `the price with the peace is the price` is a valid sentence. — AChampion, Mar 30 '17 at 03:03
See also: http://stackoverflow.com/questions/42966067/nltk-chart-parser-is-not-printing/42966837#42966837 — alvas, Mar 30 '17 at 03:47
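To check the first comment, here is a minimal sketch that runs the question's grammar unchanged on both sentences and counts the parses. It assumes NLTK 3 is installed; the helper `count_parses` is hypothetical and exists only for this illustration.

```python
import nltk

# The grammar from the question, unchanged.
grammar = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "is" | "rising"
  NP -> Det N | Det N PP
  Det -> "the" | "of"
  N -> "price" | "peace"
  P -> "in" | "on" | "by" | "with"
  """)

parser = nltk.RecursiveDescentParser(grammar)

def count_parses(sentence):
    """Hypothetical helper: number of trees the parser finds for a sentence."""
    return len(list(parser.parse(sentence.split())))

# "of" is a Det and "is rising" is V V, which no VP rule allows: expected 0.
print(count_parses("the price of peace is rising"))
# NP -> Det N PP plus VP -> V NP cover this sentence: expected 1.
print(count_parses("the price with the peace is the price"))
```

Every word of the first sentence is in the grammar, so the parser raises no coverage error; it simply finds no tree, which is why the loop in the question prints nothing.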

alvas · Answer 1 · 2017-03-30T03:46:58.507

First, always write your grammar in bite-sized pieces.

Let's start with an easy sentence: "peace is rising".

We want the structure S -> NP VP, where:

VP is an intransitive verb phrase; in this particular case, "is rising" consists of the auxiliary "is" and the verb "rise" with the progressive -ing inflection.
NP is simply a single noun.

[code]:

import nltk

your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V 
V -> "rising"
AUX -> "is"
NP -> N
N -> "peace"
""")

parser = nltk.RecursiveDescentParser(your_grammar)
sentence = "peace is rising".split()

for tree in parser.parse(sentence):
    print(list(tree))

[out]:

[Tree('NP', [Tree('N', ['peace'])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]
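Since the question asked how to print the syntax tree itself, note that `list(tree)` only shows the children of the root `S`. A minimal sketch, assuming NLTK 3, that prints the whole tree in bracketed form with `print(tree)` and renders it as ASCII art with `Tree.pretty_print()`:

```python
import nltk

your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V
V -> "rising"
AUX -> "is"
NP -> N
N -> "peace"
""")

parser = nltk.RecursiveDescentParser(your_grammar)
trees = list(parser.parse("peace is rising".split()))

for tree in trees:
    # Bracketed form of the full tree, root S included:
    # (S (NP (N peace)) (VP (AUX is) (V rising)))
    print(tree)
    # The same tree drawn as ASCII art.
    tree.pretty_print()
```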

Now, add the determiner to the NP with NP -> N | DT NP:

import nltk

your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V 
V -> "rising"
AUX -> "is"
NP -> N | DT NP  
N -> "peace" | "price" 
DT -> "the"
""")

parser = nltk.RecursiveDescentParser(your_grammar)
sentence = "the price is rising".split()

for tree in parser.parse(sentence):
    print(list(tree))

[out]:

[Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('N', ['price'])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]
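When growing a grammar bite by bite like this, it can help to list which sentences it actually covers. A minimal sketch, assuming NLTK 3, using `nltk.parse.generate.generate` with a depth limit. The value `depth=5` is an arbitrary choice for this illustration; because `NP -> DT NP` is recursive, an unlimited run would keep producing longer sentences such as "the the peace is rising".

```python
import nltk
from nltk.parse.generate import generate

your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V
V -> "rising"
AUX -> "is"
NP -> N | DT NP
N -> "peace" | "price"
DT -> "the"
""")

# depth bounds the height of each derivation tree, which keeps the
# recursive NP -> DT NP rule from generating sentences without end.
sentences = [" ".join(words) for words in generate(your_grammar, depth=5)]
for s in sentences:
    print(s)
```

If a sentence you expect is missing from this list, the grammar cannot parse it either, which is a quick way to catch problems like the one in the question.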

Finally, we can simply add the PP construction within the NP, with NP -> NP PP and PP -> P NP:

import nltk

your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V 
V -> "rising"
AUX -> "is"
NP -> N | DT NP | NP PP  
N -> "peace" | "price" 
DT -> "the"
PP -> P NP
P -> "of"
""")

parser = nltk.RecursiveDescentParser(your_grammar)
sentence = "the price of peace is rising".split()

for tree in parser.parse(sentence):
    print(list(tree))

This gives us the best possible parse as the first result:

[out]:

[Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('NP', [Tree('N', ['price'])]), Tree('PP', [Tree('P', ['of']), Tree('NP', [Tree('N', ['peace'])])])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]

But it also runs into nasty infinite-recursion errors that look something like this:

  File "/usr/local/lib/python3.5/site-packages/nltk/tree.py", line 158, in __getitem__
    return self[index[0]][index[1:]]
  File "/usr/local/lib/python3.5/site-packages/nltk/tree.py", line 156, in __getitem__
    return self[index[0]]
  File "/usr/local/lib/python3.5/site-packages/nltk/tree.py", line 150, in __getitem__
    if isinstance(index, (int, slice)):
RecursionError: maximum recursion depth exceeded in __instancecheck__

It's because nltk.RecursiveDescentParser parses top-down, and the rule NP -> NP PP is left-recursive: the parser can expand NP into NP PP, then NP PP PP, and so on, without ever consuming a word, until Python's recursion limit is reached. If you would like to know more about why, try asking that as a separate question on StackOverflow ;P
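A standard way to keep using nltk.RecursiveDescentParser is to remove the left recursion so that every NP expansion starts by matching a word. The sketch below is not part of the original answer: it assumes PPs only need to attach directly to nouns and replaces NP -> NP PP with NP -> N PP. The trade-off is that it can no longer build the [[the price] [of peace]] bracketing, so it finds exactly one parse.

```python
import nltk

# Same grammar as above, except the left-recursive NP -> NP PP is replaced
# by NP -> N PP, so each NP expansion begins with a word-level category.
grammar_no_left_rec = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V
V -> "rising"
AUX -> "is"
NP -> N | N PP | DT NP
N -> "peace" | "price"
DT -> "the"
PP -> P NP
P -> "of"
""")

parser = nltk.RecursiveDescentParser(grammar_no_left_rec)
sentence = "the price of peace is rising".split()

# Terminates without RecursionError: the parser must match a word (N or DT)
# before it can recurse into another NP.
trees = list(parser.parse(sentence))
for tree in trees:
    print(tree)
```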

One easy solution is to use try-except:

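try:
    for tree in parser.parse(sentence):
        print(list(tree))
except RecursionError:
    # NP -> NP PP eventually exhausts the stack; any parses printed
    # before this point are still valid, so just stop looking.
    pass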
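Another workaround in the same spirit, if the first parse is all you need: `parser.parse()` returns a lazy generator, and for this grammar and sentence the parser completes the first parse before it starts recursing through NP -> NP PP. Taking only that tree with `next()` therefore never reaches the RecursionError. This sketch depends on the rule order in the grammar (N, then DT NP, then NP PP); it is not a general guarantee, and it discards any later parses.

```python
import nltk

your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V
V -> "rising"
AUX -> "is"
NP -> N | DT NP | NP PP
N -> "peace" | "price"
DT -> "the"
PP -> P NP
P -> "of"
""")

parser = nltk.RecursiveDescentParser(your_grammar)
sentence = "the price of peace is rising".split()

# next() pulls a single tree from the generator and stops, so the parser
# never backtracks into the infinite NP -> NP PP expansion.
# None is returned instead if there is no parse at all.
first_tree = next(iter(parser.parse(sentence)), None)
print(first_tree)
```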

But that's ugly! Instead, you could use a ChartParser:

import nltk

your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V 
V -> "rising"
AUX -> "is"
NP -> N | DT NP | NP PP  
N -> "peace" | "price" 
DT -> "the"
PP -> P NP
P -> "of"
""")

parser = nltk.ChartParser(your_grammar)
sentence = "the price of peace is rising".split()

for tree in parser.parse(sentence):
    print(list(tree))

[out]:

[Tree('NP', [Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('N', ['price'])])]), Tree('PP', [Tree('P', ['of']), Tree('NP', [Tree('N', ['peace'])])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]
[Tree('NP', [Tree('DT', ['the']), Tree('NP', [Tree('NP', [Tree('N', ['price'])]), Tree('PP', [Tree('P', ['of']), Tree('NP', [Tree('N', ['peace'])])])])]), Tree('VP', [Tree('AUX', ['is']), Tree('V', ['rising'])])]
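The two trees differ only in which NP the PP "of peace" attaches to: [[the price] [of peace]] versus [the [price of peace]], since NP -> DT NP and NP -> NP PP can apply in either order. A minimal sketch, assuming NLTK 3, that collects the chart parses, counts them, and prints each full tree in bracketed form:

```python
import nltk

your_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> AUX V
V -> "rising"
AUX -> "is"
NP -> N | DT NP | NP PP
N -> "peace" | "price"
DT -> "the"
PP -> P NP
P -> "of"
""")

parser = nltk.ChartParser(your_grammar)
sentence = "the price of peace is rising".split()

# The chart parser handles the left-recursive NP -> NP PP rule, so we can
# safely materialize every parse into a list.
trees = list(parser.parse(sentence))
print(len(trees), "parses")
for tree in trees:
    # print(tree) shows the whole tree, including the root S.
    print(tree)
```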
