2

So i'm trying to build a parser with PLY, but it doesn't work. It gives me errors that i can't seem to solve. I extracted a small part of the parser that i put in another file, for testing purpose : it doesn't work either.

Here is the simplified problem that i can't seem to solve :

Input example

#Trying to parse properties following the patter X1[some text]X2[other text]
#Simplified input used
GM[1]US[hello]

From my understanding the parser reads the first property (GM[1]) and triggers an error on the second one. He is taking the shortest possible reduction (which my rules indeed allow) and reads a single property. Then, there is no more rules to follow for the second one, which triggers the error.

So the problem would be : can you help me write my rules properly?

parser.py

#For simplicity, i only give you the rules, and removed operations on the tree.

def p_properties(self,p):
    "properties : property property_list"

def p_property_list(self,p):
    "property_list : "
    "               | properties"

def p_property(self,p):
    "property : PROPERTY_NAME single_property_content"

def p_single_property_content(self,p):
    "single_property_content : OBRACKET TEXT CBRACKET"

The starting rule is "properties". I would expect the input property (GM[1]) to match the rule property, and the second to match property_list. But i think that the first rule matches property AND property_list (as the empty production in property_list), which reduces to properties. Then, there is no more rules available for the second property to be read.

Here is the output of the parser:

State  : 0
Stack  : . LexToken(PROPERTY_NAME,'GM',1,0)
Action : Shift and goto state 2

State  : 2
Stack  : PROPERTY_NAME . LexToken(OBRACKET,'[',1,2)
Action : Shift and goto state 4

State  : 4
Stack  : PROPERTY_NAME OBRACKET . LexToken(TEXT,'1',1,3)
Action : Shift and goto state 7

State  : 7
Stack  : PROPERTY_NAME OBRACKET TEXT . LexToken(CBRACKET,']',1,4)
Action : Shift and goto state 8

State  : 8
Defaulted state 8: Reduce using 4
Stack  : PROPERTY_NAME OBRACKET TEXT CBRACKET . None
Action : Reduce rule [single_property_content -> OBRACKET TEXT CBRACKET] with ['[','1',']'] and goto state 5
Result : <tuple @ 0x7fea37eb4588> (('single_property_content', '1'))

State  : 5
Defaulted state 5: Reduce using 3
Stack  : PROPERTY_NAME single_property_content . None
Action : Reduce rule [property -> PROPERTY_NAME single_property_content] with ['GM',<tuple @ 0x7fea37eb4588>] and goto state 3
Result : <tuple @ 0x7fea3964f798> (('property', 'GM', ('single_property_con ...)

State  : 3
Defaulted state 3: Reduce using 2
Stack  : property . None
Action : Reduce rule [property_list -> <empty>] with [] and goto state 6
Result : <NoneType @ 0xa3f020> (None)

State  : 6
Defaulted state 6: Reduce using 1
Stack  : property property_list . None
Action : Reduce rule [properties -> property property_list] with [<tuple @ 0x7fea3964f798>,None] and goto state 1
Result : <tuple @ 0x7fea3964f678> (('properties', ('property', 'GM', ('sing ...)

State  : 1
Stack  : properties . LexToken(PROPERTY_NAME,'US',1,5)
ERROR: Error  : properties . LexToken(PROPERTY_NAME,'US',1,5)
Error while parsing : LexToken(PROPERTY_NAME,'US',1,5)

State  : 1
Stack  : properties . error
ERROR: Error  : properties . error

State  : 0
Stack  : . error
ERROR: Error  : . error

State  : 0
Stack  : . LexToken(OBRACKET,'[',1,7)
ERROR: Error  : . LexToken(OBRACKET,'[',1,7)

State  : 0
Stack  : . LexToken(TEXT,'hello',1,8)
ERROR: Error  : . LexToken(TEXT,'hello',1,8)

State  : 0
Stack  : . LexToken(CBRACKET,']',1,13)
ERROR: Error  : . LexToken(CBRACKET,']',1,13)

State  : 0
Stack  : . $end
ERROR: Error  : . $end
None

I've modified these rules in 15 different ways (right recursion, left), to no avail. Can you guys tell me what i'm doing wrong?

Edit After some trouble when implementing the solution (only worked when i parsed 2 elements but parsing more created errors), i managed to make it work, thanks to @rici's help.

Solution

def p_properties(self,p):
    """
    properties : property_list property
    """
    p[0] = ('properties',p[1], p[2])


def p_property_list(self,p):
    """
    property_list : property_list property
                    | 
    """

    if(len(p)==3):
        p[0] = ('property_list',p[1],p[2])

def p_property(self,p):
    """
    property : PROPERTY_NAME property_content_list
    """
    p[0] = ('property', p[1], p[2])

Input

GM[1]AP[hello]AW[world]C[1.this is a comment]C[2.this is a comment]C[3.this is a comment]
TM[50]WR[alalala]GM[will it fail?]GM[should succeed]

Result

('properties', ('property_list', ('property_list', ('property_list', 
('property_list', ('property_list', ('property_list', 
('property_list',('property_list', ('property_list', None, 
('property', 'GM', ('property_content', '1'))),
('property', 'AP', ('property_content','hello'))),
('property', 'AW', ('property_content', 'world'))), 
('property', 'C', ('property_content', '1.this is a comment'))), 
('property', 'C', ('property_content', '2.this is a comment'))), 
('property', 'C', ('property_content', '3.this is a comment'))), 
('property', 'TM', ('property_content', '50'))), ('property', 'WR', 
('property_content', 'alalala'))), ('property', 'GM', 
('property_content','will it fail?'))),
('property', 'GM', ('property_content', 'should succeed')))

Thank you for your help.

Loïc.

Loïc N.
  • 353
  • 3
  • 17
  • Inasmuch as StackOverflow's goal is to build a Q&A knowledgebase, the primary purpose of a question title is to let someone else determine whether that question's answer is going to be helpful to them. That is to say, it shouldn't be written to help people decide if they can help you -- first and foremost, it should be written for people having problems with Ply in the future to quickly decide if the answer to your question is likely to be helpful to them. – Charles Duffy Sep 11 '17 at 22:40
  • I'm pretty sure "Ply: Parsing error (simple issue)" could be improved towards that end; right now it doesn't help anyone distinguish whether their specific parsing error is the one this question's answers address. – Charles Duffy Sep 11 '17 at 22:42
  • Would, say, *Ply rule unexpectedly matching empty set of tokens* be an accurate (and more helpfully specific) title? – Charles Duffy Sep 11 '17 at 22:46
  • Hi Charles. I modified the title, as suggested. – Loïc N. Sep 11 '17 at 22:57
  • BTW, something can still be a docstring without triple-quotes. – Charles Duffy Sep 11 '17 at 23:17

1 Answers1

2

The productions for a Ply parser are contained in the docstring, which is the first string literal (if any) at the beginning of the function.

The docstring for p_property_list is, therefore, property_list:. (You can confirm this by looking at p_property_list.__doc__.) So that's all that Ply sees. Use a single multiline string literal if you want to provide multiple productions to a single function, which is why you will usually see raw string literals as docstrings:

def p_property_list(self, p):
    """property_list:
                    | property_list property
    """
    # ...

However, it is quite common to use two different functions in such cases. See below.


Don't use right recursion unless necessary. The correct grammar is:

property_list: property
             | property_list property

property: PROPERTY_NAME single_property_content

You could call the first production properties if you prefer, but you don't need two different non-terminals. (You might not need single_property_content either, but that depends on the rest of your grammar.)

In Ply, there is no need to align reduction functions with non-terminals. It is quite common to use different functions for the different alternatives whose reduce actions differ, and you can often combine similar productions for different non-terminals. For a simple example:

# This function can be used as the base case of any list non-terminal
# whose semantic value is a list (and whose base case is not empty)
def p_list_base(self, p):
    """property_list: property
       function_list: function
    """
    p[0] = [p[1]]

# This function can be used as the recursion of any list non-terminal
# whose semantic value is a list.
def p_list_recur(self, p):
    """property_list: property_list property
       function_list: function_list function
    """
    p[1].append(p[2])
    p[0] = p[1]
rici
  • 234,347
  • 28
  • 237
  • 341
  • Thanks a lot. This works perfectly. However, doc strings are not required. If first applied your solution (with doc strings) and it worked. But i had seen example with regular quotes, so i removed the doc strings and replaced them with regular quotes. It still works. And i prefer it that way, because of the way my editor colors regular strings compared to doc strings. Thanks! – Loïc N. Sep 11 '17 at 22:29
  • @loic: a docstring is simply the first statement of a function if it is a string literal. It doesn't matter what kind of string literal it is or what kind of quotes you use. Raw strings (triple-quoted string literals) are just ordinary strings, too, but they are convenient if you need the string to span more than one line or if it contains many special characters, which is why they are often used for regexes. The point of my answer is that you can only have one docstring; your incorrect function had two. – rici Sep 11 '17 at 23:30
  • Thanks for the clarification about the doc string (i'm new to python and its vocabulary/concepts). When i used my toy example with 2 properties to parse, everything was ok, and i thought that was the end of it, but when i added more properties, the parser generated errors again. I'm looking into it. – Loïc N. Sep 12 '17 at 01:56