peg/leg - some curious behavior of PEG specification for expression parsing

Question

Recently I was learning about Parsing Expression Grammars. After reading some materials on it by searching the web I finally sat down to make my hands dirty. I thought of implementing the hoc program from The Unix Programming Environment by Kernighan and Pike, with PEG. For that matter I chose PEG parser generator peg/leg as it has good documentation and an extensive collection of examples. I took dc.peg from the examples as my starting point and just implemented support for unary minus in phoc version 1.

Now when I write the following productions for unary minus the generated parser is not recognizing ((-9)), ((-9)*3)/4, (8-(-2)*5)/2-(2+2), etc.

Operand     <-    SGNValue
                  / Value
SGNValue    <-    OPEN? '-' Value CLOSE?
Value       <-    NUMBER                                     
                  / OPEN Sum CLOSE

But if I change SGNValue to the following those expressions are perfectly matched.

SGNValue    <-    '-' NUMBER                                        
                  / '-' OPEN Sum CLOSE

Could anyone here please explain to me what was wrong with the above production rule?

I am also attaching phoc1.peg file with this post.

Thanks and regards Santanu

# Grammar

Expr        <-    SPACE Sum EOL                                      { printf("%f\n", pop()); }
                  / (!(EOL / 'q') .)* EOL                            { printf("error: quiting hoc1\n"); \
                                                                   exit(1); }
                  / 'q' EOL                                          { printf("bye\n"); exit(0);}

Sum         <-    Product ( PLUS  Product                            { double r = pop(), l = pop(); \
                                                                   push(l + r); }
                  / MINUS Product                                    { double r = pop(), l = pop(); \
                                                                   push(l - r); }
                  )*

Product     <-    Operand ( TIMES  Operand                           { double r = pop(), l = pop(); \
                                                                   push(l * r); }
                  / DIVIDE Operand                                   { double r = pop(), l = pop(); \
                                                                   push(l / r); }
                  )*

Operand     <-    SGNValue
                  / Value

## SGNValue    <-    OPEN? '-' Value CLOSE?
## This production rule was throwing error
## for expressions like ((-9)), ((-9)*3)/4,
## (8-(-2)*5)/2-(2+2), etc. probably due to
## left recurssion. But it was behaving as expected
## for (((9))), ((9)*2), etc.

SGNValue    <-    '-' NUMBER                                         { push(-1*atof(yytext)); }
                  / '-' OPEN Sum CLOSE                               { double d = pop(); push(-1*d); }

Value       <-    NUMBER                                             { push(atof(yytext)); }
                  / OPEN Sum CLOSE

# Lexemes

NUMBER  <-  < [0-9]+ ('.' [0-9]+)? >  SPACE
PLUS    <-  '+'                       SPACE
MINUS   <-  '-'                       SPACE
TIMES   <-  '*'                       SPACE
DIVIDE  <-  '/'                       SPACE
OPEN    <-  '('                       SPACE
CLOSE   <-  ')'                       SPACE
SPACE   <-  [ \t]*
EOL     <-  '\n' / '\r\n' / '\r'

score 3 · Accepted Answer · answered May 31 '14 at 04:40

When you write:

SGNValue    <-    OPEN? '-' Value CLOSE?

you're saying that both the OPEN and the CLOSE are optional, which means that either or both may be present. It will happily match, for example, -9), where only the OPEN is missing.

So the prefix ((-9 matches VALUE <- OPEN Sum CLOSE leaving (-9 to match SUM, which recurses to OPERAND which matches VALUE / SGNValue; remember that PEG's / operator is order-aware, so it first tries VALUE which again matches the ( and recurses. Then we have -9 which can't match VALUE but it can certainly match SGNValue. Unfortunately, -9) matches SGNValue, as noted above, after which there are not enough close parentheses to match the pending open parentheses from the two VALUEs.

I'm not a big fan of PEG, and order-aware alternation is one of the reasons. But that's a very personal opinion; if they work for you, more power to you. OPEN? Sum CLOSE? would be incorrect in any BNF-style grammar formalism, but normally it would lead to accepting erroneous sentences rather than rejecting correct ones.

Thanks for explaining. I missed the point with "?" operator. I was recently doing some search on parser generator tools and come to know about some interesting alternatives to LEX/YACC. PEG attracted my attention while doing this exercise. I have found on net that many people have spoke about PEG's problems. But the interesting part of PEG is it does not require to code a separate lexer. So probably for a quick development of a DSL to integrate with my C application PEG can be a good choice. I am just experimenting with PEG to get a grip of it. — Santanu, Jun 01 '14 at 06:01
@Santanu: I don't find coding lexers that painful and a good Makefile will go a long way towards dealing with the multiple tool issue. Still, tastes vary; as I said, if packrat parsing works for you, that's cool. If this answer solves your issue (or, for that matter, answers to your other questions), you should "accept" it by clicking on the outline of the checkmark; that will help both people volunteering to provide answers, and people with similar queries. — rici, Jun 01 '14 at 06:27

peg/leg - some curious behavior of PEG specification for expression parsing

1 Answers1