Translate grammar production to Parsec

Question

I'm trying to convert the following grammar production

callExpr:
    primaryExpr
  | callExpr primaryExpr

to a Parsec expression in Haskell.

Obviously the problem is that it's left-recursive, so I'm trying to parse it recursive-ascent style. The pseudocode I'm trying to implement is:

e = primaryExpr
while(true) {
    e2 = primaryExpr
    if(e2 failed) break;
    e = CallExpr(e, e2)
}

and my attempt at translating this into Haskell is:

callExpr :: IParser Expr
callExpr = do
    e <- primaryExpr
    return $ callExpr' e
  where
    callExpr' e = do
        e2m <- optionMaybe primaryExpr
        e' <- maybe e (\e2 -> callExpr' (CallExpr e e2)) e2m
        return e'

where primaryExpr has type IParser Expr and IParser is defined as

type IParser a = ParsecT String () (State SourcePos) a

This however gives me the following type error:

Couldn't match type `ParsecT String () (State SourcePos) t0'
              with `Expr'
Expected type: ParsecT String () (State SourcePos) Expr
  Actual type: ParsecT
                 String
                 ()
                 (State SourcePos)
                 (ParsecT String () (State SourcePos) t0)
In a stmt of a 'do' block: return $ callExpr' e
In the expression:
  do { e <- primaryExpr;
       return $ callExpr' e }
In an equation for `callExpr':
    callExpr
      = do { e <- primaryExpr;
             return $ callExpr' e }
      where
          callExpr' e
            = do { e2m <- optionMaybe primaryExpr;
                   .... }

How do I fix this type error?

score 5 · Accepted Answer · edited Mar 25 '15 at 16:18


Use chainl1. chainl1 p op parses one or more occurrences of p separated by op, left-associatively. op returns a binary function, which is used to combine the results of the p on either side into a single result.

Since your grammar doesn't seem to have a separator, you can use chainl1 with an op that just returns the combining function:

callExpr :: IParser Expr
callExpr = chainl1 primaryExpr (return CallExpr)
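
To try this outside the original project, here is a minimal self-contained sketch. The `Expr` type and `primaryExpr` below are stand-ins I made up (the question doesn't show the real ones), and I use the plain `Parser` type from `Text.Parsec.String` instead of `IParser` to keep it short:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST: the real Expr type is not shown in the question.
data Expr
  = Var String
  | CallExpr Expr Expr
  deriving (Show, Eq)

-- Stand-in primary expression: an identifier, then optional whitespace.
primaryExpr :: Parser Expr
primaryExpr = Var <$> (many1 letter <* spaces)

-- The "operator" consumes no input; it only supplies the combining function.
callExpr :: Parser Expr
callExpr = chainl1 primaryExpr (return CallExpr)

-- Helper (assumed name) that runs the parser over a whole string.
parseCall :: String -> Either ParseError Expr
parseCall = parse (callExpr <* eof) "<input>"
```

With these definitions, `parseCall "f x y"` should give `Right (CallExpr (CallExpr (Var "f") (Var "x")) (Var "y"))`, i.e. the left-nested tree the grammar describes.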

As to your callExpr implementation, I can spot two errors.

First, you use return $ callExpr' e, but callExpr' e is already a monadic value, so just callExpr' e would be correct.

Second, in maybe e (\e2 -> callExpr' (CallExpr e e2)) e2m, the default e should be monadic (or else how could we bind it to e'?), so it should be return e.
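
Putting both fixes together, the hand-written loop would look roughly like the sketch below. As before, `Expr` and `primaryExpr` are hypothetical stand-ins and `Parser` replaces `IParser`; only the body of `callExpr` mirrors the code from the question:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST: the real Expr type is not shown in the question.
data Expr
  = Var String
  | CallExpr Expr Expr
  deriving (Show, Eq)

-- Stand-in primary expression: an identifier, then optional whitespace.
primaryExpr :: Parser Expr
primaryExpr = Var <$> (many1 letter <* spaces)

callExpr :: Parser Expr
callExpr = do
    e <- primaryExpr
    callExpr' e                       -- fix 1: no `return $` here
  where
    callExpr' :: Expr -> Parser Expr
    callExpr' e = do
        e2m <- optionMaybe primaryExpr
        -- fix 2: the default branch is `return e`, not a bare `e`
        maybe (return e) (\e2 -> callExpr' (CallExpr e e2)) e2m

-- Helper (assumed name) that runs the parser over a whole string.
parseCallLoop :: String -> Either ParseError Expr
parseCallLoop = parse (callExpr <* eof) "<input>"
```

This should build the same left-nested tree as the chainl1 version, so `parseCallLoop "f x y"` should give `Right (CallExpr (CallExpr (Var "f") (Var "x")) (Var "y"))`.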

edited Mar 25 '15 at 16:18

Mathias Vorreiter Pedersen


answered Mar 25 '15 at 14:36

András Kovács


Could you explain why this combinator is needed? What makes it better than using `many` and then folding over the result? I'm still trying to learn the basics of parsing too. – dfeuer Mar 25 '15 at 14:44

It's probably somewhat faster, but I think it's mostly convention and communication of intent. I think parser combinators should generally resemble "pure" CFG notation as much as possible, and should minimize the amount of post-facto AST manipulation. Readers of our code should be able to see the grammar clearly and not be overly burdened by operational details. Left recursion is one point where we must deviate from abstract notation, so we try to encapsulate it in `chainl`. – András Kovács Mar 25 '15 at 15:37
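
For reference, the `many`-and-fold alternative raised in the comments could be sketched as follows. `Expr`, `primaryExpr`, and the name `callExprFold` are hypothetical stand-ins, not code from the thread, and `Parser` is used in place of `IParser`:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST: the real Expr type is not shown in the question.
data Expr
  = Var String
  | CallExpr Expr Expr
  deriving (Show, Eq)

-- Stand-in primary expression: an identifier, then optional whitespace.
primaryExpr :: Parser Expr
primaryExpr = Var <$> (many1 letter <* spaces)

-- Parse the head, collect the remaining arguments, then fold from the left.
callExprFold :: Parser Expr
callExprFold = foldl CallExpr <$> primaryExpr <*> many primaryExpr

-- Helper (assumed name) that runs the parser over a whole string.
parseCallFold :: String -> Either ParseError Expr
parseCallFold = parse (callExprFold <* eof) "<input>"
```

Here the left-nesting happens in the `foldl` after parsing, which is the kind of post-parse AST manipulation the comment above suggests keeping inside a combinator.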
