Parsing with Haskell/Megaparsec: StateT for building up local, lexical scope?

Question

So I'm trying to do the standard "write yourself a parser for a scheme-like language" exercise to figure out MegaParsec and monad transformers. Following the suggestions of many tutorials and blog posts, I'm using ReaderT and local to implement lexical scope.

I run into trouble trying to implement let*. Both let and let* share the same syntax, binding variables for use in a subsequent expression. The difference between the two is that let* lets you use a binding in subsequent ones, whereas let doesn't:

(let ((x 1) (y 2)) (+ x y))       ; 3
(let* ((x 1) (y (+ x x)) (+ x y)) ; 3
(let ((x 1) (y (+ x x)) (+ x y))  ; Error unbound symbol "x"

My problem is that when parsing a let* expression, I need to add the bindings to the current scope one-by-one so that each binding is available for use in the subsequent ones. This seems like a good use case for StateT; allowing me to build up the local scope one binding at a time. Then, having parsed all the new bindings, I can pass these, together with those inherited from the parent scope, to the third argument of the let* expression, via local.

I build my monad transformer stack as follows:

type Parser = Parsec Void String
type Env = Map.Map String Float
type RSParser = ReaderT Env (StateT Env Parser)

And here's the parser, simplified as much as I could while still making my point. In particular, Float is the only data type and +, *, and let* are the only commands.

data Op = Plus | Times

spaceConsumer :: Parser ()
spaceConsumer = Lexer.space space1
                            (Lexer.skipLineComment ";")
                            (Lexer.skipBlockComment "#|" "|#")
lexeme :: Parser a -> RSParser a
lexeme = lift . lift . Lexer.lexeme spaceConsumer

lParen, rParen :: RSParser Char
lParen = lexeme $ char '('
rParen = lexeme $ char ')'

plus, times :: RSParser Op
plus = lexeme $ char '+' $> Plus
times = lexeme $ char '*' $> Times

keyValuePair :: RSParser ()
keyValuePair = between lParen rParen $ do
    state <- get
    name  <- lift . lift $ Lexer.lexeme spaceConsumer (some letterChar)
    x     <- num
    modify (union (fromList [(name, x)]))

keyValuePairs :: RSParser ()
keyValuePairs = between lParen rParen (many keyValuePair) $> ()

num :: RSParser Float
num = lexeme $ Lexer.signed (return ()) Lexer.float

expr, var :: RSParser Float
expr = num <|> var <|> between lParen rParen (arithExpr <|> letStarExpr)
var = do
    env <- ask
    lift . lift $ do
        name <- Lexer.lexeme spaceConsumer (some letterChar)
        case Map.lookup name env of
            Nothing -> mzero
            Just x  -> return x
arithExpr = do
    op   <- (plus <|> times) <?> "operation"
    args <- many (expr <?> "argument")
    return $ case op of
        Plus  -> sum args
        Times -> product args
letStarExpr = lexeme (string "let*") *> do
    keyValuePairs
    bindings <- get
    local (Map.union bindings) expr

main :: IO ()
main = do
    parseTest (runStateT (runReaderT expr (fromList [("x", 1)])) Map.empty)
              "(+ (let* ((x 666.0)) x) x)"
        -- (667.0,fromList [("x",666.0)]) Ok
    parseTest (runStateT (runReaderT expr (fromList [("x", 1)])) Map.empty)
              "(+ (let* ((x 666.0)) x) (let* ((w 0.0)) x))"
        -- (1332.0,fromList [("x",666.0)]) Wrong

The first test above succeeds, but the second fails. It fails because the mutable state holding x's binding in the first let* expression is carried over to the second let* expression. I need a way to make the this mutable state local to the computation in question and this is what I can't figure out how to do. Is there an analogue of the local command from Reader for State? Am I using the wrong monad transformer stack? Is my approach fundamentally flawed?

The naive (in retrospect) solution that I tried is resetting the mutable state at each let* expression by adding a put Map.empty statement to letStarExpr:

letStarExpr = lexeme (string "let*") *> do
    keyValuePairs
    bindings <- get
    put Map.empty
    local (Map.union bindings) expr

But this is incompatible with nested let* expressions:

parseTest (runStateT (runReaderT expr (fromList [("x", 1)])) Map.empty)
    (let* ( (x 666.0) (y (let* ((z 3.0)) z)) ) x)

gives 1.0 instead of 666.0.

Any ideas?

In `letStarExpr`, you may need to create a new state instead of resetting the old state. The new state is a copy of the state at entrance to the "let*" — Bob Dalgleish, Dec 20 '18 at 21:34
A checkable code example and specifying the megaparsec version would help. — András Kovács, Dec 20 '18 at 21:35
Consider parsing and evaluating in separate steps rather than trying to do evaluation at the same time you’re doing parsing. — Alexis King, Dec 20 '18 at 21:51
@AndrásKovács: Here is a gist with a runnable code sample, together with `package.yaml` and `stack.yaml`, for package versions, etc. In particular, I'm using `megaparsec-7.0.4`. https://gist.github.com/mgreenbe/61d45a089ba3df0d02b2d7c05e738637 — fmg, Dec 20 '18 at 22:33
@AlexisKing: Splitting it up is definitely a good option. Still curious if it is idiomatically possible in one pass. — fmg, Dec 20 '18 at 22:35
@BobDalgleish: Could you, perhaps, say a few words about how to do that? — fmg, Dec 20 '18 at 22:36

András Kovács · Accepted Answer · 2018-12-21T10:50:13.830

As Alexis King pointed out in comments, it is standard practice to separate parsing from evaluation.

However, to address the current question, it is possible here to evaluate while parsing in an idiomatic way. The key point is the following: lexical scoping without any context-sensitive rules only ever requires a Reader monad, for scope/type checking and evaluation as well. The reason is in the "lexical" property: purely nested scopes have no side effects on other branches of scope structure, hence there should be nothing to be carried around in a state. So it's best to just get rid of the State.

The interesting part is letStarExpr. There, we cannot use many anymore, because it doesn't allow us to handle the newly bound names on each key-value pair. Instead, we can write a custom version of many which uses local to bind a new name on each recursive step. In the code example I just inline this function using fix.

Another note: lift should not be commonly used with mtl; the point of mtl is to eliminate most lifts. The megaparsec exports are already generalized over MonadParsec. Below is a code example with megaparsec 7.0.4, I did the mentioned changes and a few further stylistic ones.

import Control.Monad.Reader
import Data.Map as Map
import Data.Void

import Text.Megaparsec
import qualified Text.Megaparsec.Char as Char
import qualified Text.Megaparsec.Char.Lexer as Lexer

type Env    = Map String Double
type Parser = ReaderT Env (Parsec Void String)

spaceConsumer :: Parser ()
spaceConsumer = Lexer.space Char.space1
                            (Lexer.skipLineComment ";")
                            (Lexer.skipBlockComment "#|" "|#")

lexeme = Lexer.lexeme spaceConsumer
symbol = Lexer.symbol spaceConsumer
char   = lexeme . Char.char

parens :: Parser a -> Parser a
parens = between (char '(') (char ')')

num :: Parser Double
num = lexeme $ Lexer.signed (pure ()) Lexer.float

identifier :: Parser String
identifier = try $ lexeme (some Char.letterChar)

keyValuePair :: Parser (String, Double)
keyValuePair = parens ((,) <$> identifier <*> num)

expr :: Parser Double
expr = num <|> var <|> parens (arithExpr <|> letStarExpr)

var :: Parser Double
var = do
  env  <- ask
  name <- identifier
  maybe mzero pure (Map.lookup name env)

arithExpr :: Parser Double
arithExpr =
      (((sum <$ char '+') <|> (product <$ char '*')) <?> "operation")
  <*> many (expr <?> "argument")

letStarExpr :: Parser Double
letStarExpr = do
  symbol "let*"
  char '('
  fix $ \go ->
        (char ')' *> expr)
    <|> do {(x, n) <- keyValuePair; local (insert x n) go}

main :: IO ()
main = do
    parseTest (runReaderT expr (fromList [("x", 1)]))
              "(+ (let* ((x 666.0)) x) x)"
    parseTest (runReaderT expr (fromList [("x", 1)]))
              "(+ (let* ((x 666.0)) x) (let* ((w 0.0)) x))"

Awesome. Thanks for shaving a monad transformer off of my stack! — fmg, Dec 21 '18 at 17:36

Parsing with Haskell/Megaparsec: StateT for building up local, lexical scope?

1 Answers1