I'm working through the standard "write yourself a parser for a Scheme-like language" exercise to get to grips with Megaparsec and monad transformers. Following the suggestions of many tutorials and blog posts, I'm using ReaderT and local to implement lexical scope.
I run into trouble trying to implement let*. Both let and let* share the same syntax, binding variables for use in a subsequent expression. The difference between the two is that let* lets you use a binding in subsequent ones, whereas let doesn't:
(let ((x 1) (y 2)) (+ x y)) ; 3
(let* ((x 1) (y (+ x x))) (+ x y)) ; 3
(let ((x 1) (y (+ x x))) (+ x y)) ; Error unbound symbol "x"
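To pin down the difference independently of any parser, here is a minimal sketch of the two scoping rules as a plain evaluator. The Expr type, eval, and the error message are hypothetical names made up for this illustration; they are not part of the parser shown later.

```haskell
import Control.Monad (foldM)
import qualified Data.Map as Map

-- Hypothetical miniature AST, just enough to contrast let and let*.
data Expr
  = Num Float
  | Var String
  | Add Expr Expr
  | Let [(String, Expr)] Expr      -- parallel bindings
  | LetStar [(String, Expr)] Expr  -- sequential bindings

type Env = Map.Map String Float

eval :: Env -> Expr -> Either String Float
eval _ (Num n) = Right n
eval env (Var v) =
  maybe (Left ("unbound symbol " ++ show v)) Right (Map.lookup v env)
eval env (Add a b) = (+) <$> eval env a <*> eval env b
eval env (Let bs body) = do
  -- Every right-hand side is evaluated in the outer environment only.
  vals <- traverse (eval env . snd) bs
  eval (Map.union (Map.fromList (zip (map fst bs) vals)) env) body
eval env (LetStar bs body) = do
  -- Each right-hand side sees the bindings introduced before it.
  env' <- foldM step env bs
  eval env' body
  where
    step e (name, rhs) = (\v -> Map.insert name v e) <$> eval e rhs
```

With these definitions, the let* form whose second binding refers to x evaluates to Right 3.0, while the same bindings under Let fail with an unbound-symbol error.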
My problem is that when parsing a let* expression, I need to add the bindings to the current scope one by one so that each binding is available for use in the subsequent ones. This seems like a good use case for StateT, allowing me to build up the local scope one binding at a time. Then, having parsed all the new bindings, I can pass these, together with those inherited from the parent scope, to the body (the final argument) of the let* expression, via local.
I build my monad transformer stack as follows:
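-- Imports used by the listings in this question
import Control.Monad (mzero)
import Control.Monad.Reader (ReaderT, ask, local, runReaderT)
import Control.Monad.State (StateT, get, modify, put, runStateT)
import Control.Monad.Trans.Class (lift)
import Data.Functor (($>))
import Data.Map (fromList, union)
import qualified Data.Map as Map
import Data.Void (Void)
import Text.Megaparsec (Parsec, between, many, parseTest, some, (<?>), (<|>))
import Text.Megaparsec.Char (char, letterChar, space1, string)
import qualified Text.Megaparsec.Char.Lexer as Lexer

type Parser = Parsec Void String
type Env = Map.Map String Float
type RSParser = ReaderT Env (StateT Env Parser)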
And here's the parser, simplified as much as I could while still making my point. In particular, Float is the only data type and +, *, and let* are the only commands.
data Op = Plus | Times

spaceConsumer :: Parser ()
spaceConsumer = Lexer.space space1
  (Lexer.skipLineComment ";")
  (Lexer.skipBlockComment "#|" "|#")

lexeme :: Parser a -> RSParser a
lexeme = lift . lift . Lexer.lexeme spaceConsumer

lParen, rParen :: RSParser Char
lParen = lexeme $ char '('
rParen = lexeme $ char ')'

plus, times :: RSParser Op
plus = lexeme $ char '+' $> Plus
times = lexeme $ char '*' $> Times

keyValuePair :: RSParser ()
keyValuePair = between lParen rParen $ do
  name <- lift . lift $ Lexer.lexeme spaceConsumer (some letterChar)
  -- expr rather than num, so that a binding's value can itself be an
  -- expression, as in the nested let* test further down
  x <- expr
  modify (union (fromList [(name, x)]))

keyValuePairs :: RSParser ()
keyValuePairs = between lParen rParen (many keyValuePair) $> ()

num :: RSParser Float
num = lexeme $ Lexer.signed (return ()) Lexer.float

expr, var, arithExpr, letStarExpr :: RSParser Float
expr = num <|> var <|> between lParen rParen (arithExpr <|> letStarExpr)

-- Variables are looked up in the Reader environment only
var = do
  env <- ask
  lift . lift $ do
    name <- Lexer.lexeme spaceConsumer (some letterChar)
    case Map.lookup name env of
      Nothing -> mzero
      Just x -> return x

arithExpr = do
  op <- (plus <|> times) <?> "operation"
  args <- many (expr <?> "argument")
  return $ case op of
    Plus -> sum args
    Times -> product args

-- Accumulate the bindings in the State, then expose them to the body
letStarExpr = lexeme (string "let*") *> do
  keyValuePairs
  bindings <- get
  local (Map.union bindings) expr

main :: IO ()
main = do
  parseTest (runStateT (runReaderT expr (fromList [("x", 1)])) Map.empty)
    "(+ (let* ((x 666.0)) x) x)"
  -- (667.0,fromList [("x",666.0)]) Ok
  parseTest (runStateT (runReaderT expr (fromList [("x", 1)])) Map.empty)
    "(+ (let* ((x 666.0)) x) (let* ((w 0.0)) x))"
  -- (1332.0,fromList [("x",666.0)]) Wrong
The first test above succeeds, but the second fails. It fails because the mutable state holding x's binding in the first let* expression is carried over to the second let* expression. I need a way to make this mutable state local to the computation in question, and this is what I can't figure out how to do. Is there an analogue of the local command from Reader for State? Am I using the wrong monad transformer stack? Is my approach fundamentally flawed?
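To make concrete what such an analogue would have to do, here is a minimal sketch written against plain StateT from transformers. The names localState, bind, and demo are hypothetical and introduced only for this illustration. The helper saves the state, runs an action, and then restores the saved state. I have not checked how it behaves inside the full parser stack above.

```haskell
import Control.Monad.Trans.State.Strict
  (State, StateT, evalState, get, gets, modify, put)
import qualified Data.Map as Map

type Env = Map.Map String Float

-- Hypothetical helper: run an action, then restore the state that was
-- current before the action started. The action's result is kept.
localState :: Monad m => StateT s m a -> StateT s m a
localState action = do
  saved <- get
  result <- action
  put saved
  return result

-- Stand-in for parsing one binding: record it in the state.
bind :: String -> Float -> State Env ()
bind name x = modify (Map.insert name x)

-- Keys visible inside the scoped action, and keys visible after it.
demo :: ([String], [String])
demo = evalState go Map.empty
  where
    go = do
      inner <- localState (bind "x" 666 >> bind "w" 0 >> gets Map.keys)
      outer <- gets Map.keys
      return (inner, outer)
```

Here demo evaluates to (["w","x"],[]): the bindings exist while the scoped action runs and are gone afterwards, which is the behaviour I am after for each let* expression.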
The naive (in retrospect) solution that I tried is resetting the mutable state at each let* expression by adding a put Map.empty statement to letStarExpr:
letStarExpr = lexeme (string "let*") *> do
  keyValuePairs
  bindings <- get
  put Map.empty
  local (Map.union bindings) expr
But this is incompatible with nested let* expressions:
parseTest (runStateT (runReaderT expr (fromList [("x", 1)])) Map.empty)
  "(let* ( (x 666.0) (y (let* ((z 3.0)) z)) ) x)"
gives 1.0 instead of 666.0.
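As far as I can tell, the reason is the order of the state updates. The sketch below replays them outside the parser, assuming a binding's value is parsed as a full expression so that the inner let* runs in the middle of the outer binding list. The names bind and nestedTrace are hypothetical and exist only for this trace.

```haskell
import Control.Monad.Trans.State.Strict (State, execState, modify, put)
import qualified Data.Map as Map

type Env = Map.Map String Float

-- Stand-in for parsing one binding: record it in the state.
bind :: String -> Float -> State Env ()
bind name x = modify (Map.insert name x)

-- State updates in the order they happen for
-- (let* ( (x 666.0) (y (let* ((z 3.0)) z)) ) x)
nestedTrace :: Env
nestedTrace = execState steps Map.empty
  where
    steps = do
      bind "x" 666   -- outer let*: first binding
      bind "z" 3     -- inner let*: its only binding
      put Map.empty  -- inner let*: the reset, which also discards x
      bind "y" 3     -- outer let*: second binding, value from the inner let*
```

nestedTrace contains only y, so when the outer let* reads the state, x is no longer there and the lookup falls through to the parent scope's x, which is 1.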
Any ideas?