I'm trying to write a parser for the Tiger language from modern compiler implementation in ML, and am stuck on one of the recursive types.
I have the following type
data LValue =
Id Atom
| RecordAccess LValue Atom
| ArraySubscript LValue Expression
with the following grammar:
lvalue -> id
-> lvalue.id
-> lvalue[exp]
id -> atom
exp -> (the big, overarching, everything-is-an-expression type)
And I'm trying to parse it with Parsec, but I'm getting stuck in an infinitely recursive loop. Here's my current base parser:
lvalueParser :: Parsec String () LValue
lvalueParser =
try (Id <$> (atomParser <* (notFollowedBy (char '.'))))
<|> try recordAccessParser
where recordAccessParser = (uncurry RecordAccess) <$> do {
record <- lvalueParser;
char '.';
atom <- atomParser;
return (record, atom)
}
(Note: I haven't yet attempted to implement anything to handle the ArrayAccess
portion)
Clearly, the infinite-loopiness happens when the recordAccessParser
calls back to the lvalueParser
.
I can change the recordAccessParser
thusly:
recordAccessParser = (uncurry RecordAccess) <$> do {
record <- atomParser;
char '.';
atom <- atomParser;
return (Id record, atom)
}
and it terminates. However, it will not parse record access more than a single level deep:
Parsec.parse lvalueParser "" "record_1.field_1.field_2"
#=> RecordAccess (Id record_1) (Id field_1)
And I expect
#=> RecordAccess (RecordAccess (Id record_1) (Id field_1)) (Id field_2)
I looked at chainl1
, but the type of the chaining parser is a -> a -> a
, and that doesn't match the type of LValue
that reflects the grammar. I also looked at many
; however I don't have a constant prefix for each term - the left recursion term is what I'm trying to parse into part of the result type.
I imagine I'm missing a specific concept of Parsec/parsing and would love to be pointed in the right direction. There are more types in the language that I'm writing a parser for that will have similar constructions.