I am writing a lexer for Haskell in JavaScript using a Parsing Expression Grammar (PEG); the implementation I use is PEG.js.
I have a problem getting it to work with reserved words, as demonstrated in this simplified form:
program = ( word / " " )+
word = ( reserved / id )
id = ( "a" / "b" )+
reserved = ( "aa" )
The goal is to get a series of space-separated tokens, each of which is either an arbitrary sequence of a's and/or b's or the sequence "aa".
What I actually get is either that every token that is not a space is recognized as id, or that a token that should be recognized as id has all of its initial pairs of a's eaten up as reserved; e.g. "aab" is recognized as reserved "aa" followed by id "b".
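To make the two behaviors concrete, here is a small hand-written JavaScript emulation of PEG ordered choice for the grammar above. It is a sketch, not PEG.js itself: the functions reserved, id and lex and their { token, end } return shape are hypothetical names chosen for illustration, and whitespace handling is simplified to skipping single spaces.

```javascript
// Hand-written emulation of PEG ordered choice ("/") for the grammar above.
// Each parser takes (input, pos) and returns { token, end } on success or
// null on failure. Ordered choice commits to the first alternative that
// succeeds; later alternatives are not tried at that position.

// reserved = ( "aa" )
function reserved(input, pos) {
  return input.startsWith("aa", pos)
    ? { token: ["reserved", "aa"], end: pos + 2 }
    : null;
}

// id = ( "a" / "b" )+   (greedy, like PEG repetition)
function id(input, pos) {
  let end = pos;
  while (end < input.length && (input[end] === "a" || input[end] === "b")) {
    end++;
  }
  return end > pos ? { token: ["id", input.slice(pos, end)], end } : null;
}

// program = ( word / " " )+, where word is an ordered list of alternatives
function lex(input, alternatives) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    if (input[pos] === " ") { pos++; continue; } // skip separators
    let result = null;
    for (const alt of alternatives) {
      result = alt(input, pos);
      if (result) break; // first success wins
    }
    if (!result) throw new Error(`no match at position ${pos}`);
    tokens.push(result.token);
    pos = result.end;
  }
  return tokens;
}

// word = ( reserved / id ): reserved eats the "aa" prefix of "aab".
console.log(JSON.stringify(lex("aab aa", [reserved, id])));
// → [["reserved","aa"],["id","b"],["reserved","aa"]]

// word = ( id / reserved ): id always wins, so "aa" is never reserved.
console.log(JSON.stringify(lex("aab aa", [id, reserved])));
// → [["id","aab"],["id","aa"]]
```

Which of the two symptoms appears depends only on the order of the alternatives in word, since neither alternative looks beyond the characters it consumes.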
The Haskell lexical specification resolves this ambiguity by defining id like this:
id = ( "a" / "b" )+[BUT NOT reserved]
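One way to read that set-difference specification, outside of PEG, is as maximal munch followed by classification: take the longest run of identifier characters, then decide whether that whole lexeme is a reserved word (the Haskell report also states a maximal munch rule for lexemes). The sketch below assumes exactly that reading; lexMaximalMunch and RESERVED are hypothetical names, and this is a hand-written lexer rather than anything PEG.js provides.

```javascript
// Non-PEG sketch of  id = ( "a" / "b" )+ [BUT NOT reserved]
// 1. Read the longest possible run of identifier characters (maximal munch).
// 2. Classify the whole lexeme: if it is exactly a reserved word it becomes
//    a reserved token, otherwise it is an id.
const RESERVED = new Set(["aa"]); // hypothetical reserved-word table

function lexMaximalMunch(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    if (input[pos] === " ") { pos++; continue; } // skip separators
    let end = pos;
    while (end < input.length && (input[end] === "a" || input[end] === "b")) {
      end++;
    }
    if (end === pos) throw new Error(`unexpected character at position ${pos}`);
    const lexeme = input.slice(pos, end);
    // Set difference: a lexeme that is in RESERVED is never an id.
    tokens.push([RESERVED.has(lexeme) ? "reserved" : "id", lexeme]);
    pos = end;
  }
  return tokens;
}

console.log(JSON.stringify(lexMaximalMunch("aab aa b")));
// → [["id","aab"],["reserved","aa"],["id","b"]]
```

Because the reserved check is made on the complete lexeme, "aab" can never be split into "aa" and "b".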
I have tried replicating this using various combinations of the PEG ! and & operators to achieve the same effect, but have not found a way to get it to work properly.
The solution that I've seen suggested in several places,
id = !reserved ( "a" / "b" )+
does not work either.
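Why that predicate does not help can be seen by emulating it: !e succeeds, consuming nothing, exactly when e fails at the current position, and e only has to match a prefix of the remaining input for that. The sketch below is a hand-written JavaScript emulation (all helper names are hypothetical) and, for comparison, includes a variant not taken from the post in which the predicate excludes reserved only when it is not followed by another identifier character.

```javascript
// Each matcher returns the end position on success or -1 on failure.
const isIdChar = (c) => c === "a" || c === "b";

// reserved = "aa"
function reserved(input, pos) {
  return input.startsWith("aa", pos) ? pos + 2 : -1;
}

// ( "a" / "b" )+
function idChars(input, pos) {
  let end = pos;
  while (end < input.length && isIdChar(input[end])) end++;
  return end > pos ? end : -1;
}

// id = !reserved ( "a" / "b" )+
// The predicate fails as soon as "aa" is a *prefix*, so "aab" is rejected.
function idNotReservedPrefix(input, pos) {
  if (reserved(input, pos) !== -1) return -1;
  return idChars(input, pos);
}

// Variant (assumption, not from the post):
//   id = !( reserved !( "a" / "b" ) ) ( "a" / "b" )+
// Only a reserved word that ends the identifier run is excluded.
function reservedWhole(input, pos) {
  const end = reserved(input, pos);
  const followedByIdChar = end !== -1 && end < input.length && isIdChar(input[end]);
  return end !== -1 && !followedByIdChar ? end : -1;
}

function idNotReservedWord(input, pos) {
  if (reservedWhole(input, pos) !== -1) return -1;
  return idChars(input, pos);
}

console.log(idNotReservedPrefix("aab", 0)); // -1: rejected because of the "aa" prefix
console.log(idNotReservedWord("aab", 0));   // 3: all of "aab" is an id
console.log(idNotReservedWord("aa", 0));    // -1: exactly "aa" is excluded
```

Note also that with word = ( reserved / id ) as in the grammar above, reserved is tried before id, so on "aab" the predicate inside id is never reached; a predicate in id can only change the result when id is tried first, or when reserved itself is restricted in the same way.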
Is this a limitation of this particular PEG implementation, of PEG itself, or (hopefully) of my method?
Thanks in advance!