
I am writing a lexer for Haskell in JavaScript using a Parsing Expression Grammar (PEG); the implementation I use is PEG.js.
I am having trouble making it work for reserved words, as demonstrated in simplified form here:

program = ( word / " " )+  
word = ( reserved / id )  
id = ( "a" / "b" )+  
reserved = ( "aa" )

The point here is to get a series of tokens, separated by spaces, where each token is either the reserved sequence "aa" or an arbitrary sequence of a:s and/or b:s.
What I actually get is one of two things: either every token that is not a space is recognized as id, or a token that should be recognized as id has all of its initial pairs of a:s eaten up as reserved. For example, "aab" is recognized as reserved "aa" followed by id "b".

The way the Haskell lexical specification solves this ambiguity is to specify id like this:

id = ( "a" / "b" )+[BUT NOT reserved]

I have tried replicating this using various combinations of the PEG ! and & operators to achieve the same effect, but have not found a way to get this to work properly.
The solution:

id = !reserved ( "a" / "b" )+

that I've seen suggested in several places does not work.
Is this a limitation of this particular PEG implementation, of PEG itself, or (hopefully) of my methods?

Thanks in advance!

Marcel Korpel
evilcandybag

2 Answers


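!reserved ident is a perfectly acceptable technique in any PEG implementation, and PEG.js seems to support it as well. By the way, you should also add !id to the end of the definition of reserved, so that a reserved word only matches when no further identifier characters follow it.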

SK-logic
  • With the !id added to the end of reserved, I get a partially successful result. If I have an odd-length sequence of "a":s it parses correctly, but even-length sequences still give a bad result. – evilcandybag Feb 08 '11 at 15:40
  • I meant something in line with `reservedBase = "aa"/"bb" reserved = reservedBase !idBase idBase = ("a"/"b")+ id = !reserved idBase`. Take a look at the `javascript.jspeg` example provided. – SK-logic Feb 08 '11 at 15:44
  • While I still have the same problem specified in the first comment, I can't seem to find a better solution, so I'll consider this answered for now. I'll update if I find something better. Thanks! – evilcandybag Feb 08 '11 at 22:15
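
To see how the grammar from the second comment behaves, here is a minimal sketch that hand-rolls the PEG operators in plain JavaScript. It does not use PEG.js; the helpers lit, choice, seq, plus, not and makeLexer are made-up names for illustration. The second variant is only a guess at what may have gone wrong: it assumes the lookahead in reserved was written as !id (where id itself begins with !reserved) rather than !idBase. The thread does not confirm that this is what was tried, but under this sketch it produces the same symptom, with odd runs of a:s lexing correctly and even runs not.

```javascript
// Hand-written sketch of PEG semantics; none of this is the PEG.js API.
// A rule maps (input, pos) to the end position of its match, or -1 on failure.
const lit = (s) => (input, pos) => (input.startsWith(s, pos) ? pos + s.length : -1);
const choice = (...rules) => (input, pos) => {
  for (const rule of rules) {
    const end = rule(input, pos);
    if (end !== -1) return end; // ordered choice: the first match wins
  }
  return -1;
};
const seq = (...rules) => (input, pos) => {
  let cur = pos;
  for (const rule of rules) {
    cur = rule(input, cur);
    if (cur === -1) return -1;
  }
  return cur;
};
const plus = (rule) => (input, pos) => {
  let cur = rule(input, pos);
  if (cur === -1) return -1;
  for (let next = rule(input, cur); next > cur; next = rule(input, cur)) cur = next;
  return cur;
};
// Negative lookahead (!rule): succeeds without consuming input iff rule fails.
const not = (rule) => (input, pos) => (rule(input, pos) === -1 ? pos : -1);

// word = reserved / id, with spaces skipped between words.
const makeLexer = (reserved, id) => (input) => {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    if (input[pos] === " ") { pos += 1; continue; }
    let name = "reserved";
    let end = reserved(input, pos);
    if (end === -1) { name = "id"; end = id(input, pos); }
    if (end === -1) throw new Error(`no match at position ${pos}`);
    tokens.push(`${name}:${input.slice(pos, end)}`);
    pos = end;
  }
  return tokens;
};

// The grammar from the comment: the lookahead in `reserved` uses idBase.
const idBase = plus(choice(lit("a"), lit("b")));
const reservedBase = choice(lit("aa"), lit("bb"));
const reserved = seq(reservedBase, not(idBase));
const id = seq(not(reserved), idBase);
const lexFixed = makeLexer(reserved, id);

// Assumed variant: `reserved = "aa" !id`, where id itself starts with !reserved.
// The arrow function defers the lookup of idRec until the rule runs.
const reservedRec = seq(lit("aa"), not((input, pos) => idRec(input, pos)));
const idRec = seq(not(reservedRec), idBase);
const lexRecursive = makeLexer(reservedRec, idRec);

console.log(lexFixed("aa aab aaa aaaa").join(" "));
// reserved:aa id:aab id:aaa id:aaaa
console.log(lexRecursive("aa aab aaa aaaa").join(" "));
// reserved:aa id:aab id:aaa reserved:aa reserved:aa
```

In the assumed variant, reserved and id call each other through their lookaheads, so whether "aa" counts as reserved flips with every further pair of a:s. Using idBase in the lookahead, as in the comment's grammar, avoids that mutual recursion.
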

As far as I know, PEG choice is ordered: the alternatives of a / expression are tried deterministically from the first to the last one, and the first one that matches wins. That said, you just need to make sure that the "reserved" rule is tried before the "identifier" one.
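
For the simplified grammar in the question, ordering by itself may not settle the matter. The sketch below hand-rolls the ordered choice in plain JavaScript (it does not use PEG.js; makeLexer, reservedFirst and idFirst are made-up names for illustration) and tries both orders of the alternatives in word.

```javascript
// Hand-written sketch of PEG ordered choice; not the PEG.js API.
// A rule maps (input, pos) to the end position of its match, or -1 on failure.
const lit = (s) => (input, pos) => (input.startsWith(s, pos) ? pos + s.length : -1);

// id = ( "a" / "b" )+ and reserved = "aa", as in the question.
const id = (input, pos) => {
  let end = pos;
  while (lit("a")(input, end) !== -1 || lit("b")(input, end) !== -1) end += 1;
  return end > pos ? end : -1;
};
const reserved = lit("aa");

// Lexer for `program = ( word / " " )+`, where `word` is an ordered choice
// over the named alternatives in `alts`: the first one that matches wins.
const makeLexer = (alts) => (input) => {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    if (input[pos] === " ") { pos += 1; continue; }
    const hit = alts
      .map(([name, rule]) => ({ name, end: rule(input, pos) }))
      .find((m) => m.end !== -1);
    if (!hit) throw new Error(`no match at position ${pos}`);
    tokens.push(`${hit.name}:${input.slice(pos, hit.end)}`);
    pos = hit.end;
  }
  return tokens;
};

const reservedFirst = makeLexer([["reserved", reserved], ["id", id]]);
const idFirst = makeLexer([["id", id], ["reserved", reserved]]);

console.log(reservedFirst("aa aab").join(" ")); // reserved:aa reserved:aa id:b
console.log(idFirst("aa aab").join(" "));       // id:aa id:aab
```

Both orders reproduce the two outcomes described in the question, which suggests that reserved also needs a lookahead so that it only matches when no further identifier characters follow.
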

cheng81