
I am currently writing the parser for a compiler of a toy language using Happy and Alex. Since some form of optional layout is required, I have to change Alex's state before matching the block non-terminal. Unfortunately, it seems that the lookahead token required by Happy is read before I have the chance to change Alex's state.

Here is a small snippet demonstrating the problem:

funcDef : header localDefs block
                          ^ I have to change alex's state 
                            before the underlying lexer
                            starts reading the block tokens.

Is there a common approach to this problem?

  • How do you know where a block starts? I presume that `localDefs` is not self-terminating, so there must be some lexical feature that you can use to know where the block starts. Could you possibly elucidate a bit? – rici Mar 25 '17 at 15:08
  • @rici The block is either surrounded by begin/end keywords or is indentation based. It is basically defined as either `begin stmts+ end` or `stmts autoend`. The lexer needs to be notified that a begin is missing in order to produce an autoend when it detects an indentation change. The whole approach feels very hacky; there has to be a better way. – Liarokapis Alexandros Mar 25 '17 at 15:54
  • I just solved this _exact_ problem two weeks ago. Didn't think anyone else would run across it. – Alec Mar 25 '17 at 16:01
  • @Alec eagerly waiting for details! – Liarokapis Alexandros Mar 25 '17 at 16:11
  • I'll wait to see what @alec writes, but my approach would be either to make the newline visible to the parser, in which case the lexer change can be done *before* the trigger token, or to handle it all in the lexer. Both are a bit hacky but so is the syntax :) Contrast python's use of colon. – rici Mar 25 '17 at 16:46
  • @rici Please do post your solution - I would be very interested to see what else is possible. In my case, I was stuck with a lexer that I couldn't really change (I wanted it to behave like the reference implementation), so I instead ended up doing backflips in the parser. – Alec Mar 25 '17 at 16:49

1 Answer


I am assuming you are using a threaded lexer (so Happy and Alex are running in the same monad). The trick I used when faced with a similar problem is to make an empty production rule that you slip into your rule.

changeAlexState :: { () }
  : {- empty -} {%% \tok -> changeAlexState *> pushTok tok }

funcDef : header localDefs changeAlexState block

Then, you need to add some state to your monad to support `pushTok :: Token -> P ()` (where `P` is your lexing/parsing monad) and make sure the lexer always pops that token first, before it reads any new input. What `%%` does is documented here.

n : t_1 ... t_n {%% <expr> }

... The type of <expr> is the same [still Token -> P a], but in this case the lookahead token is actually discarded and a new token is read from the input. This can be useful when you want to change the next token and continue parsing.
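To make the `pushTok` requirement concrete, here is a minimal sketch of the push/pop mechanism. It is not the code from my project: `Token`, `PState`, the hand-rolled `P` monad, `rawLex` (a whitespace splitter standing in for the Alex-generated scanner) and `lexToken` are all hypothetical names chosen for illustration. Only the shape of `lexer`, `(Token -> P a) -> P a`, follows what Happy expects from a `%lexer` directive.

```haskell
import Data.Char (isSpace)

-- Hypothetical token type; a real lexer would have many more constructors.
data Token = TWord String | TEOF deriving (Eq, Show)

-- Hypothetical parser state: one slot for a pushed-back token plus the input.
data PState = PState
  { pushedTok :: Maybe Token  -- token handed back by the parser, if any
  , input     :: String       -- remaining, not yet scanned input
  }

-- A tiny hand-written state monad, so the sketch only depends on base.
newtype P a = P { runP :: PState -> (a, PState) }

instance Functor P where
  fmap f (P g) = P $ \s -> let (a, s') = g s in (f a, s')

instance Applicative P where
  pure a = P $ \s -> (a, s)
  P f <*> P g = P $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad P where
  P g >>= k = P $ \s -> let (a, s') = g s in runP (k a) s'

-- Hand a token back; the next call to lexToken returns it again.
pushTok :: Token -> P ()
pushTok t = P $ \s -> ((), s { pushedTok = Just t })

-- Stand-in for the Alex-generated scanner: splits the input on whitespace.
rawLex :: P Token
rawLex = P $ \s ->
  case break isSpace (dropWhile isSpace (input s)) of
    ("", _)   -> (TEOF, s { input = "" })
    (w, rest) -> (TWord w, s { input = rest })

-- Pop the pushed token if there is one, otherwise scan fresh input.
lexToken :: P Token
lexToken = P $ \s ->
  case pushedTok s of
    Just t  -> (t, s { pushedTok = Nothing })
    Nothing -> runP rawLex s

-- Continuation-style wrapper of the shape Happy's %lexer directive expects.
lexer :: (Token -> P a) -> P a
lexer cont = lexToken >>= cont
```

Note that in this sketch the pushed token is returned unchanged, so it was still scanned under the old lexer state; only the tokens read after it see the new state.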

I mentioned I did something similar not long ago. Here is my "empty" rule, here is an example use of it, here is where my pushing function is defined and here is where I "pop" tokens. Let me know how it goes!

Alec