I have been working on a pet language with Haskell-like syntax. One of the neat things Haskell does, which I have been trying to replicate, is inserting `{`, `}` and `;` tokens based on code layout before the parsing step.

I found http://www.haskell.org/onlinereport/syntax-iso.html, which includes a specification of how to implement the layout program, and have made a version of it (modified, of course, for my (much simpler) language).

Unfortunately, I am getting an incorrect output for the following:

`f (do x y z) a b`

It should be producing the token stream `ID ( DO { ID ID ID } ) ID ID`, but it is instead producing the token stream `ID ( DO { ID ID ID ) ID ID }`.

I imagine that this is due to my unsatisfactory implementation of parse-error(t) (it currently always returns false), but I have no idea how to implement parse-error(t) efficiently.

How do Haskell compilers like GHC etc. handle this case? Is there an easy way to implement parse-error(t) such that it handles this case (and hopefully others that I haven't noticed yet)?

Mystor
  • I don’t know how e.g. GHC implements it, but the way I’d implement it is to have the parser, after it consumes a `DO`, see if there’s a `{`, and if there is, expect it to end with a `}`; otherwise, stop parsing the contents of the block when the start column of a token that would potentially start a new item in the block is to the left of the start-column of the first one. – icktoofay Aug 18 '14 at 00:57
  • It's tricky to implement, and almost impossible to implement correctly for Haskell. What are you using to implement your parser? – augustss Aug 18 '14 at 01:21
  • @augustss My parser is being implemented in Javascript. I don't have a full parser implemented yet, just a pencil/paper BNF grammar, and the lexer/layout code. I was planning on using Jison (http://jison.org) to do the parsing. My language is much more simple than Haskell, so I might not need a fully correct implementation of `parse-error(t)`, it just needs to handle some basic cases. @icktoofay I am currently doing the parsing in 3 steps, lexing, layout, and then parsing. The layout algorithm is doing basically that, but it doesn't have enough context to handle do blocks in brackets. – Mystor Aug 18 '14 at 01:41
  • If Jison is like YACC/Bison then you can use an error production to insert the right curly. – augustss Aug 18 '14 at 09:25
  • @augustss I took a look at some of the code [generated](http://zaach.github.io/jison/try/) by jison, and it definitely has an error production, but I don't see any code which will ever use it. Errors seem to be fed into an error reporting function, and there doesn't seem to be a way for the parser to recover once an error has occurred. I'll look into it more when I get home. – Mystor Aug 18 '14 at 20:22
  • @Mystor In YACC/Bison you have to put 'error' in the grammar to get a production to match an error. – augustss Aug 19 '14 at 09:02
  • GHC implements the offside rule in the lexer https://github.com/ghc/ghc/blob/master/compiler/parser/Lexer.x – Steven Shaw Jun 18 '16 at 08:15

1 Answer

I ended up implementing a custom version of the parsing algorithm used by Jison's compiled parsers. It takes an immutable state object and a token, and performs as much work as possible with the token before returning. I can then use this implementation to check whether a token will produce a parse error, and easily roll back to previous states of the parser.

It works fairly well, although it is a bit of a hack right now. You can find the code here: https://github.com/mystor/myst/blob/cb9b7b7d83e5d00f45bef0f994e3a4ce71c11bee/compiler/layout.js

I tried doing what @augustss suggested, using the error production to fake the token's insertion, but Jison doesn't appear to have all of the tools I need for a reliable implementation. Re-implementing a stripped-down version of the parsing algorithm turned out to be easier, and it lined up better with the original document.
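
To illustrate the idea (this is not the code from the linked repository), here is a minimal sketch of the approach described above. Everything in it is an assumption made for the example: `step` is a hypothetical stand-in for a real LR parser that only tracks bracket nesting, `layout` is a hypothetical layout pass that ignores columns entirely, and tokens are plain strings. The point is only that, because `step` never mutates its input state, asking "would this token be a parse error?" is just a call whose result can be thrown away.

```javascript
// Toy stand-in for a real parser (hypothetical, for illustration only).
// The state is an immutable array of open delimiters. step() returns a NEW
// state, or null if the token would be a parse error in the given state.
function step(state, tok) {
  const top = state[state.length - 1];
  switch (tok) {
    case "(":
    case "{":
      return [...state, tok];
    case ")":
      return top === "(" ? state.slice(0, -1) : null;
    case "}":
      return top === "{" ? state.slice(0, -1) : null;
    default:
      return state; // ID, DO, ... are always accepted by this toy grammar
  }
}

// Simplified layout pass (hypothetical). It opens an implicit block after DO
// and closes it when the next token would otherwise be a parse error, which
// is the parse-error(t) rule. Column-based rules are deliberately left out.
function layout(tokens) {
  const out = [];
  const contexts = []; // true = implicit block, false = explicit block
  let state = [];      // current parser state, never mutated in place

  const emit = (tok) => {
    const next = step(state, tok);
    if (next === null) throw new Error(`parse error at ${tok}`);
    state = next;
    out.push(tok);
  };

  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    // parse-error(t): probing with step() leaves `state` untouched, so a
    // failed probe needs no explicit rollback.
    while (contexts[contexts.length - 1] === true && step(state, tok) === null) {
      contexts.pop();
      emit("}");
    }
    if (tok === "{") contexts.push(false);
    if (tok === "}" && contexts.pop() !== false) {
      throw new Error("explicit } does not match an explicit {");
    }
    emit(tok);
    if (tok === "DO" && tokens[i + 1] !== "{") {
      contexts.push(true);
      emit("{");
    }
  }
  // End of input: close any implicit blocks that are still open.
  while (contexts.pop() === true) emit("}");
  return out;
}

console.log(layout("ID ( DO ID ID ID ) ID ID".split(" ")).join(" "));
// prints "ID ( DO { ID ID ID } ) ID ID"
```

With a real grammar, `step` would run the parser's shift/reduce loop on a copy of its stack instead of this bracket check, but the layout pass would probably keep the same shape.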

Mystor