I'm writing a parser for my simple programming language in Haskell, using the megaparsec library.

I found this megaparsec tutorial and wrote the following parser code:

import Data.Void

import Text.Megaparsec
import Text.Megaparsec.Char

import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

lexeme :: Parser a -> Parser a
lexeme = L.lexeme space

rws :: [String] -- list of reserved words
rws = ["if", "then"]

identifier :: Parser String
identifier = (lexeme . try) (p >>= check)
  where
    p = (:) <$> letterChar <*> many alphaNumChar
    check x =
        if x `elem` rws
            then fail $ "keyword " ++ show x ++ " cannot be an identifier"
            else return x

This is a simple identifier parser that rejects reserved words. It successfully parses valid identifiers such as foo and bar123.

But when an invalid input (i.e. a reserved word) is given to the parser, it reports this error:

>> parseTest identifier "if"
1:3:
keyword "if" cannot be an identifier

The error message is fine, but the error location (1:3:) is not what I expected. I expected the error location to be 1:1:.

In this part of the definition of identifier,

identifier = (lexeme . try) (p >>= check)

I expected try to behave as if no input had been consumed when (p >>= check) fails, and so to report the error at source location 1:1:.

Is my expectation wrong? How can I make this code work as I intended?
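One way to check whether try really rewinds the input is to put identifier on the left of `<|>` and see whether the right-hand alternative gets to start from column 1. The following is a minimal sketch, not a fix: `keywordIf` and `identOrIf` are hypothetical names introduced only for this experiment, and everything else is copied from the code above.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

lexeme :: Parser a -> Parser a
lexeme = L.lexeme space

rws :: [String] -- list of reserved words
rws = ["if", "then"]

-- The identifier parser from the question, unchanged.
identifier :: Parser String
identifier = (lexeme . try) (p >>= check)
  where
    p = (:) <$> letterChar <*> many alphaNumChar
    check x =
      if x `elem` rws
        then fail $ "keyword " ++ show x ++ " cannot be an identifier"
        else return x

-- Hypothetical keyword parser, for illustration only:
-- matches "if" when it is not the prefix of a longer word.
keywordIf :: Parser String
keywordIf = lexeme (string "if" <* notFollowedBy alphaNumChar)

-- If try did not rewind the input after identifier fails,
-- keywordIf would start after "if" and could not succeed.
identOrIf :: Parser String
identOrIf = identifier <|> keywordIf
```

If `parseMaybe identOrIf "if"` yields `Just "if"`, the input was rewound and the second branch ran from the start. That would mean the 1:3 in the message is only the position at which `fail` was called, not the parser's position after backtracking.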

suhdonghwi
  • I guess that when `fail` is run, the position is `1:3`, since `p` consumed the prefix. Later on `try` rolls it back, but it's too late. – chi Feb 04 '18 at 16:43
  • @chi Thanks. But why is `try` too late to roll it back? Is there a solution for this? – suhdonghwi Feb 04 '18 at 18:00
  • Don’t confuse the error reporting with the state of the parser. The error occurred at position `1:3` (after the `if` was recognised). The parsing process did backtrack - if you compose `identifier` into a bigger parser using `<|>` you’d observe the other branch being tried at the initial position - but since it didn’t have any more alternatives to try the original error was reported. I’d fix it using a combination of `lookAhead` and `notFollowedBy`. – Benjamin Hodgson Feb 04 '18 at 21:04
  • @BenjaminHodgson Thank you very much. I had misunderstood the notion of backtracking. I changed my parser to use `lookAhead` and `notFollowedBy` as you advised. Now the error location is `1:1:`, but the error message is `unexpected 'i'`. I expected it to be `unexpected 'if'`. How can I achieve this? – suhdonghwi Feb 05 '18 at 01:33
  • Finally. Solved the problem via `observing` function :D – suhdonghwi Feb 05 '18 at 05:38
  • Well, sorry for the continuous self-commenting, but it turns out that the `observing` method also had a problem. It failed to parse when there were trailing characters after a reserved name, e.g. `ifa`, `thens`. I ended up with the following method: if a reserved name is parsed, set the source position back to the initial position of the identifier using `setPosition`. This fixes all the problems I had, but I'm still looking for a more elegant/Haskell-ish solution. – suhdonghwi Feb 05 '18 at 16:51
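
The position-reset approach described in the last comment can be sketched as follows. This is an illustration under assumptions, not the asker's actual code: it assumes megaparsec 7 or later, where `getOffset` and `setOffset` take the place of the `getPosition` and `setPosition` pair mentioned above, and it relies on `fail` reporting the error at the parser's current offset.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

lexeme :: Parser a -> Parser a
lexeme = L.lexeme space

rws :: [String] -- list of reserved words
rws = ["if", "then"]

identifier :: Parser String
identifier = (lexeme . try) $ do
  -- Remember where the identifier starts (assumes megaparsec >= 7).
  start <- getOffset
  x <- (:) <$> letterChar <*> many alphaNumChar
  if x `elem` rws
    then do
      -- Move the offset back so that the error raised by fail
      -- points at the first character of the reserved word.
      setOffset start
      fail $ "keyword " ++ show x ++ " cannot be an identifier"
    else return x
```

With this version, `parseTest identifier "if"` should report the keyword message at 1:1, while inputs such as `ifa` or `thens` still parse as ordinary identifiers because the reserved-word check runs on the whole word.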

0 Answers