6

I'm working on seperating lexing and parsing stages of a parser. After some tests, I realized error messages are less helpful when I'm using some tokens other than Parsec's Char tokens.

Here are some examples of Parsec's error messages while using Char tokens:

ghci> P.parseTest (string "asdf" >> spaces >> string "ok") "asdf  wrong"
parse error at (line 1, column 7):
unexpected "w"
expecting space or "ok"


ghci> P.parseTest (choice [string "ok", string "nop"]) "wrong"
parse error at (line 1, column 1):
unexpected "w"
expecting "ok" or "nop"

So, string parser shows what string is expected when found an unexpected string, and choice parser shows what are alternatives.

But when I use same combinators with my tokens:

ghci> Parser.parseTest ((tok $ Ide "asdf") >> (tok $ Ide "ok")) "asdf  "
parse error at "test" (line 1, column 1):
unexpected end of input

In this case, it doesn't print what was expected.

ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) "asdf  "
parse error at (line 1, column 1):
unexpected (Ide "asdf","test" (line 1, column 1))

And when I use choice, it doesn't print alternatives.

I expect this behavior to be related with combinator functions, and not with tokens, but seems like I'm wrong. How can I fix this?

Here's the full lexer + parser code:

Lexer:

module Lexer
    ( Token(..)
    , TokenPos(..)
    , tokenize
    ) where

import Text.ParserCombinators.Parsec hiding (token, tokens)
import Control.Applicative ((<*), (*>), (<$>), (<*>))

data Token = Ide String
           | Number String
           | Bool String
           | LBrack
           | RBrack
           | LBrace
           | RBrace
           | Keyword String
    deriving (Show, Eq)

type TokenPos = (Token, SourcePos)

ide :: Parser TokenPos
ide = do
    pos <- getPosition
    fc  <- oneOf firstChar
    r   <- optionMaybe (many $ oneOf rest)
    spaces
    return $ flip (,) pos $ case r of
                 Nothing -> Ide [fc]
                 Just s  -> Ide $ [fc] ++ s
  where firstChar = ['A'..'Z'] ++ ['a'..'z'] ++ "_"
        rest      = firstChar ++ ['0'..'9']

parsePos p = (,) <$> p <*> getPosition

lbrack = parsePos $ char '[' >> return LBrack
rbrack = parsePos $ char ']' >> return RBrack
lbrace = parsePos $ char '{' >> return LBrace
rbrace = parsePos $ char '}' >> return RBrace


token = choice
    [ ide
    , lbrack
    , rbrack
    , lbrace
    , rbrace
    ]

tokens = spaces *> many (token <* spaces)

tokenize :: SourceName -> String -> Either ParseError [TokenPos]
tokenize = runParser tokens ()

Parser:

module Parser where

import Text.Parsec as P
import Control.Monad.Identity
import Lexer

parseTest  :: Show a => Parsec [TokenPos] () a -> String -> IO ()
parseTest p s =
    case tokenize "test" s of
        Left e -> putStrLn $ show e
        Right ts' -> P.parseTest p ts'

tok :: Token -> ParsecT [TokenPos] () Identity Token
tok t = token show snd test
  where test (t', _) = case t == t' of
                           False -> Nothing
                           True  -> Just t

SOLUTION:

Ok, after fp4me's answer and reading Parsec's Char source more carefully, I ended up with this:

{-# LANGUAGE FlexibleContexts #-}
module Parser where

import Text.Parsec as P
import Control.Monad.Identity
import Lexer

parseTest  :: Show a => Parsec [TokenPos] () a -> String -> IO ()
parseTest p s =
    case tokenize "test" s of
        Left e    -> putStrLn $ show e
        Right ts' -> P.parseTest p ts'


type Parser a = Parsec [TokenPos] () a

advance :: SourcePos -> t -> [TokenPos] -> SourcePos
advance _ _ ((_, pos) : _) = pos
advance pos _ [] = pos

satisfy :: (TokenPos -> Bool) -> Parser Token
satisfy f = tokenPrim show
                      advance
                      (\c -> if f c then Just (fst c) else Nothing)

tok :: Token -> ParsecT [TokenPos] () Identity Token
tok t = (Parser.satisfy $ (== t) . fst) <?> show t

Now I'm getting same error messages:

ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) " asdf"
parse error at (line 1, column 1):
unexpected (Ide "asdf","test" (line 1, column 3))
expecting Ide "ok" or Ide "nop"

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
sinan
  • 6,809
  • 6
  • 38
  • 67
  • 1
    Why do you want to seperate lexing from parsing? Surely the main reason for doing this is tradition - it was simpler to write a tricky parser free of the implementation details of the lexer (which was more routine, perhaps just regular expressions), and in imperative languages, it makes thinking easier to seperate the stages. In nice Haskell Parsec land, writing the lexers and the parsers is nice and easy: lex some strings, combine them to parse them - you can almost write the definition of your language in combinators. Also, you're working hard to pass positions through; let Parsec do it. – AndrewC Aug 28 '12 at 23:42
  • @AndrewC, you may be right. I just wanted to see the good and bad parts of separating lexing and parsing stages in parsec. Now after I look my final code, I think I'll go with just parser. (also, once I was using alex+happy to parse an indentation-based grammar and lexing helped me to generate indent+dedent tokens, and let the parser work on simplified grammar. separate lexing stage in parsec could also help in this kind of situations) – sinan Aug 29 '12 at 04:21
  • @AndrewC, also, I really love Parsec and I think being able to work on different kind of streams(other than character streams) can be really helpful and writing a lexer helped me to understand how can I do that. Now I know how can I work on byte strings, for example. – sinan Aug 29 '12 at 04:29

1 Answers1

5

A beginning of solution can be to define your choice function in the Parser, use a specific unexpected function to override unexpected error and finally use the <?> operator to override the expecting message:

mychoice [] = mzero
mychoice (x:[]) = (tok x <|> myUnexpected) <?> show x 
mychoice (x:xs) = ((tok x <|> mychoice xs) <|> myUnexpected)  <?> show (x:xs)

myUnexpected =  do 
             input <- getInput 
             unexpected $ (id $ first input )
           where 
            first [] = "eof"
            first (x:xs) = show $ fst x

and call your parser like that :

ghci> Parser.parseTest (mychoice [Ide "ok", Ide "nop"]) "asdf  "
parse error at (line 1, column 1):
unexpected Ide "asdf"
expecting [Ide "ok",Ide "nop"]
Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
fp4me
  • 463
  • 2
  • 8