The requirements are taken from the DOT language specification. More precisely, I'm trying to parse the ID
attribute, which can be, e.g.,
any double-quoted string ("...") possibly containing escaped quotes (\");
The following should be a minimal example.

    {-# LANGUAGE OverloadedStrings #-}
    module Main where

    import Text.Megaparsec
    import Text.Megaparsec.Char
    import Data.Void
    import Data.Char
    import Data.Text hiding ( map
                            , all
                            , concat
                            )

    type Parser = Parsec Void Text

    escape :: Parser String
    escape = do
      d <- char '\\'
      c <- oneOf ['\\', '\"', '0', 'n', 'r', 'v', 't', 'b', 'f']
      return [d, c]

    nonEscape :: Parser Char
    nonEscape = noneOf ['\\', '\"', '\0', '\n', '\r', '\v', '\t', '\b', '\f']

    identPQuoted :: Parser String
    identPQuoted =
      let inner = fmap return (try nonEscape) <|> escape
      in  do
            char '"'
            strings <- many inner
            char '"'
            return $ concat strings

    identP :: Parser Text
    identP = identPQuoted >>= return . pack

    main = parseTest identP "\"foo \"bar\""

The above code fails at the second quote: it returns "foo "
even though I want foo "bar.
I don't understand why. I thought that megaparsec would repeatedly apply inner until it parses the final ". But it only repeatedly applies the nonEscape parser, and the first time that fails and it uses escape, it appears to skip the rest of the inner string and just move on to the final quote.
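
One thing that might be worth ruling out before looking at the parser is the test input itself. The sketch below uses only base (no megaparsec) and prints the characters that the literal passed to parseTest actually contains. The names originalInput and escapedInput are hypothetical and introduced only for this illustration; escapedInput is an assumed alternative spelling of the input, not something taken from the code above.

```haskell
module Main where

-- The literal from the question. Inside Haskell source, \" denotes a plain
-- quote character, so no backslash ends up in the resulting string.
originalInput :: String
originalInput = "\"foo \"bar\""

-- Hypothetical alternative: the backslash is itself escaped (\\), so the
-- string really contains a backslash followed by a quote.
escapedInput :: String
escapedInput = "\"foo \\\"bar\""

main :: IO ()
main = do
  putStrLn originalInput             -- prints: "foo "bar"
  print ('\\' `elem` originalInput)  -- prints: False
  putStrLn escapedInput              -- prints: "foo \"bar"
  print ('\\' `elem` escapedInput)   -- prints: True
```

If the first string contains no backslash, then escape never gets a chance to match, and the second " would simply be read as the closing quote. That would be consistent with the "foo " result, though this is only a guess at the cause.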