0

I have actually asked this question before (here) but it turns out that the solution provided did not handle all test cases. Also, I need 'Text' parser rather than 'String', so I need parsec3.

Ok, the parser should allow for EVERY type of char inbetween quotes, even quotes. The end of the quoted text is marked by a ' character, followed by |, a space or end of input.

So,

'aa''''|

should return a string

aa'''

This is what I have:

import Text.Parsec
import Text.Parsec.Text


quotedLabel :: Parser Text
quotedLabel = do -- reads the first quote.
    spaces
    string "'"
    lab <-  liftM pack $ endBy1 anyChar endOfQuote
    return  lab

endOfQuote = do
    string "'"
    try(eof) <|> try( oneOf "| ")

Now, the problem here is of course that eof has a different type than oneOf "| ", so compilation falls.

How do I fix this? Is there a better way to achieve what I am trying to do?

Community
  • 1
  • 1
Fredrik Karlsson
  • 485
  • 8
  • 21
  • Stylistically I would use `fmap` instead of `liftM`. Also I'm pretty sure you don't need `try` around either `eof` or `oneOf`, My understanding is that you only need `try` if the parser can consume input without succeeding. – ErikR Jul 15 '16 at 20:11
  • Your `try` is probably going to have to be higher -- e.g. `endOfQuote = try $ do { ... }` or even better `quotedLabel = do { ... endBy1 anyChar (try endOfQuote) ... }`. Are you sure you want `endBy1`? The documentation for it doesn't seem to match your description of what you want. – Daniel Wagner Jul 15 '16 at 20:55

2 Answers2

2

To change the result of any functor computation you can just use:

fmap (const x) functor_comp

e.g.:

getLine :: IO String
fmap (const ()) getLine :: IO ()

eof :: Parser ()
oneOf "| "  :: Parser Char

fmap (const ()) (oneOf "| ") :: Parser ()

Another option is to use operators from Control.Applicative:

getLine *> return 3  :: IO Integer

This performs getLine, discards the result and returns 3.

In your case, you might use:

try(eof) <|> try( oneOf "| " *> return ())
ErikR
  • 51,541
  • 9
  • 73
  • 124
2

Whitespace

First a comment on handling white space...

Generally the practice is to write your parsers so that they consume the whitespace following a token or syntactic unit. It's common to define combinator like:

lexeme p = p <* spaces

to easily convert a parser p to one that discards the whitespace following whatever p parses. E.g., if you have

number = many1 digit

simply use lexeme number whenever you want to eat up the whitespace following the number.

For more on this approach to handling whitespace and other advice on parsing languages, see this Megaparsec tutorial.

Label expressions

Based on your previous SO question it appears you want to parse expressions of the form:

label1 | label2 | ... | labeln

where each label may be a simple label or a quoted label.

The idiomatic way to parse this pattern is to use sepBy like this:

labels :: Parser String
labels = sepBy1 (try quotedLabel <|> simpleLabel) (char '|')

We define both simpleLabel and quotedLabel in terms of what characters may occur in them. For simpleLabel a valid character is a non-| and non-space:

simpleLabel :: Parser String
simpleLabel = many (noneOf "| ")

A quotedLabel is a single quote followed by a run of valid quotedLabel-characters followed by an ending single quote:

sq = char '\''

quotedLabel :: Parser String
quotedLabel = do
  char sq
  chs <- many validChar
  char sq
  return chs

A validChar is either a non-single quote or a single quote not followed by eof or a vertical bar:

validChar = noneOf [sq] <|> try validQuote

validQuote = do
  char sq
  notFollowedBy eof
  notFollowedBy (char '|')
  return sq

The first notFollowedBy will fail if the single quote appears just before the end of input. The second notFollowedBy will fail if next character is a vertical bar. Therefore the sequence of the two will succeed only if there is a non-vertical bar character following the single quote. In this case the single quote should be interpreted as part of the string and not the terminating single quote.

Unfortunately this doesn't quite work because the current implementation of notFollowedBy will always succeed with a parser which does not consume any input -- i.e. like eof. (See this issue for more details.)

To work around this problem we can use this alternate implementation:

notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do {a <- try p; return (unexpected (show a));}
  <|> return (return ())

Here is the complete solution with some tests. By adding a few lexeme calls you can make this parser eat up any white space where you decide it is not significant.

import Text.Parsec hiding (labels)
import Text.Parsec.String
import Control.Monad

notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do {a <- try p; return (unexpected (show a));}
  <|> return (return ())

sq = '\''

validChar = do
  noneOf "'" <|> try validQuote

validQuote = do
  char sq
  notFollowedBy' eof
  notFollowedBy (char '|')
  return sq

quotedLabel :: Parser String
quotedLabel = do
  char sq
  str <- many validChar
  char sq
  return str

plainLabel :: Parser String
plainLabel = many (noneOf "| ")

labels :: Parser [String]
labels = sepBy1 (try quotedLabel <|> try plainLabel) (char '|')

test input expected = do
  case parse (labels <* eof) "" input of
    Left e -> putStrLn $ "error: " ++ show e
    Right v -> if v == expected
                 then putStrLn $ "OK - got: " ++ show v
                 else putStrLn $ "NOT OK - got: " ++ show v ++ "  expected: " ++ show expected

test1 = test "a|b|c"      ["a","b","c"]
test2 = test "a|'b b'|c"  ["a", "b b", "c"]
test3 = test "'abc''|def" ["abc'", "def" ]
test4 = test "'abc'"      ["abc"]
test5 = test "x|'abc'"    ["x","abc"]
Community
  • 1
  • 1
ErikR
  • 51,541
  • 9
  • 73
  • 124
  • Ok, this works. A bit unsatisfactory that I need to use `Char` and then pack concatenated Strings to Text, but at least it seems to work for all exemple I can think of right now. – Fredrik Karlsson Jul 16 '16 at 17:23
  • You don't have to perform the conversion of String -> Text for every sub-parser. For instance, you can convert `labels` to a `Parser [Text]` with: `fmap (map T.pack) labels`. – ErikR Jul 16 '16 at 17:38
  • Yes, that is true. Thanks. – Fredrik Karlsson Jul 21 '16 at 19:45