I've got a silly situation in my parsec parsers that I would like your help on.

I need to parse a sequence of strings / chars that are separated by | characters. So, we could have a|b|'c'|'abcd'

which should be turned into

[a,b,c,abcd]

Spaces are not allowed unless they are inside a quoted ' ' string. With my naïve attempt, I can parse input like a'a|'bb' into [a'a,bb], but not aa|'b'b' into [aa,b'b].

singleQuotedChar :: Parser Char
singleQuotedChar = noneOf "'" <|> try (string "''" >> return '\'')

simpleLabel = do
    whiteSpace haskelldef
    lab <- many1 (noneOf "|")
    return $ lab

quotedLabel = do
    whiteSpace haskelldef
    char '\''
    lab <- many singleQuotedChar
    char '\''
    return $ lab

Now, how do I tell the parser to treat a ' as the closing ' iff it is followed by a | or white space? (Or get some kind of ' counting into this.) The input is user generated, so I cannot rely on users escaping quotes as \'.

Fredrik Karlsson
  • You're trying to parse `'b'b'`, but in `singleQuotedChar` you require that single quotes only appear in pairs. Did you mean to try to parse `'b''b'` as `b'b`? If you want `'b'b'` to be parsable you need to change the definition of singleQuotedChar. – rampion Jul 24 '14 at 18:15
  • How is the parser supposed to know it should keep the middle ' in aa¦'b'b' ? I'm struggling to understand what you want ' to do. Is it like a bracketing character, but is only valid as such next to a ¦? Why have it at all if so? (Possible answer: this is part of a larger input where spaces mean something else.) – AndrewC Jul 24 '14 at 22:09
  • Here's what I think you mean: A string is a sequence of letters (or numbers?), apostrophes and spaces bracketed by apostrophes, or a sequence of letters (numbers) and apostrophes not bracketed by apostrophes. Strings are interpolated with vertical bars. – AndrewC Jul 24 '14 at 22:16
  • Is 'ab¦b'b¦c'c' valid input, and why? By the way, using noneOf is often a mistake. You should define what _is_ allowed, not what isn't allowed, otherwise you tend to eat separators you shouldn't. – AndrewC Jul 24 '14 at 22:20

1 Answer

Note that allowing a quote in the middle of a string delimited by quotes is very confusing to read, but I believe this should allow you to parse it.

quotedLabel :: Parser String
quotedLabel = do -- reads the opening quote
    whiteSpace haskelldef
    char '\''
    quotedLabel2

quotedLabel2 :: Parser String
quotedLabel2 = do -- reads the string and the closing quote
    lab <- many singleQuotedChar
    -- first try to continue past a middle quote; otherwise this quote closes the label
    try  (do more <- quotedLabel3
             return (lab ++ more))
     <|> (do char '\''
             return lab)

quotedLabel3 :: Parser String
quotedLabel3 = do -- handles a quote in the middle of the string
    char '\''
    -- a quote followed by '|' (or by end of input) is not a middle quote
    lookAhead $ noneOf ['|']
    ret <- quotedLabel2
    return $ "'" ++ ret
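
For comparison, here is a minimal self-contained sketch of the same idea that can be loaded on its own. It is only one possible reading of the question, not a definitive implementation. The names closingQuote, label, labels and parseLabels are made up for this example. It assumes that a quote closes a label only when it is followed by |, white space or end of input, and it uses Parsec's plain spaces in place of whiteSpace haskelldef.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumption: a quote closes a label only if '|', white space,
-- or end of input comes right after it.
closingQuote :: Parser ()
closingQuote = try (char '\'' *> lookAhead (() <$ oneOf "| \t\n" <|> eof))

-- Everything between the opening quote and the first closing quote,
-- so a quote in the middle of the label is kept as an ordinary character.
quotedLabel :: Parser String
quotedLabel = char '\'' *> manyTill anyChar closingQuote

-- Unquoted labels may not contain '|' or white space.
simpleLabel :: Parser String
simpleLabel = many1 (noneOf "| \t\n")

-- Assumption: white space around a label is skipped.
label :: Parser String
label = spaces *> (try quotedLabel <|> simpleLabel) <* spaces

labels :: Parser [String]
labels = (label `sepBy1` char '|') <* eof

-- Hypothetical helper that runs the parser and renders errors as text.
parseLabels :: String -> Either String [String]
parseLabels = either (Left . show) Right . parse labels ""
```

With these definitions, parseLabels "aa|'b'b'" should give Right ["aa","b'b"]. One consequence of the closing rule is that a quoted label cannot contain a quote directly followed by | or a space, because that quote is always taken as the end of the label.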
tohava