4

This simple parser is expected to parse messages of the form

key: value\r\nkey: value\r\n\r\nkey: value\r\nkey: value\r\n\r\n

One EOL acts as a field separator, and double EOL acts as a message separator. It works perfectly fine when the EOL separator is \n but parseWith always returns fail when it is \r\n.

parsePair = do
  key <- B8.takeTill (==':')
  _ <- B8.char ':'
  _ <- B8.char ' '
  value <- B8.manyTill B8.anyChar endOfLine
  return (key, value)

parseListPairs = sepBy parsePair endOfLine <* endOfLine

parseMsg = sepBy parseListPairs endOfLine <* endOfLine
jub0bs
  • 60,866
  • 25
  • 183
  • 186
concept3d
  • 2,248
  • 12
  • 21
  • 1
    Where did you get `endOfLine` from, or how did you define it? What does `B8` stand for? I can make an educated guess, but still... You need to make the code in your question self-contained, by adding your import statements and custom definitions. The problem is probably that `endOfLine` is defined to parse `\n`, not `\r\n`. You will have to define your own `endOfLine` parser for `\r\n`. – jub0bs Sep 03 '15 at 06:46
  • @Jubobs Probably not. Chances are this is [attoparsec's `endOfLine`](https://hackage.haskell.org/package/attoparsec-0.13.0.1/docs/Data-Attoparsec-Text.html#v:endOfLine). – duplode Sep 03 '15 at 06:55
  • 1
    @duplode You're right, but the relevant link is probably [this one](http://hackage.haskell.org/package/attoparsec-0.13.0.1/docs/Data-Attoparsec-ByteString-Char8.html#g:13), as suggested by the name `B8`. – jub0bs Sep 03 '15 at 07:00
  • endOfLine is from [Data.Attoparsec.ByteString.Char8](http://hackage.haskell.org/package/attoparsec-0.13.0.1/docs/src/Data-Attoparsec-ByteString-Internal.html#endOfLine]%28Data.Attoparsec.ByteString.Char8%29) I checked the implementation and it handles both end lines. – concept3d Sep 03 '15 at 07:33
  • You need to clarify. You state that one EOL acts as a pair separator, and two EOLs act as a message separator, but your example (`key: value\r\nkey: value\r\n\r\n`) suggests that two EOLs act as a message *terminator*, instead. Separator would make more sense to me, but which is it? Separator, or terminator? – jub0bs Sep 04 '15 at 10:18

2 Answers2

4

I'm assuming you are using these imports:

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Attoparsec.ByteString.Char8 as B8
import Data.Attoparsec.ByteString.Char8

The problem is that endOfLine consumes the end of line, so perhaps you really want something like:

parseListPairs = B8.many1 parsePair <* endOfInput

For instance, this works:

ghci> parseOnly parseListPairs "k: v\r\nk2: v2\r\n"
Right [("k","v"),("k2","v2")]

Update:

For parsing multiple messages you can use:

parseListPairs = B8.manyTill parsePair endOfLine
parseMsgs = B8.manyTill parseListPairs endOfInput

ghci> test3 = parseOnly parseMsgs "k1: v1\r\nk2: v2\r\n\r\nk3: v3\r\nk4: v4\r\n\r\n"
Right [[("k1","v1"),("k2","v2")],[("k3","v3"),("k4","v4")]]
ErikR
  • 51,541
  • 9
  • 73
  • 124
2

Problems

Your code isn't self-contained and the actual problem is unclear. However, I suspect your woes are actually caused by how keys are parsed; in particular, something like \r\nk is a valid key, according to your parser:

λ> parseOnly parsePair "\r\nk: v\r\n"
Right ("\r\nk","v")

That needs to be fixed.

Moreover, since one EOL separates (rather than terminates) key-value pairs, an EOL shouldn't be consumed at the end of your parsePair parser.

Another tangential issue: because you use the many1 combinator instead ByteString-oriented parsers (such as takeTill), your values have type String instead of ByteString. That's probably not what you want, here, because it defeats the purpose of using ByteString in the first place.; see Performance considerations.

Solution

I suggest the following refactoring:

{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString ( ByteString )

import Data.Attoparsec.ByteString.Char8 ( Parser
                                        , count
                                        , endOfLine
                                        , parseOnly
                                        , sepBy
                                        , string
                                        , takeTill
                                        )

-- convenient type synonyms
type KVPair = (ByteString, ByteString)
type Msg = [KVPair]

pair :: Parser KVPair
pair = do
    k <- key
    _ <- string ": "
    v <- value
    return (k, v)
  where
    key     = takeTill (\c -> c == ':' || isEOL c)
    value   = takeTill isEOL
    isEOL c = c == '\n' || c == '\r'

-- one EOL separates key-value pairs
msg :: Parser Msg
msg = sepBy pair endOfLine

-- two EOLs separate messages
msgs :: Parser [Msg]
msgs = sepBy msg (count 2 endOfLine)

I have renamed your parsers, for consistency with attoparsec's, none of which have "parse" as a prefix:

  • parsePair --> pair
  • parseListPairs --> msg
  • parseMsg --> msgs

Tests in GHCi

λ> parseOnly keyValuePair  "\r\nk: v"
Left "string"

Good; you do want a fail, in this case.

λ> parseOnly keyValuePair  "k: v"
Right ("k","v")

λ> parseOnly msg "k: v\r\nk2: v2\r\n"
Right [("k","v"),("k2","v2")]

λ> parseOnly msgs  "k1: v1\r\nk2: v2\r\n\r\nk3: v3\r\nk4: v4"
Right [[("k1","v1"),("k2","v2")],[("k3","v3"),("k4","v4")]]

λ> parseOnly msgs "k: v"
Right [[("k","v")]]
jub0bs
  • 60,866
  • 25
  • 183
  • 186
  • I understand the performance consideration, my initial implementation was using Bytestring `parsePair :: Parser (ByteString,ByteString)` and takeTill/takeWhile combinators, actually it was very similar to the one you provided, it also failed but I couldn't find the problem, and I posted a different version, now after I saw your impl I suspect my problem was in the definition of isEndOfLine – concept3d Sep 03 '15 at 08:59
  • I ended up with infinite loop, same result as my original implementation, I will try to figure out the problem. There is also a concern that I am reading from socket. this implementation worked fine with parseOnly the first time. – concept3d Sep 03 '15 at 09:24
  • @concept3d Just curious: have you made any progress on this? – jub0bs Sep 06 '15 at 11:24
  • this solution still goes into infinite loop when used with parseWith, similar to my original solution which I didn't post. I used the accepted answer temporarily, I'll get back to this soon but I have to finish more important things first. – concept3d Sep 08 '15 at 07:49
  • @concept3d Ok. I don't think I can investigate any further until I see your code that uses `parseWith`. Good luck. – jub0bs Sep 08 '15 at 08:16