Problems
Your code isn't self-contained and the actual problem is unclear. However, I suspect your woes are actually caused by how keys are parsed; in particular, something like \r\nk is a valid key, according to your parser:
λ> parseOnly parsePair "\r\nk: v\r\n"
Right ("\r\nk","v")
That needs to be fixed.
Moreover, since one EOL separates (rather than terminates) key-value pairs, an EOL shouldn't be consumed at the end of your parsePair parser.
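To make the separator/terminator distinction concrete, here is a minimal sketch. The names kv, terminated and separated are hypothetical and only serve the illustration; they are not taken from your code.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ( many )
import Data.ByteString ( ByteString )
import Data.Attoparsec.ByteString.Char8 ( Parser
                                        , endOfLine
                                        , parseOnly
                                        , sepBy
                                        , string
                                        , takeTill
                                        )

-- hypothetical minimal key-value parser, for illustration only
kv :: Parser (ByteString, ByteString)
kv = do
  k <- takeTill (\c -> c == ':' || isEOL c)
  _ <- string ": "
  v <- takeTill isEOL
  return (k, v)
  where
    isEOL c = c == '\n' || c == '\r'

-- EOL as terminator: every pair must be followed by an EOL
terminated :: Parser [(ByteString, ByteString)]
terminated = many (kv <* endOfLine)

-- EOL as separator: an EOL is only expected between two pairs
separated :: Parser [(ByteString, ByteString)]
separated = sepBy kv endOfLine
```

On an input such as "k: v\r\nk2: v2", which has no trailing EOL, terminated should silently drop the last pair (its trailing endOfLine fails and many backtracks), whereas separated should return both pairs.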
Another tangential issue: because you use the many1 combinator instead of ByteString-oriented parsers (such as takeTill), your values have type String instead of ByteString. That's probably not what you want here, because it defeats the purpose of using ByteString in the first place; see Performance considerations.
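The difference shows up directly in the types. The following sketch assumes a value is "everything up to the next EOL character"; valueString and valueBytes are hypothetical names, not parsers from your code.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString ( ByteString )
import Data.Attoparsec.ByteString.Char8 ( Parser
                                        , parseOnly
                                        , satisfy
                                        , takeTill
                                        )
import Data.Attoparsec.Combinator ( many1 )

isEOL :: Char -> Bool
isEOL c = c == '\n' || c == '\r'

-- one Char at a time: the result is a String (a list of Char)
valueString :: Parser String
valueString = many1 (satisfy (not . isEOL))

-- ByteString-oriented: the result is a ByteString
valueBytes :: Parser ByteString
valueBytes = takeTill isEOL
```

Note that the two are not strictly equivalent: many1 requires at least one character, whereas takeTill also succeeds on an empty value.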
Solution
I suggest the following refactoring:
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString ( ByteString )
import Data.Attoparsec.ByteString.Char8 ( Parser
                                        , count
                                        , endOfLine
                                        , parseOnly
                                        , sepBy
                                        , string
                                        , takeTill
                                        )

-- convenient type synonyms
type KVPair = (ByteString, ByteString)
type Msg = [KVPair]

pair :: Parser KVPair
pair = do
  k <- key
  _ <- string ": "
  v <- value
  return (k, v)
  where
    key = takeTill (\c -> c == ':' || isEOL c)
    value = takeTill isEOL
    isEOL c = c == '\n' || c == '\r'

-- one EOL separates key-value pairs
msg :: Parser Msg
msg = sepBy pair endOfLine

-- two EOLs separate messages
msgs :: Parser [Msg]
msgs = sepBy msg (count 2 endOfLine)
I have renamed your parsers, for consistency with attoparsec's, none of which have "parse" as a prefix:
parsePair --> pair
parseListPairs --> msg
parseMsg --> msgs
Tests in GHCi
λ> parseOnly pair "\r\nk: v"
Left "string"
Good; you do want a failure in this case.
λ> parseOnly pair "k: v"
Right ("k","v")
λ> parseOnly msg "k: v\r\nk2: v2\r\n"
Right [("k","v"),("k2","v2")]
λ> parseOnly msgs "k1: v1\r\nk2: v2\r\n\r\nk3: v3\r\nk4: v4"
Right [[("k1","v1"),("k2","v2")],[("k3","v3"),("k4","v4")]]
λ> parseOnly msgs "k: v"
Right [[("k","v")]]
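One thing these tests don't make obvious: parseOnly does not require the parser to consume the whole input, which is why an input with a trailing "\r\n" is still accepted by msg. If you would rather reject leftover input, a possible sketch is to sequence the parser with endOfInput. The name msgStrict is hypothetical, and the parsers are repeated here only to keep the snippet self-contained.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString ( ByteString )
import Data.Attoparsec.ByteString.Char8 ( Parser
                                        , endOfInput
                                        , endOfLine
                                        , parseOnly
                                        , sepBy
                                        , string
                                        , takeTill
                                        )

type KVPair = (ByteString, ByteString)

pair :: Parser KVPair
pair = do
  k <- takeTill (\c -> c == ':' || isEOL c)
  _ <- string ": "
  v <- takeTill isEOL
  return (k, v)
  where
    isEOL c = c == '\n' || c == '\r'

-- lenient: stops after the last pair and ignores whatever follows
msg :: Parser [KVPair]
msg = sepBy pair endOfLine

-- hypothetical strict variant: fails unless all input is consumed
msgStrict :: Parser [KVPair]
msgStrict = msg <* endOfInput
```

With this variant, "k: v\r\nk2: v2" should still parse, while "k: v\r\nk2: v2\r\n" should be rejected because of the leftover EOL. Whether that strictness is desirable depends on how your messages are framed.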