
Attoparsec has modules specialized for strict/lazy input, ByteString/Text, and Char8 (ASCII)/Char. But it doesn't provide all the combinations.

I think Data.Attoparsec.ByteString.Lazy.Char8, which isn't provided, would be particularly convenient for grinding through large reports, which tend to be encoded as ASCII.

Do you know why it doesn't exist?

Michael Fox
  • Data.ByteString and Data.ByteString.Char8 use the same underlying type (same with the lazy versions), so I am not sure it is needed. That said, I am also not sure why there is a `Data.Attoparsec.ByteString.Char8` at all, so I too am a bit confused. – jamshidh Dec 01 '14 at 20:31
  • OK, I think I just answered my own question: the Char8 version has a bunch more parsing functions (`char`, for instance) that deal with ASCII. It seems to be more for parsing text. So now I agree, there should be a Lazy.Char8 as well. – jamshidh Dec 01 '14 at 20:37
  • @jamshidh The `*.Char8` modules let you use parser combinators which consume and return `Char` types instead of `Word8` types; e.g. `anyChar` is a `Parser Char` whereas `anyWord8` is a `Parser Word8`. They both do the same thing and both operate on bytestrings, but the return types are different. The same goes for `Data.ByteString` versus `Data.ByteString.Char8`: the functions are the same but the input/output types are different. – ErikR Dec 01 '14 at 20:41
  • @user5402, @jamshidh The main differences to me are: 1. The `parse` function in the Lazy modules accepts strings from Data.ByteString.Lazy, whereas the other ones want you to handle chunking of input yourself by calling a continuation after each chunk is processed. So the Lazy ones are easier to use. 2. The Char8 parsers are faster because they don't have to worry about Unicode. While I'm all for i18n, there are cases where you know you have ASCII, so why take the overhead? – Michael Fox Dec 01 '14 at 23:30
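To make the `Char` versus `Word8` point from the comments concrete, here is a small sketch. The names `firstAsWord8`, `firstAsChar` and `demo` are made up for illustration. Both parsers consume one byte of the same strict ByteString. `anyChar` is deliberately given the plain `Data.Attoparsec.ByteString` `Parser` type, assuming (as the Char8 module's re-export of `Parser` suggests) that Char8 does not introduce a separate parser type.

```haskell
import Data.Word (Word8)
import qualified Data.ByteString.Char8 as BC
import qualified Data.Attoparsec.ByteString as AB
import qualified Data.Attoparsec.ByteString.Char8 as AC

-- Reads one byte and returns it as a Word8.
firstAsWord8 :: AB.Parser Word8
firstAsWord8 = AB.anyWord8

-- Reads one byte and returns it as a Char. The signature uses the
-- ByteString module's Parser type: the Char8 module shares it.
firstAsChar :: AB.Parser Char
firstAsChar = AC.anyChar

-- Run both parsers on the same strict input.
demo :: (Either String Word8, Either String Char)
demo = (AB.parseOnly firstAsWord8 input, AB.parseOnly firstAsChar input)
  where
    input = BC.pack "A,1"
```

Evaluating `demo` should give `(Right 65, Right 'A')`: the same first byte, seen once as a `Word8` and once as a `Char`.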

1 Answer


I don't think this is needed, because the two modules don't appear to overlap with each other.

Data.Attoparsec.ByteString.Char8 provides extra parsers specifically for parsing ASCII data. Most of these are just Char-typed variations of their Word8 counterparts (`char` versus `word8`, for instance), and they use the same Parser type, so you should be able to mix and match without issue.
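As a sketch of that mixing, the parser below combines a `Word8`-based parser from `Data.Attoparsec.ByteString` with `char`, `decimal` and `endOfLine` from the Char8 module in a single do-block. The `<key>=<number>` line format and the name `keyValue` are hypothetical, chosen only for illustration.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Attoparsec.ByteString as AB
import qualified Data.Attoparsec.ByteString.Char8 as AC

-- Hypothetical record format for illustration: "<key>=<number>\n".
-- A Word8 parser (takeWhile1 over byte values) and Char8 parsers
-- (char, decimal, endOfLine) are sequenced in the same Parser.
keyValue :: AB.Parser (BC.ByteString, Int)
keyValue = do
  key <- AB.takeWhile1 (\w -> w /= 61) -- 61 is the byte value of '='
  _   <- AC.char '='
  val <- AC.decimal
  AC.endOfLine
  return (key, val)
```

For example, `AB.parseOnly keyValue (BC.pack "width=42\n")` should yield `Right ("width", 42)`.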

Data.Attoparsec.ByteString.Lazy provides an alternative parse function that you can use to run a parser against a lazy ByteString. This isn't special in any way; it's just a wrapper around the strict version that iteratively pushes chunks of the lazy ByteString into your parser.
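For a rough picture of that chunk pushing, here is a simplified, hand-written sketch. The name `parseChunks` is hypothetical and this is not the library's source. It drives the strict, incremental `parse` from `Data.Attoparsec.ByteString`, hands over the next chunk each time the parser returns `Partial`, and feeds an empty chunk once the input runs out, which attoparsec treats as end of input.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Attoparsec.ByteString as AB

-- Simplified sketch of driving the strict, incremental parser over the
-- chunks of a lazy ByteString by hand. It approximates what the Lazy
-- module's parse does; it is not the library's implementation.
parseChunks :: AB.Parser a -> BL.ByteString -> Either String a
parseChunks p lazyInput = go (BL.toChunks lazyInput) (AB.parse p B.empty)
  where
    -- The parser wants more input: give it the next chunk. toChunks never
    -- yields empty chunks, so this cannot signal end of input by accident.
    go (c:cs) (AB.Partial k) = go cs (k c)
    -- No chunks left: an empty chunk tells the parser the input has ended.
    go []     (AB.Partial k) = finish (k B.empty)
    -- The parser finished (or failed) before consuming every chunk.
    go _      r              = finish r

    finish (AB.Done _ x)     = Right x
    finish (AB.Fail _ _ msg) = Left msg
    finish (AB.Partial _)    = Left "parser still wants input"
```

This is the kind of continuation bookkeeping that the Lazy module's `parse` saves you from writing.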

From what I can tell, there's no reason you shouldn't be able to just use both of them together. For example:

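import Data.ByteString.Lazy (ByteString)
import qualified Data.Attoparsec.ByteString.Char8 as Char8
import qualified Data.Attoparsec.ByteString.Lazy as Lazy

-- T stands for whatever your parser produces; Int is just a placeholder.
type T = Int

myParser :: Char8.Parser T
myParser = Char8.decimal -- use parsers from Char8 if you'd like

-- Char8.Parser is the same type the Lazy module's parse expects, so a
-- parser built from Char8 combinators can run on a lazy ByteString.
lazyParse :: Char8.Parser T -> ByteString -> Lazy.Result T
lazyParse p s = Lazy.parse p s -- parse a lazy ByteString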

You use the combinators from Char8 to define your parser, and then you use the functions from Lazy to run it. So there's no need for a Data.Attoparsec.ByteString.Lazy.Char8.
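Applied to the original use case of a large ASCII report, a sketch might look like the following. The `<name>,<count>` line format and the names `reportLine` and `totalCount` are hypothetical; the point is only that every combinator comes from `Data.Attoparsec.ByteString.Char8` while `Data.Attoparsec.ByteString.Lazy.parse` runs them over a lazy ByteString.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy.Char8 as BLC
import qualified Data.Attoparsec.ByteString.Char8 as Char8
import qualified Data.Attoparsec.ByteString.Lazy as Lazy

-- Hypothetical report line for illustration: "<name>,<count>\n".
reportLine :: Char8.Parser (BC.ByteString, Int)
reportLine = do
  name  <- Char8.takeWhile1 (/= ',')
  _     <- Char8.char ','
  count <- Char8.decimal
  Char8.endOfLine
  return (name, count)

-- Parse a whole report from a lazy ByteString and total the counts.
-- Char8 supplies the combinators; Lazy.parse feeds in the chunks.
totalCount :: BLC.ByteString -> Either String Int
totalCount input =
  case Lazy.parse (Char8.many' reportLine <* Char8.endOfInput) input of
    Lazy.Done _ rows  -> Right (sum (map snd rows))
    Lazy.Fail _ _ msg -> Left msg
```

In practice the input would come from something like `Data.ByteString.Lazy.Char8.readFile`. Keep in mind that `many'` builds the whole list of rows before returning, so for very large reports you may prefer to process rows incrementally instead.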

DarthFennec