I'd like to use Conduit in a setting where I read a binary file, check that it has the correct header, and then work on the remaining data in the file.

When I try to write a conduit that checks the header and then streams the rest of the data on to the following conduits, I run into trouble. The conduits live in the Either String monad for some exception handling. Here's a simplified version of the code (I'm aware there's a Data.Conduit.Attoparsec module, but for now I'd like to write it myself):

import Conduit (ConduitM, mapC, mapM_C, takeWhileC, (.|))
import Data.ByteString (ByteString)
import Data.ByteString.Conversion (toByteString')

separator :: ByteString
separator = toByteString' '#' 

check :: ByteString -> Either String ()

confirmHeader :: ConduitM ByteString ByteString (Either String) ()
confirmHeader = do
  takeWhileC (/= separator) .| mapM_C check
  mapC id

separator is a predefined ByteString that signals the end of the header. The line mapC id is supposed to pass on the rest of the stream if the header checks out. I left out the unimportant details of check.

The part checking the header works. The last line, however, apart from looking inelegant and non-idiomatic, doesn't work. Running something like

runConduit $ yield (toByteString' "header#rest") .| confirmHeader .| sinkList

gives Right [] rather than Right ["rest"], as I had hoped. Any ideas?

jorgen

1 Answer

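Your takeWhileC (/= separator) is taking the whole ByteString: takeWhileC compares each chunk of the stream as a single value and never looks at the bytes inside it. Since the one chunk "header#rest" is not equal to "#", it is consumed in full as the header and nothing is left for mapC id. You can use Data.Conduit.Binary to work on the individual bytes of the stream instead. I believe the code below works as expected.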

module Main (main) where

import           Conduit
import           Data.ByteString (ByteString)
import           Data.ByteString.Conversion (toByteString')
import           Data.Char (ord)
import qualified Data.Conduit.Binary as B
import           GHC.Word (Word8)

separator :: Word8
separator = toEnum $ ord '#'

check :: ByteString -> Either String ()
check _ = Right ()

confirmHeader :: ConduitM ByteString ByteString (Either String) ()
confirmHeader = do
  B.takeWhile (/= separator) .| mapM_C check
  B.drop 1 -- drop separator which stayed in stream
  mapC id

main :: IO ()
main = print . runConduit $
  yield (toByteString' "header#rest") .| confirmHeader .| sinkList

And the output:

[nix-shell:/tmp]$ ghc C.hs -fforce-recomp -Wall -Werror -o Main && ./Main
[1 of 1] Compiling Main             ( C.hs, C.o )
Linking Main ...
Right ["rest"]
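
The difference between chunk-wise and byte-wise matching can be seen in isolation with a small pure pipeline. This is only a sketch: it assumes the OverloadedStrings extension for the literals and writes the separator as the raw byte 35 (ASCII '#'); the names input, chunkWise and byteWise are made up for the illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Conduit
import Data.ByteString (ByteString)

-- A single chunk containing header, separator and payload.
input :: ByteString
input = "header#rest"

-- Chunk-wise: the predicate is applied to the whole ByteString.
-- "header#rest" /= "#" holds, so the entire chunk is taken.
chunkWise :: [ByteString]
chunkWise = runConduitPure $
  yield input .| takeWhileC (/= "#") .| sinkList

-- Byte-wise: the predicate is applied to each Word8 inside the chunk
-- (35 is ASCII '#'), so only the bytes before the separator are taken.
byteWise :: [ByteString]
byteWise = runConduitPure $
  yield input .| takeWhileCE (/= 35) .| sinkList

main :: IO ()
main = do
  print chunkWise  -- ["header#rest"]
  print byteWise   -- ["header"]
```

In the byte-wise case the unconsumed "#rest" is pushed back as a leftover, which is why a later step in the same do block can still see it.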
Mateusz Kowalczyk
  • Great! As a small detail I opted for takeWhileCE instead for now, but I'll benchmark it against Conduit.Binary. – jorgen Oct 28 '17 at 01:13
  • `takeWhileCE` should work: I actually wanted to use that but I forgot its name and Conduit.Binary came up first on hackage. – Mateusz Kowalczyk Oct 28 '17 at 13:44
  • Ok! Incidentally, can you think of something more elegant than `mapC id`, or is that standard? Apologies if this is now off topic.. – jorgen Oct 28 '17 at 13:51
  • I think `mapC id` is canonical: you can see conduit author use it here https://www.snoyman.com/blog/2017/05/worst-function-in-conduit – Mateusz Kowalczyk Oct 28 '17 at 14:06
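
For reference, the `takeWhileCE` variant mentioned in the comments might look roughly like the sketch below. It is not the code either poster ended up with. Assumptions: conduit 1.3 or later (hence `ConduitT`; older versions spell it `ConduitM`), OverloadedStrings for the literals, and a made-up `check` that accepts only the literal header "header". The header is collected with `foldC` before checking, so it is validated as one value even if it arrives split across several chunks.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Conduit
import Control.Monad.Trans.Class (lift)
import Data.ByteString (ByteString)
import Data.Word (Word8)

-- ASCII '#', marking the end of the header.
separator :: Word8
separator = 35

-- Hypothetical check: accepts only the literal header "header".
check :: ByteString -> Either String ()
check h
  | h == "header" = Right ()
  | otherwise     = Left ("unexpected header: " ++ show h)

confirmHeader :: ConduitT ByteString ByteString (Either String) ()
confirmHeader = do
  -- Take bytes up to the separator and concatenate them into one value.
  header <- takeWhileCE (/= separator) .| foldC
  -- Abort the pipeline with Left if the header is wrong.
  lift (check header)
  -- Drop the separator byte, which is still in the stream.
  dropCE 1
  -- Pass the rest of the stream through unchanged.
  mapC id

main :: IO ()
main = do
  -- Header and payload in one chunk.
  print . runConduit $
    yield "header#rest" .| confirmHeader .| sinkList
  -- Header split across chunk boundaries.
  print . runConduit $
    yieldMany ["hea", "der#re", "st"] .| confirmHeader .| sinkList
  -- Wrong header: the result is a Left carrying the error message.
  print . runConduit $
    yield "bogus#rest" .| confirmHeader .| sinkList
```

The first pipeline should give Right ["rest"] and the second Right ["re","st"], since the payload keeps its original chunk boundaries. Note that if the input contains no separator at all, this sketch treats the entire stream as the header.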