4

I'm using aeson / attoparsec and conduit / conduit-http connected by conduit-attoparsec to parse JSON data from a file / webserver. My problem is that my pipeline always throws this exception...

ParseError {errorContexts = ["demandInput"], errorMessage = "not enough bytes", errorPosition = 1:1}

...once the socket closes or we hit EOF. Parsing and passing on the resulting data structures through the pipeline etc. works just fine, but it always ends with the sinkParser throwing this exception. I invoke it like this...

j <- CA.sinkParser json

...inside of my conduit that parses ByteStrings into my message structures.

How can I have it just exit the pipeline cleanly once there is no more data (no more top-level expressions)? Is there any decent way to detect / distinguish this exception without having to look at error strings?

Thanks!

EDIT: Example:

{-# LANGUAGE OverloadedStrings #-}

module Main where

import Control.Applicative
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Conduit.Attoparsec as CA
import Data.Aeson
import Data.Conduit
import Data.Conduit.Binary
import Control.Monad.IO.Class

data MyMessage = MyMessage String deriving (Show)

parseMessage :: (MonadIO m, MonadResource m) => Conduit B.ByteString m B.ByteString
parseMessage = do
    j <- CA.sinkParser json
    let msg = fromJSON j :: Result MyMessage
    yield $ case msg of
        Success r -> B8.pack $ show r
        Error   s -> error s
    parseMessage

main :: IO ()
main =
    runResourceT $ do
        sourceFile "./input.json" $$ parseMessage =$ sinkFile "./out.txt"

instance FromJSON MyMessage where
    parseJSON j =
        case j of
        (Object o) -> MyMessage <$> o .: "text"
        _          -> fail $ "Expected Object - " ++ show j

Sample input (input.json):

{"text":"abc"}
{"text":"123"}

Outputs:

out: ParseError {errorContexts = ["demandInput"], errorMessage = "not enough bytes", errorPosition = 3:1}

and out.txt:

MyMessage "abc"MyMessage "123"
NBFGRTW
  • 459
  • 3
  • 11

1 Answers1

4

This is a perfect use case for conduitParserEither:

parseMessage :: (MonadIO m, MonadResource m) => Conduit B.ByteString m B.ByteString
parseMessage =
    CA.conduitParserEither json =$= awaitForever go
  where
    go (Left s) = error $ show s
    go (Right (_, msg)) = yield $ B8.pack $ show msg ++ "\n"

If you're on FP Haskell Center, you can clone my solution into the IDE.

Michael Snoyman
  • 31,100
  • 3
  • 48
  • 77
  • Maybe I just have a misconception about attoparsec here, but even if I have a nice way to handle this exception, how can I correctly interpret it? In my case there is no actual error, everything is perfectly fine and I'd just like to stop parsing. But this error looks like it might also appear when the data is corrupt and the json data just ends in the middle. Does every attoparsec conduit end with an exception, and I just have to look at the error strings in the exception to figure out if it's an actual error or simply EOF? – NBFGRTW Oct 22 '13 at 16:01
  • The error message is accurate. What it's doing is parsing the first two snippets of JSON correctly, and for the third finding that there's no data available and therefore saying that there's a parse error. You may want to look at the implementation of conduitParserEither, which explicitly checks for EOF. – Michael Snoyman Oct 22 '13 at 19:37
  • Ok, thank you for the clarification! I think I'll go with your suggestion and use conduitParserEither to detect the "demandInput" error and re-throw everything else for the actual error handling code upstream. – NBFGRTW Oct 22 '13 at 20:26