Supposing I have a module like this:
module Explosion where
import Pipes.Parse (foldAll, Parser, Producer)
import Pipes.ByteString (ByteString, fromLazy)
import Pipes.Aeson (DecodingError)
import Pipes.Aeson.Unchecked (decoded)
import Data.List (intercalate)
import Data.ByteString.Lazy.Char8 (pack)
import Lens.Family (view)
import Lens.Family.State.Strict (zoom)
produceString :: Producer ByteString IO ()
produceString = fromLazy $ pack $ intercalate " " $ map show [1..1000000]
produceInts ::
Producer Int IO (Either (DecodingError, Producer ByteString IO ()) ())
produceInts = view decoded produceString
produceInts' :: Producer Int IO ()
produceInts' = produceInts >> return ()
parseBiggest :: Parser ByteString IO Int
parseBiggest = zoom decoded (foldAll max 0 id)
The 'produceString' function is a bytestring producer, and I am concerned with folding a parse over it to produce some kind of result.
The following two programs show different ways of tackling the problem of finding the maximum value in the bytestring by parsing it as a series of JSON ints.
Program 1:
module Main where
import Explosion (produceInts')
import Pipes.Prelude (fold)
main :: IO ()
main = do
biggest <- fold max 0 id produceInts'
print $ show biggest
Program 2:
module Main where
import Explosion (parseBiggest, produceString)
import Pipes.Parse (evalStateT)
main :: IO ()
main = do
biggest <- evalStateT parseBiggest produceString
print $ show biggest
Unfortunately, both programs eat about 200MB of memory total when I profile them, a problem I'd hoped the use of streaming parsers would solve. The first program spends most of its time and memory (> 70%) in (^.)
from Lens.Family, while the second spends it in fmap
, called by zoom
from Lens.Family.State.Strict. The usage graphs are below. Both programs spend about 70% of their time doing garbage collection.
Am I doing something wrong? Is the Prelude function max
not strict enough? I can't tell if the library functions are bad, or if I'm using the library wrong! (It's probably the latter.)
For completeness, here's a git repo that you can clone and run cabal install
in if you'd like to see what I'm talking about first-hand, and here's the memory usage of the two programs: