0

How does one make their own streaming code? I was generating about 1,000,000,000 random pairs of war decks, and I wanted them to be lazy streamed into a foldl', but I got a space leak! Here is the relevant section of code:

main = do
    games <- replicateM 1000000000 $ deal <$> sDeck --Would be a trillion, but Int only goes so high
    let res = experiment Ace games --experiment is a foldl'
    print res --res is tiny

When I run it with -O2, it first starts freezing up my computer, and then the program dies and the computer comes back to life (and Google Chrome then has the resources it needs to yell at me for using up all its resources.)

Note: I tried unsafeInterleaveIO, and it didn't work.

Full code is at: http://lpaste.net/109977

PyRulez
  • 10,513
  • 10
  • 42
  • 87
  • 3
    Are you sure the fold is what is eating up all the resources? Have you tried profiling your code using a smaller number (so it doesn't crash) to find out exactly where the extra memory is being used? It very well could be the `replicateM`, for example, or somewhere else in your code. – bheklilr Aug 25 '14 at 13:51
  • @bheklilr Well, it says turn.igo is doing the most allocating, but profilling says nothing about garbage collection. – PyRulez Aug 25 '14 at 14:12
  • No time to profile your code but suggest try a strict version of `Data.Map` used in `igo`. – David Unric Aug 25 '14 at 14:50
  • I am not using `Data.Map` anywhere. `experiment` is using a strict intmap. – PyRulez Aug 25 '14 at 14:50

2 Answers2

4

replicateM doesn't do lazy streaming. If you need to stream results from monadic actions, you should use a library such as conduit or pipes.

Your example code could be written to support streaming with conduits like this:

import Data.Conduit
import qualified Data.Conduit.Combinators as C

main = do
    let games = C.replicateM 1000000 $ deal <$> sDeck
    res <- games $$ C.foldl step Ace
    -- where step is the function you want to fold with
    print res

The Data.Conduit.Combinators module is from the conduit-combinators package.

As a quick-and-dirty solution you could implement a streaming version of replicateM using lazy IO.

import System.IO.Unsafe

lazyReplicateIO :: Integer -> IO a -> IO [a] --Using Integer so I can make a trillion copies
lazyReplicateIO 0 _   = return []
lazyReplicateIO n act = do
    a <- act
    rest <- unsafeInterleaveIO $ lazyReplicateIO (n-1) act
    return $ a : rest

But I recommend using a proper streaming library.

PyRulez
  • 10,513
  • 10
  • 42
  • 87
shang
  • 24,642
  • 3
  • 58
  • 86
2

The equivalent pipes solution is:

import Pipes
import qualified Pipes.Prelude as Pipes

-- Assuming the following types
action :: IO A
acc    :: S
step   :: S -> A -> S
done   :: S -> B

main = do
    b <- Pipes.fold step acc done (Pipes.replicateM 1000000 action)
    print (b :: B)
Gabriella Gonzalez
  • 34,863
  • 3
  • 77
  • 135