Idiomatic prefetching in a streaming library

Question

I'm working with the streaming library but would accept an answer using pipes or conduit.

Say I have

import Streaming (Stream, Of)
import qualified Streaming.Prelude as S

streamChunks :: Int -> Stream (Of Thing) IO ()
streamChunks lastID = do
  flip fix 0 $ \go thingID ->
    unless (thingID > lastID) $ do
      thing <- highLatencyGet thingID
      S.yield thing
      go (thingID+1)

To reduce latency I'd like to fork highLatencyGet to retrieve the next Thing in parallel with processing the previous Thing in the consumer.

Obviously I could transform my function above creating a new MVar and forking the next batch before calling yield, etc.

But I want to know if there is an idiomatic (composable) way to do this, such that it could be packaged in a library and could be used on arbitrary IO Streams. Ideally we could configure the prefetch value as well, like:

prefetching :: Int -> Stream (Of a) IO () -> Stream (Of a) IO ()

How much control do you want over the total number of outstanding high-latency requests? Probably it's easy to implement `Stream (Of a) IO () -> Stream (Of (Async a)) IO ()`. — Daniel Wagner, May 24 '17 at 00:09

danidiaz · Accepted Answer · 2017-05-24T00:46:42.320

This solution uses pipes but it could be easily adapted to use streaming. To be precise, it requires the pipes, pipes-concurrency and async packages.

It doesn't work in a "direct" style. Instead of simply transforming the Producer, it also takes a "folding function" that consumes a Producer. This continuation-passing style is necessary for setting up and tearing down the concurrency mechanism.

import Pipes
import Pipes.Concurrent (spawn',bounded,fromInput,toOutput,atomically)
import Control.Concurrent.Async (Concurrently(..))
import Control.Exception (finally)

prefetching :: Int -> Producer a IO () -> (Producer a IO () -> IO r) -> IO r
prefetching bufsize source foldfunc = do
    (outbox,inbox,seal) <- spawn' (bounded bufsize)
    let cutcord effect = effect `finally` atomically seal
    runConcurrently $
        Concurrently (cutcord (runEffect (source >-> toOutput outbox)))
        *>
        Concurrently (cutcord (foldfunc (fromInput inbox)))

The output of the original producer is redirected to a bounded queue. At the same time, we apply the producer-folding function to a producer that reads from the queue.

Whenever each of the concurrent actions completes, we take care to promptly close the channel to avoid leaving the other side hanging.

Thanks, it does look like pipes-concurrency is made for this sort of task. This isn't the type I was hoping for though — jberryman, May 24 '17 at 15:32

Idiomatic prefetching in a streaming library

1 Answers1