Constructing RequestBodyStream from Lazy ByteString when length is known

Question

I am trying to adapt this AWS S3 upload code to handle a lazy ByteString whose length is already known, so that it does not have to be read into memory in its entirety (it arrives over the network, and the length is sent beforehand). It seems I have to define a GivesPopper function over the lazy ByteString to convert it to a RequestBodyStream. Because of the convoluted way GivesPopper is defined, I am not sure how to write it for a lazy ByteString, and would appreciate pointers. Here is how it is written for reading from a file:

let file = "test"
-- streams large file content, without buffering more than 10k in memory
let streamer sink = withFile file ReadMode $ \h -> sink $ S.hGet h 10240

streamer in the code above is of type GivesPopper () if I understand it correctly. Given a Lazy ByteString with known length len, what would be a good way to write GivesPopper function over it? We can read one chunk at a time.

score 2 · Accepted Answer · edited Jun 04 '16 at 14:25

Is this what you're looking for?

import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import System.IO

file = "test"
-- original streamer for feeding a sink from a file
streamer :: (IO S.ByteString -> IO r) -> IO r
streamer sink = withFile file ReadMode $ \h -> sink $ S.hGet h 10240

-- feed a lazy ByteString to sink
lstreamer :: L.ByteString -> (IO S.ByteString -> IO r) -> IO r
lstreamer lbs sink = sink (return (L.toStrict lbs))

lstreamer type checks but probably doesn't do what you want. It returns the same data every time the sink calls it, so the sink never sees the end of the input. S.hGet h ..., on the other hand, eventually returns the empty string, which is how the sink learns that the input is exhausted.

Here is a solution which uses an IORef to keep track of whether we should start returning the empty string:

import Data.IORef

mklstream :: L.ByteString -> (IO S.ByteString -> IO r) -> IO r
mklstream lbs sink = do
  ref <- newIORef False
  let fetch :: IO S.ByteString
      fetch = do sent <- readIORef ref
                 writeIORef ref True
                 if sent
                   then return S.empty
                   else return (L.toStrict lbs)
  sink fetch

Here fetch is the action which gets the next chunk. The first time you call it you will get the original lazy ByteString (strict-ified). Subsequent calls will always return the empty string.

Update

Here's how to give out a small amount at a time:

mklstream :: L.ByteString -> (IO S.ByteString -> IO r) -> IO r
mklstream lbs sink = do
  ref <- newIORef (L.toChunks lbs)
  let fetch :: IO S.ByteString
      fetch = do chunks <- readIORef ref
                 case chunks of
                   [] -> return S.empty
                   (c:cs) -> do writeIORef ref cs
                                return c
  sink fetch
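
For reference, the (IO S.ByteString -> IO r) -> IO r shape in the signature above is what http-client calls GivesPopper r. The sketch below restates the three synonyms locally (an assumption that they match the library's definitions, so that the block needs only bytestring) and drives the chunked version with a hypothetical drain sink that pops until it sees the empty chunk. lazyGivesPopper and drain are illustrative names, not part of any library.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L
import Data.IORef

-- Local copies of the http-client synonyms (assumed to match the library).
type Popper = IO S.ByteString
type NeedsPopper a = Popper -> IO a
type GivesPopper a = NeedsPopper a -> IO a

-- Same logic as the chunked mklstream, written against the synonym.
lazyGivesPopper :: L.ByteString -> GivesPopper a
lazyGivesPopper lbs sink = do
  ref <- newIORef (L.toChunks lbs)
  let fetch :: Popper
      fetch = do
        chunks <- readIORef ref
        case chunks of
          []     -> return S.empty
          (c:cs) -> do
            writeIORef ref cs
            return c
  sink fetch

-- Hypothetical sink: keeps popping until the empty chunk arrives,
-- and returns the chunks in the order they were received.
drain :: NeedsPopper [S.ByteString]
drain pop = go []
  where
    go acc = do
      c <- pop
      if S.null c
        then return (reverse acc)
        else go (c : acc)

main :: IO ()
main = do
  let lbs = L.fromChunks (map S8.pack ["hello, ", "lazy ", "world"])
  chunks <- lazyGivesPopper lbs drain
  print chunks  -- ["hello, ","lazy ","world"]
```

With http-client itself, something like RequestBodyStream len (lazyGivesPopper lbs) should then type check, where len is the Int64 length you already received over the network. Computing it with L.length lbs instead would force the whole lazy ByteString, which is what you are trying to avoid.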

Isn't this solution going to read the entire lazy bytestring into memory because of `L.toStrict` call? We want to read one chunk at a time, or place an upper bound on memory usage. — Sal, Jun 04 '16 at 01:59
Well - it shows you how to create some state (an IORef) to keep track of what you've already given out. Note that unless you're using lazy I/O, the lazy ByteString is already in memory. — ErikR, Jun 04 '16 at 02:39
It should probably be said that the updated `mklstream` relies on the invariant that a lazy bytestring never has an empty bytestring chunk. It would be clearer that it does what `http-client` wants if the ref were defined with `newIORef (filter (not . S.null) (L.toChunks lbs))`, but it doesn't need to do this, if I understand. Then it has the desired structure, which is to be like [`hGet`](http://hackage.haskell.org/package/bytestring-0.10.8.1/docs/Data-ByteString.html#v:hGet) — Michael, Jun 04 '16 at 07:03
The same result comes more or less prepackaged with `io-streams`. There you would write `ref <- Streams.fromLazyByteString lbs` and then the `fetch` function would be something like `fmap (maybe S.empty id) (Streams.read ref)` — Michael, Jun 04 '16 at 07:16
@Michael, yep, good point about `io-streams`. I took a look at the implementation there. Wondering if `writeIORef` in the code above is efficient because the way it pops top of the stack is by rewriting the new stack after removing the top element. `io-streams` too has similar logic in `makeInputStream`. So, it is likely to be efficient, I guess. — Sal, Jun 04 '16 at 08:09
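
Following up on the comments about empty chunks and bounding what is handed out: a minimal sketch, under assumptions, of a variant that filters empty chunks defensively and never returns more than maxLen bytes per pop. boundedPopper and drainAll are hypothetical names. Note that S.splitAt only slices an existing chunk, so this caps the size of each piece given to the sink; it does not reduce the memory held by a chunk that has already been materialized.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L
import Data.IORef

-- Hands out pieces of at most maxLen bytes (clamped to at least 1 so that
-- a non-empty input never yields a premature empty chunk).
boundedPopper :: Int -> L.ByteString -> (IO S.ByteString -> IO r) -> IO r
boundedPopper maxLen lbs sink = do
  let n = max 1 maxLen
  -- filter is lazy, so this does not force the whole lazy ByteString.
  ref <- newIORef (filter (not . S.null) (L.toChunks lbs))
  let fetch :: IO S.ByteString
      fetch = do
        chunks <- readIORef ref
        case chunks of
          [] -> return S.empty
          (c:cs)
            | S.length c <= n -> do
                writeIORef ref cs
                return c
            | otherwise -> do
                -- Return the first n bytes, push the remainder back.
                let (now, later) = S.splitAt n c
                writeIORef ref (later : cs)
                return now
  sink fetch

-- Hypothetical sink used only for the demonstration below.
drainAll :: IO S.ByteString -> IO [S.ByteString]
drainAll pop = go []
  where
    go acc = do
      c <- pop
      if S.null c
        then return (reverse acc)
        else go (c : acc)

main :: IO ()
main = do
  let lbs = L.fromChunks [S8.pack "abcdefg", S8.pack "hi"]
  pieces <- boundedPopper 3 lbs drainAll
  print pieces  -- ["abc","def","g","hi"]
```

Pieces are never merged across chunk boundaries, so some pops may return fewer than maxLen bytes; the sink only relies on the empty chunk as the end marker.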
