
I have a Scotty/WAI application and one of the endpoints sends a large Text output built from a list of elements. Here is the relevant code:

  {-# LANGUAGE OverloadedStrings #-}

  import Data.Monoid ((<>))
  import Data.Text.Lazy as L
  import Data.Text.Lazy.Encoding as E

  -- render a value as one CSV line, defaulting to its Show instance
  class (Show csv) => ToCSV csv where
    toCSV :: csv -> L.Text
    toCSV = pack . show

  -- a list becomes one line per element
  instance (ToCSV c) => ToCSV [c] where
    toCSV []     = empty
    toCSV (c:cs) = toCSV c <> "\n" <> toCSV cs


  get "/api/transactions" $ accept "text/csv" $ do
    purp <- selectPurpose
    txs <- allEntries <$> inWeb (listTransactions purp)
    setHeader "Content-Type" "text/csv"
    raw $ E.encodeUtf8 $ toCSV txs

As I understand Scotty's documentation, the output should be built lazily and sent over the wire without having to materialise the whole text/bytestring in memory. However, this is not the behaviour I observe: when I call this endpoint the server starts to eat up memory, so I infer it builds the whole string before sending it in one go.

Am I missing something?

Edit 1:

I have written a `doStream` function that is supposed to send the chunks of the resulting ByteString one by one:

  import qualified Blaze.ByteString.Builder as B
  import qualified Data.ByteString.Lazy     as BS
  import qualified Network.Wai              as W

  doStream :: L.Text -> W.StreamingBody
  doStream t build flush = do
    let bs = E.encodeUtf8 t
    -- push each strict chunk of the lazy ByteString, then flush once at the end
    mapM_ (\chunk -> build (B.fromByteString chunk)) (BS.toChunks bs)
    flush

but actually it still builds the whole output in memory...
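
For reference, the handler now wires it in with `stream` instead of `raw`, roughly like this:

  get "/api/transactions" $ accept "text/csv" $ do
    purp <- selectPurpose
    txs <- allEntries <$> inWeb (listTransactions purp)
    setHeader "Content-Type" "text/csv"
    stream $ doStream (toCSV txs)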

Edit 2:

Actually, streaming this way works fine. The server process still eats up a lot of memory, though; much of it might actually be garbage that could be collected as each chunk is sent. I will try to analyze memory usage more deeply to see where this consumption comes from.

Edit 3:

I tried to limit the heap to 2GB, but this makes the process crash: some memory is clearly retained for the duration of the whole transformation...

insitu
  • From [scotty's source code](https://github.com/scotty-web/scotty/blob/master/Web/Scotty/Action.hs) I can see I basically inlined the `text` function which I used previously... – insitu Jul 03 '15 at 07:27
  • @Jubobs: thanks for taking care of my mistakes! – insitu Jul 03 '15 at 08:28
  • The real problem is that I use lazy `Text` as the result of `toCSV`: this leads to an explosion of memory use, mostly in the `Data.Text.Lazy.Internal.Fusion` module. When I replace lazy text with strict text, memory consumption drops from 2GB to 40MB... I lose streaming of the output in the process though. I might regain it if I handle the transformation of the list of objects to CSV output manually. – insitu Jul 03 '15 at 14:41
  • Maybe conduit (or pipes) could be useful here, with a conduit that does the CSV transformation on individual objects? I've had good experiences with large data transfers through a conduit/Warp server: memory consumption was in the order of a couple of MB, even with throughputs in the order of 1 GB/s. – kdkeyser Jul 03 '15 at 17:43
  • Yes, that makes sense. But this means having a lazy structure, which implies delaying the CSV transformation to the individual record level and streaming that. In effect the list case should yield one chunk per record, roughly `map toCSV`, rather than one big `Text` (see the sketch below). – insitu Jul 03 '15 at 20:23
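
A minimal sketch of that per-record idea (the `streamCSV` name and the flush-per-record policy are illustrative, not code from the discussion):

  -- stream the CSV one record at a time instead of building one big lazy Text
  streamCSV :: (ToCSV c) => [c] -> W.StreamingBody
  streamCSV records build flush = mapM_ sendRecord records
    where
      sendRecord r = do
        build (B.fromLazyByteString (E.encodeUtf8 (toCSV r <> "\n")))
        flush  -- flush per record so each chunk can be collected once sent

The route can then do `stream (streamCSV txs)` and skip the intermediate lazy `Text` entirely.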

1 Answer


Take a look at the `stream` function in Web.Scotty.Trans. It exists precisely to give you finer-grained control over how much data is generated before it is flushed to the socket.

You call it with a `StreamingBody` argument, which is in fact just a function of type `(Builder -> IO ()) -> IO () -> IO ()`.

So you write a function:

  doMyStreaming send flush =
    ...

in which you send and flush your data in pieces, and then call `stream` with `doMyStreaming` as its argument instead of calling `raw`.
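
For instance, a rough sketch (chunking a lazy ByteString and flushing after each chunk; adapt it to whatever structure you actually want to stream):

  import qualified Blaze.ByteString.Builder as B
  import qualified Data.ByteString.Lazy     as BS

  -- send the lazy ByteString one strict chunk at a time, flushing as we go,
  -- so only the current chunk needs to be alive in memory
  doMyStreaming :: BS.ByteString -> (B.Builder -> IO ()) -> IO () -> IO ()
  doMyStreaming lbs send flush =
    mapM_ (\chunk -> send (B.fromByteString chunk) >> flush) (BS.toChunks lbs)

In the handler you would then replace `raw $ E.encodeUtf8 $ toCSV txs` with `stream $ doMyStreaming (E.encodeUtf8 (toCSV txs))`.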

kdkeyser
  • Thanks. That's what I am trying to do (see my edited question), but it is still too eager. Maybe I need to do something to the headers to get chunked transfer encoding on the output? – insitu Jul 03 '15 at 08:50
  • Chunked encoding should not be necessary. But I notice you only flush after you've mapped over all of the chunks and called build on them. I would try to call flush after every build step, by moving the flush within the map. – kdkeyser Jul 03 '15 at 09:13
  • Actually, flushing once works, because the `send` function does send the data along the way: I can confirm this because I traced its execution (with a simple `putStrLn`) and could see the output arriving at the client. – insitu Jul 03 '15 at 09:29