Has there been any work on supporting vector-style recycling for general data in Haskell? For example, running

```haskell
main = do
  let ls = [1..1000000]
  print ls
```

with `-p` reports a total allocation of 425 MB. Running

```haskell
main = do
  let ls = [1..1000000]
  print ls
  print (ls ++ [1])
```

with `-p` reports roughly twice that, at 820 MB.

I understand why this happens, but I'm wondering why GHC does not perform this optimization. I suppose one reason is that it rarely shows up in real code, so the benefit would be slight, if any (but maybe a general way to recycle any inductive structure could give some benefit?). Usually people are told to use other data structures anyway (e.g. `Data.Sequence`).

P.S. I've seen links to the 'Recycle Your Arrays!' paper by Roman Leshchinskiy, but all the links on the web are now dead (and I don't want to pay to read it via Springer).

edit:

When I use `deepseq` instead of `print`, I get similar but less extreme results: 80 MB without the append and 136 MB with it. Still, that's memory that, in theory, could be saved.
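For reference, a minimal sketch of what the `deepseq` variant presumably looks like. The helper name `forceAll` is hypothetical, and the `[Integer]` annotation is an assumption that just spells out the type GHC would default to in the programs above. It needs the `deepseq` package, which ships with GHC.

```haskell
import Control.DeepSeq (deepseq)

-- Hypothetical helper: evaluate every element and every cons cell
-- of the list, then return unit. Nothing is printed.
forceAll :: [Integer] -> IO ()
forceAll xs = xs `deepseq` return ()

main :: IO ()
main = do
  -- Assumption: Integer elements, matching the defaulting in the original programs.
  let ls = [1 .. 1000000] :: [Integer]
  forceAll ls            -- forces the original list
  forceAll (ls ++ [1])   -- forces the appended list, which has its own cons cells
```

Because nothing is shown, the allocation that `show` does for the output string drops out of the measurement, which would be consistent with the smaller numbers.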

pdexter
  • 3
    Haskell lists are implemented as singly linked lists. Two such lists can have different "beginnings" and share the same "end", but the converse is impossible. Accordingly, when computing `print (ls ++ [1])`, Haskell has no other choice but to create a copy of the whole list `ls` in memory, only to stick `[1]` at the end, and then print the result. – jub0bs Mar 25 '15 at 00:39
  • 1
    Yes, but in theory at least, it seems that an optimizing compiler could see that the original list is not needed anymore, and simply modify the original. – kec Mar 25 '15 at 00:43
  • 3
    In this case, you're measuring the memory used by the print. Compare: `print ls` with `print ls >> print ls` with `print (ls++[1])` with what you have ( `print ls >> print (ls++[1])` ) . That said, the list isn't being recycled -- but the memory used by `ls` and `ls++[1]` is a small part of your measurement. – Robert M. Lefkowitz Mar 25 '15 at 04:03
  • @RobertM.Lefkowitz are you sure? I actually already tried this and if my memory serves me, even if I lifted `let ls' = ls ++ [1]` and called print on that twice, the memory only increased by maybe 5mb (so the size was truly double because of the append operation). I'll investigate further – pdexter Mar 25 '15 at 13:14
  • @kec: Actually it isn't that simple, because Haskell allows references to be shared. Since `print` uses `IO`, sharing can only be allowed if GHC knows exactly what `print` does. Otherwise the list could get stuffed into a variable somewhere and accessed later. – Guvante Mar 25 '15 at 16:43
  • Are you compiling with optimizations turned on? `print` calls `show`, so it would need to build the array of characters in memory unless optimizations were available. I don't know if GHC can stream `putStrLn` or not (which would allow it to avoid building it). Also, `ls` likely has to be saved in full form since it is used twice. Also note that there are several packages on Hackage that would not allocate the list twice, because they use a different data structure than a list for storage. – Guvante Mar 25 '15 at 16:45
  • @Guvante yes, I'm compiling with optimizations. But wouldn't GHC know exactly what print does? Couldn't it tell that it wasn't shared? Also, yes, it can avoid the building, see [my other question](http://stackoverflow.com/questions/29153480/how-does-the-ghc-garbage-collector-runtime-know-that-it-can-create-an-array-i). I'm aware of the Hackage packages: my question is more along the lines of whether there's been a general way to tackle this in the compiler, as opposed to simply offering other libraries. – pdexter Mar 25 '15 at 17:05
  • @pdexter: Is there a reason you are focusing on allocations? The example is fairly performant from what I can tell and doesn't use much RAM due to laziness being utilized (well the latter kind of does but that is due to call by need, inlining the list would avoid that). – Guvante Mar 25 '15 at 18:24
  • @Guvante sure, but wouldn't the proposed optimization make it even more performant? Allocations take time, e.g. copying a list takes O(n). A side question of mine is whether an optimization like this could provide benefits to larger, more complex programs. This is of course a silly example to try to over-optimize, so I'm trying to figure out if it would apply to larger problems. – pdexter Mar 25 '15 at 21:51
  • 1
    @pdexter: Unfortunately, inlining the `++` is a non-starter due to the way Haskell does call-by-need. Mutation is never allowed, and that restriction is what provides lots of other benefits. Sometimes mutation is possible, but not for something as complex as that `++`. Unfortunately, this exact optimization is one the user will need to do. – Guvante Mar 25 '15 at 21:56
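To make the point about copying concrete, here is a minimal sketch of list append, written under the hypothetical name `append` but with the same shape as the Haskell Report definition of `(++)`. It only illustrates why the first list is rebuilt; it is not a claim about the code GHC ends up generating after optimization.

```haskell
-- Same recursion as the Report's (++), under a hypothetical name.
-- Every element of the first list gets a fresh cons cell; the second
-- list is reused as-is as the tail of the result.
append :: [a] -> [a] -> [a]
append []     ys = ys
append (x:xs) ys = x : append xs ys

main :: IO ()
main = do
  let ls = [1, 2, 3] :: [Int]
  -- Appending at the end rebuilds all of ls: O(length ls) new cells.
  print (append ls [1])   -- prints [1,2,3,1]
  -- Prepending allocates one cell and shares ls as the tail.
  print (0 : ls)          -- prints [0,1,2,3]
```

Since `ls` stays immutable and may still be referenced elsewhere, the cells of `ls` cannot simply have their final tail pointer overwritten, so `append ls [1]` has to allocate a second spine.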

0 Answers