Can a thunk be duplicated to improve memory performance?

Question

One of my struggles with lazy evaluation in Haskell is the difficulty of reasoning about memory usage. I think the ability to duplicate a thunk would make this much easier for me. Here's an example.

Let's create a really big list:

let xs = [1..10000000]

Now, let's create a bad function:

-- requires: import Data.List (foldl1')
bad = do
    print $ foldl1' (+) xs
    print $ length xs

With no optimizations, this eats up a few dozen MB of RAM. The garbage collector can't deallocate xs during the fold because the list will be needed later to calculate the length.

Is it possible to reimplement this function something like this:

good = do
    (xs1,xs2) <- copyThunk xs
    print $ foldl1' (+) xs1
    print $ length xs2

Now xs1 and xs2 would represent the same value but be independent of each other in memory, so the garbage collector could free list cells during the fold instead of keeping the whole list alive. (I think this would slightly increase the computational cost, though?)

Obviously, in this trivial example, refactoring the code could easily solve the problem, but it's not always obvious how to refactor, and sometimes refactoring would greatly reduce code clarity.
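For this particular example, the refactoring mentioned above might look like the sketch below: a single strict left fold that computes the sum and the length together, so the list is traversed only once and nothing needs to hold on to it afterwards. The name `sumAndLength` is made up for illustration, and the element type is assumed to be Integer (what the literal defaults to in the original snippet).

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- | Sum and length of a list in one traversal (hypothetical helper).
-- The bang patterns force both running totals on every step, so no
-- long chain of suspended additions builds up in the accumulator.
sumAndLength :: [Integer] -> (Integer, Int)
sumAndLength = foldl' step (0, 0)
  where
    step (!s, !n) x = (s + x, n + 1)

main :: IO ()
main = do
  -- The list is produced lazily and consumed by a single fold.
  let (total, count) = sumAndLength [1 .. 10000000]
  print total
  print count
```

Unlike foldl1', this version returns (0, 0) for an empty list instead of failing. It also illustrates the drawback raised in the question: the two computations are now fused into one function, which does not scale to cases where they live in unrelated parts of a program.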

If you make the list polymorphic, `let xs :: (Num a, Enum a) => [a]; xs = [1 .. 10000000]`, it's unlikely to be shared. The `xs () = [1 .. 10000000]` trick basically does the same (since it's a function binding, `xs` is polymorphic), but if you give the function a monomorphic signature, the list is likely to be shared (GHC does, at least with optimisations). In both cases, the compiler _could_ share the list because by defaulting it is used at the same type in both places, but so far, GHC doesn't do that sort of analysis. — Daniel Fischer, Jul 26 '12 at 20:17
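A minimal sketch of the polymorphic variant described in this comment, written as a top-level binding rather than a `let`. The type annotations at the use sites are added here for illustration; whether the list is actually rebuilt at each use depends on the GHC version and optimisation settings, as the comment itself warns.

```haskell
import Data.List (foldl1')

-- With a class-constrained signature, xs is compiled as a function of
-- its Num and Enum dictionaries, so each use site may build a fresh list.
xs :: (Num a, Enum a) => [a]
xs = [1 .. 10000000]

main :: IO ()
main = do
  -- Both uses happen to be at type Integer, but GHC is not obliged to
  -- notice that and share the list between them.
  print $ foldl1' (+) (xs :: [Integer])
  print $ length (xs :: [Integer])
```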

score 20 · Accepted Answer · edited Jul 26 '12 at 20:22

I was wondering the same thing a while ago and created a prototype implementation of such a thunk-duplication function. You can read about the result in my preprint "dup – Explicit un-sharing in Haskell" and see the code at http://darcs.nomeata.de/ghc-dup. Unfortunately, the paper was accepted neither for the Haskell Symposium nor for the Haskell Implementors' Workshop this year.

To my knowledge, there is no real-world-ready solution to the problem, only fragile workarounds such as the unit-parameter trick, which might break under one compiler optimization or another.

edited Jul 26 '12 at 20:22

Daniel Wagner

answered Jul 26 '12 at 20:05

Joachim Breitner

This is exactly what I wanted. Are you planning to put this on hackage? – Mike Izbicki Jul 26 '12 at 21:34
It is still just a prototype and not proven in real applications. Have you read the section on the limitations of the implementation in the paper? Maybe you can try out the code from the repository and tell me how well it works for you; if it turns out to be good, I can upload it to Hackage. (Installation works better with ./Setup than with cabal, so you should do `darcs get http://darcs.nomeata.de/ghc-dup && cd ghc-dup && ghc --make Setup.hs && ./Setup configure --user && ./Setup build && ./Setup install`.) – Joachim Breitner Jul 26 '12 at 21:58
Thanks. I'll give it a shot and report back. – Mike Izbicki Jul 26 '12 at 22:41

score 4 · Answer 2 · answered Jul 26 '12 at 19:25

Interesting question. I don't know how to implement copyThunk. But there is something else you can do (sorry if you already knew this):

xsFunction :: () -> [Int]
xsFunction = const [1..10000000]

better = do
  print $ foldl1' (+) $ xsFunction ()
  print $ length $ xsFunction ()

Here the expression xsFunction () definitely won't be stored in a shared thunk; it will be calculated twice, so the list never has to be kept in memory as a whole.

Two interesting follow-up questions:

Can one ever implement copyThunk?
Should a Haskell programmer ever be messing around with these relatively low-level optimizations? Can't we assume GHC will outsmart us on this?

answered Jul 26 '12 at 19:25

Tarrasch

Unfortunately, your approach might not work as expected as the compiler is likely to float the `[1..1000000]` out, hence sharing it again. Read more about this in the paper linked in my answer and in this GHC ticket: http://hackage.haskell.org/trac/ghc/ticket/917 – Joachim Breitner Jul 26 '12 at 20:06
With the monomorphic type, the list is indeed shared when compiling with optimisations. – Daniel Fischer Jul 26 '12 at 20:08
Would disabling optimizations prevent this from happening? For example, could I simply have a library compiled without optimizations that moves an expression into a function like this, and get "effective thunk duplication"? – Mike Izbicki Jul 26 '12 at 20:14
@MikeIzbicki Without optimisations, GHC currently doesn't share `xs :: () -> [Int]`. But a) you can't rely on that remaining so and b) unoptimised code is usually very slow. – Daniel Fischer Jul 26 '12 at 20:25
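One mitigation that is sometimes tried, sketched here under the assumption that the floating described in the linked ticket is what reintroduces the sharing: switch off full laziness for the module that defines the function and keep the function from being inlined. `-fno-full-laziness` and `NOINLINE` are real GHC features, but nothing in this thread confirms that the combination is sufficient or stable across GHC versions, so treat it as an experiment rather than a guarantee.

```haskell
{-# OPTIONS_GHC -fno-full-laziness #-}
import Data.List (foldl1')

-- The list depends syntactically on the () argument. With full laziness
-- disabled for this module, GHC should not float it out to the top
-- level (assumption: no other optimisation pass shares it again).
xsFunction :: () -> [Int]
xsFunction () = [1 .. 10000000]
{-# NOINLINE xsFunction #-}

main :: IO ()
main = do
  print $ foldl1' (+) (xsFunction ())
  print $ length (xsFunction ())
```

Checking the actual behaviour requires measuring, for example by running the compiled program with `+RTS -s` and comparing maximum residency with and without the flag.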

score 2 · Answer 3 · answered Jul 26 '12 at 19:25

Turn xs into a function. This may be ugly, but works, because it prevents sharing:

let xs () = [1..1000000]

good = do
    print $ foldl1' (+) (xs ())
    print $ length (xs ())

answered Jul 26 '12 at 19:25

ertes

Caution. With a monomorphic type for `xs`, GHC does share the list when compiled with optimisations. – Daniel Fischer Jul 26 '12 at 20:05
