
Thinking Functionally with Haskell provides the following code for calculating the mean of a list of Floats.

mean :: [Float] -> Float
mean [] = 0
mean xs = sum xs / fromIntegral (length xs)

Prof. Richard Bird comments:

Now we are ready to see what is really wrong with mean: it has a space leak. Evaluating mean [1..1000] will cause the list to be expanded and retained in memory after summing because there is a second pointer to it, namely in the computation of its length.

If I understand this text correctly, he's saying that, if there were no pointer to xs in the length computation, then the memory for xs could have been freed after calculating the sum?

My confusion is - if the xs is already in memory, isn't the length function simply going to use the same memory that's already being taken up?

I don't understand the space leak here.

Kevin Meredith
  • Hmm, "leak" to me seems a little odd... But definitely one thing to note: http://stackoverflow.com/a/2380437/667648 sum xs is calculated, then length xs is calculated, making two passes through instead of one. – Dair Mar 05 '15 at 02:04
  • Not really related to the question but if you're ever going to write a mean function, please please please have it return a `Maybe`. There's no way of distinguishing a real 0 from a fake one here. – hdgarrood Mar 05 '15 at 02:21
  • A Haskell koan: Computing `sum xs` is not a problem. Computing `length xs` is not a problem. Computing `sum xs / length xs` is a problem. – Daniel Wagner Mar 05 '15 at 02:38
  • @DanielWagner I fail to understand this. Given that `sum` is defined lazily in terms of `foldl` won't it cause space leak? Giving `sum [1..10000000]` in my laptop increases my laptop's memory usage like anything. Whereas, `mysum = foldl' (+) 0` runs on constant memory space ? – Sibi Mar 05 '15 at 02:45
  • @Sibi That is indeed an unfortunate historical detail. But it really detracts from the beauty of the koan to substitute `foldl' (+) 0` for `sum` everywhere while adding little of interest to the novice, don't you agree? =) – Daniel Wagner Mar 05 '15 at 02:51

4 Answers

11

The sum function does not need to keep the entire list in memory; it can look at an element at a time then forget it as it moves to the next element.

Because Haskell has lazy evaluation by default, if you have a function that creates a list, sum could consume it without the whole list ever being in memory (each time a new element is generated by the producing function, it would be consumed by sum then released).

The exact same thing happens with length.

On the other hand, the mean function feeds the list to both sum and length. So during the evaluation of sum, we need to keep the list in memory so it can be processed by length later.

[Update] To be clear, the list will be garbage collected eventually. The problem is that it stays in memory longer than needed. In such a simple case it is not a problem, but in more complex functions that operate on very large or unbounded streams, retaining data like this would most likely cause a space leak that exhausts memory.
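To make the "look at an element, then forget it" idea concrete, here is a minimal sketch of a mean that walks the list once with two strict accumulators, so no second reference to the list is needed. The name `meanOnePass` and the helper `go` are hypothetical, and returning 0 for the empty list simply mirrors the function in the question.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical single-traversal variant of the question's mean.
-- Each list cell is consumed once and is unreachable afterwards,
-- so the garbage collector may reclaim it while the loop is still running.
meanOnePass :: [Float] -> Float
meanOnePass = go 0 0
  where
    go :: Float -> Int -> [Float] -> Float
    go _  0  []       = 0                      -- empty input, as in the question
    go s  n  []       = s / fromIntegral n     -- end of list: divide once
    go !s !n (x:rest) = go (s + x) (n + 1) rest -- bangs force both accumulators
```

In GHCi, `meanOnePass [1..1000]` can be compared against the original `mean` with `:set +s` to see the difference in allocation behaviour.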

MasterMastic
Frédéric Dumont
  • `sum` is implemented lazily in Prelude. I think it will still keep the entire list in memory ? – Sibi Mar 05 '15 at 02:19
  • Yes, because it's not `sum` that keeps the list in memory; it's `mean` which needs it for both `sum` then `length`. – Frédéric Dumont Mar 05 '15 at 02:24
  • It is a problem in this case, because that allocation is much more expensive than computing the list as a loop twice, as it would be if you inlined both occurrences – luqui Mar 05 '15 at 02:25
  • @FrédéricDumont, Sorry, I think I was not being clear. There can be two ways in which the entire list can be in memory: From the `xs` which is producing lazily and the lazy `foldl` operation which will expand the entire list with (+) operation on it. Bird is probably referring to first type of condition. Although, even using normal `sum` function will lead to a space leak. – Sibi Mar 05 '15 at 02:42
  • @Sibi yes, clearly Bird is referring to the first type. And it boggles the mind that `sum` is still implemented with a non-strict accumulator, although it's possible that it's not a problem in practice when strictness analysis kicks in. – Frédéric Dumont Mar 05 '15 at 02:54
4

Others have explained what the problem is. The cleanest solution is probably to use Gabriel Gonzalez's foldl package. Specifically, you'll want to use

import qualified Control.Foldl as L
import Control.Foldl (Fold)
import Control.Applicative

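-- Matching on the literal 0 needs Eq in addition to Fractional.
meanFold :: (Eq n, Fractional n) => Fold n (Maybe n)
meanFold = f <$> L.sum <*> L.genericLength where
  f _ 0 = Nothing      -- empty input: there is no mean
  f s l = Just (s/l)

mean :: (Eq n, Fractional n, Foldable f) => f n -> Maybe n
mean = L.fold meanFold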
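To see why combining folds applicatively gives a single traversal, here is a rough, dependency-free sketch of the idea. It is not the foldl package's actual source: the names `Pair`, `runFold`, `sumF`, `lengthF` and `meanMaybe` are made up for illustration, and only `base` is assumed.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (foldl')

-- A stripped-down fold: a step function, an initial accumulator,
-- and a function that extracts the final result.
data Fold a b = forall x. Fold (x -> a -> x) x (x -> b)

-- A strict pair, so neither accumulator piles up thunks.
data Pair a b = Pair !a !b

instance Functor (Fold a) where
  fmap f (Fold step begin done) = Fold step begin (f . done)

instance Applicative (Fold a) where
  pure b = Fold (\() _ -> ()) () (const b)
  -- Run both folds side by side: each element updates both accumulators.
  Fold stepL beginL doneL <*> Fold stepR beginR doneR =
    Fold step (Pair beginL beginR) done
    where
      step (Pair l r) a = Pair (stepL l a) (stepR r a)
      done (Pair l r)   = doneL l (doneR r)

-- Drive a fold over a list with a single strict left fold.
runFold :: Fold a b -> [a] -> b
runFold (Fold step begin done) = done . foldl' step begin

sumF :: Num a => Fold a a
sumF = Fold (+) 0 id

lengthF :: Num b => Fold a b
lengthF = Fold (\n _ -> n + 1) 0 id

-- The mean as one combined fold; Nothing for an empty list.
meanMaybe :: (Eq a, Fractional a) => [a] -> Maybe a
meanMaybe = runFold (safeDiv <$> sumF <*> lengthF)
  where
    safeDiv _ 0 = Nothing
    safeDiv s n = Just (s / n)
```

The key design point is that `<*>` pairs the two accumulators, so the list is consumed once and nothing else holds a reference to it.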
dfeuer
3

if there was no pointer to xs in the length computation, then the xs memory could've been freed after calculating the sum?

No, you're missing the important aspect of lazy evaluation here. You're right that length will use the same memory as was allocated during the sum call, the memory in which we had expanded the whole list.

But the point here is that allocating memory for the whole list shouldn't be necessary at all. If there were no length computation but only the sum, then memory could be freed while the sum is being calculated. Notice that the list [1..1000] is generated lazily, only as it is consumed, so in fact mean [1..1000] should be able to run in constant space.

You might write the function like the following, to get an idea of how to avoid such a space leak:

import Control.Arrow

mean [] = 0
mean xs = uncurry (/) $ foldr (\x -> (x+) *** (1+)) (0, 0) xs

-- or more verbosely
mean xs = let (sum, len) = foldr (\x (s, l) -> (x+s, 1+l)) (0, 0) xs
          in sum / len

which should traverse xs only once. However, Haskell is damn lazy - and computes the first tuple components only when evaluating sum and the second ones only later for len. We need to use some more tricks to actually force the evaluation:

{-# LANGUAGE BangPatterns #-}
import Data.List

mean [] = 0
mean xs = uncurry (/) $ foldl' (\(!s, !l) x -> (x+s, 1+l)) (0,0) xs

which really runs in constant space, as you can confirm in ghci by using :set +s.

Bergi
1

The space leak is that the entire evaluated xs is held in memory for the length function. This is wasteful, as we aren't going to be using the actual values of the list after evaluating sum, nor do we need them all in memory at the same time, but Haskell doesn't know that.

A way to remove the space leak would be to recalculate the list each time:

sum [1..1000] / fromIntegral (length [1..1000])

Now the application can immediately start discarding values from the first list as it is evaluating sum, since it is not referenced anywhere else in the expression.

The same applies for length. The list cells it forces can be marked for deletion immediately, since nothing else could possibly want them evaluated further.

EDIT:

Implementation of sum in Prelude:

sum l = sum' l 0
  where
    sum' []     a = a
    sum' (x:xs) a = sum' xs (a+x)
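
For comparison with the accumulator loop above, here is a sketch of the strict variant mentioned in the comments. The name `sumStrict` is hypothetical. It assumes the only change needed is forcing the accumulator before each recursive call, which is what `foldl' (+) 0` does.

```haskell
import Data.List (foldl')

-- Same shape as the loop above, but the new accumulator is forced with
-- seq before recursing, so no chain of unevaluated (+) thunks builds up.
sumStrict :: Num a => [a] -> a
sumStrict = go 0
  where
    go acc []     = acc
    go acc (x:xs) = let acc' = acc + x
                    in acc' `seq` go acc' xs

-- The same thing written with the library's strict left fold.
sumStrict' :: Num a => [a] -> a
sumStrict' = foldl' (+) 0
```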