6

I have the following code.

main = print $ sum [1..1000000]

When I run it, I get a stack overflow:

Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.

I'm accustomed to imperative languages like Python, which seem to have no problem with such a calculation:

sum(range(100000000))  # I'm not even using a generator.
4999999950000000

Haskell is obviously different, but I don't quite understand what's going on under the hood to cause the stack overflow.

user2864740
Buttons840

2 Answers

10

This entire question is only relevant for GHC<7.10. In recent versions, sum [1..1000000] works just fine in constant space, at least on built-in number types.


sum used to be implemented with the evil foldl [1], which isn't as strict as it should be. Thus, what you get from sum is essentially a pile of thunks as large as your input. I think there was a discussion here at some point about why it is done this way... IMO it's basically just stupid: since sums can't normally be consumed lazily anyway, it's just obvious to use a strict fold.

Prelude> :m +Data.List
Prelude Data.List> foldl' (+) 0 [1..1000000]
500000500000


[1] Actually, foldl is only used in the report version... but the explicit-recursion version with accumulator is of course no better.
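To make the footnote concrete, here is a minimal sketch of an explicit-recursion sum with an accumulator, in a lazy and a strict variant. The names `sumLazyAcc` and `sumStrictAcc` are hypothetical and this is not GHC's actual source; it only illustrates the pattern. Without optimisation, the lazy variant may build a chain of unevaluated additions as long as the list, while the strict variant forces each running total with `seq` before recursing.

```haskell
-- Hypothetical names; a sketch of the accumulator pattern, not GHC's source.

-- Lazy accumulator: each step passes along the unevaluated expression
-- (acc + x), so a chain of thunks can build up before anything is added.
sumLazyAcc :: Num a => [a] -> a
sumLazyAcc = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (acc + x) xs

-- Strict accumulator: `seq` forces the new total before the recursive call,
-- so the accumulator is an evaluated number at every step.
sumStrictAcc :: Num a => [a] -> a
sumStrictAcc = go 0
  where
    go acc []     = acc
    go acc (x:xs) = let acc' = acc + x
                    in acc' `seq` go acc' xs

main :: IO ()
main = print (sumStrictAcc [1 .. 1000000 :: Integer])  -- prints 500000500000
```

With optimisation enabled, GHC's strictness analysis can often make the lazy variant behave like the strict one for concrete number types, so the difference tends to show up mainly in unoptimised builds or in GHCi.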

leftaroundabout
  • Is there any reason they haven't fixed this? – Gabriella Gonzalez Jan 29 '14 at 00:26
  • Yeah, strange, isn't it? There's sure [a proposal](https://ghc.haskell.org/trac/haskell-prime/ticket/120), but nobody seems to have bothered for a few years... perhaps there are some stream-fusion arguments, but I can't really see how that could apply. – leftaroundabout Jan 29 '14 at 00:31
  • @leftaroundabout Isn't `foldl` notorious for being *bad* at stream fusion? It's not a good consumer. In particular, there was recently a discussion on ghc-devs about revamping the framework for fusion to make it a good consumer. TLDR: `foldl` is just the worst. – daniel gratzer Jan 29 '14 at 01:22
  • So if I remember correctly, the only reason people haven't bothered yet is that the problem goes away when you compile with `-O2` (however, I still think it should be fixed). Also, jozefg is right and both `foldl`/`foldl'` are bad consumers for stream fusion. The correct version is `Data.Foldable.foldl'`, which should be the official one. – Gabriella Gonzalez Jan 29 '14 at 02:22
  • I suspect the reason it hasn't been changed is that the Haskell Report specifies the lazy behaviour. – Tom Ellis Jan 29 '14 at 09:00
  • @TomEllis perhaps, but at least all the standard number types can't be evaluated lazily anyway, so there's nothing to be won with `foldl` really. `base-4.8` has thus fixed the issue, so this question is obsolete now. – leftaroundabout Dec 05 '17 at 09:53
3

sum is defined in terms of foldl, which is lazy in a left-associative way: it has to generate thunks for the whole list before evaluating a single expression (in this case, an addition).
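The shape of that pending expression can be sketched by folding over strings instead of numbers. The helper `showLeftFold` is a hypothetical name introduced only for illustration; it renders the left-nested expression that `foldl (+) 0` describes and does not inspect real thunks.

```haskell
-- Hypothetical helper: renders the expression that `foldl (+) 0 xs`
-- describes, as text. This only illustrates the nesting; it does not
-- look at actual thunks on the heap.
showLeftFold :: [Integer] -> String
showLeftFold = foldl step "0"
  where
    step acc x = "(" ++ acc ++ "+" ++ show x ++ ")"

main :: IO ()
main = putStrLn (showLeftFold [1, 2, 3])  -- prints (((0+1)+2)+3)
```

The innermost addition is the first one that can actually be computed, so for a million elements the evaluator has to descend through roughly a million nested additions before it can add anything, which is where the stack is consumed.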

You could also define sum in terms of foldl's stricter counterpart, foldl', like so:

import Data.List (foldl')
sum' = foldl' (+) 0

See "Foldr Foldl Foldl'" on the Haskell Wiki for a good explanation of how foldl has to generate a thunk for every calculation without being able to evaluate anything, which is what causes the stack overflow.

DJG
  • Saying `foldl` is lazy is a bit misleading, since unlike `foldr` it always needs to accumulate through the whole list. It basically picks just the bad aspects of non-strict semantics (thunk buildup), without yielding many of the useful properties of lazy code (GC-as-you-go etc.). – leftaroundabout Jan 29 '14 at 00:15
  • Clarified my answer a bit. – DJG Jan 29 '14 at 00:28