42

I am curious about the runtime performance of an infinite list like the one below:

fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

This will create an infinite list of the fibonacci sequence.

My question is that if I do the following:

takeWhile (<5) fibs

how many times does fibs evaluate each term in the list? It seems that since takeWhile checks the predicate function for each item in the list, the fibs list will evaluate each term multiple times. The first 2 terms are given for free. When takeWhile wants to evaluate (<5) on the 3rd element, we will get:

1 : 1 : zipWith (+) [(1, 1), (1)] => 1 : 1 : 3

Now, once takeWhile wants to evaluate (<5) on the 4th element: the recursive nature of fibs will construct the list again like the following:

1 : 1 : zipWith (+) [(1, 2), (2, 3)] => 1 : 1 : 3 : 5

It would seem that the 3rd element needs to be computed again when we want to evaluate the value of the 4th element. Furthermore, if the predicate in takeWhile is large, it would indicate the function is doing more work that is needed since it is evaluating each preceding element in the list multiple times. Is my analysis here correct or is Haskell doing some caching to prevent multiple evaluations here?

Jonathan Sterling
  • 18,320
  • 12
  • 67
  • 79
akg
  • 523
  • 4
  • 5
  • 8
    What happened to `2`, which normally appears between `1` and `3` in the standard Fibonacci sequence? – Neil Jun 10 '12 at 21:20
  • 2
    Depends on the implementation of course, but a decent one will evaluate every element of the list only once. – Daniel Fischer Jun 10 '12 at 21:30

4 Answers4

81

This is a self-referential, lazy data structure, where "later" parts of the structure refer to earlier parts by name.

Initially, the structure is just a computation with unevaluated pointers back to itself. As it is unfolded, values are created in the structure. Later references to already-computed parts of the structure are able to find the value already there waiting for them. No need to re-evaluate the pieces, and no extra work to do!

The structure in memory begins as just an unevaluated pointer. Once we look at the first value, it looks like this:

> take 2 fibs

enter image description here

(a pointer to a cons cell, pointing at '1', and a tail holding the second '1', and a pointer to a function that holds references back to fibs, and the tail of fibs.

Evaluating one more step expands the structure, and slides the references along:

enter image description here

And so we go unfolding the structure, each time yielding a new unevaluated tail, which is a closure holding references back to 1st and 2nd elements of the last step. This process can continue infinitely :)

And because we're referring to prior values by name, GHC happily retains them in memory for us, so each item is evaluated only once.

Don Stewart
  • 137,316
  • 36
  • 365
  • 468
  • 7
    Perfect use of vacuum; really shines here. – Christopher Done Jun 10 '12 at 22:29
  • 1
    Really awesome answer, thanks for putting the time into that. – Jonathan Sterling Jun 10 '12 at 22:29
  • I'll add that once you don't need an initial part of the structure anymore, the garbage collector will notice that and automatically release the memory for re-use. So even though memory is used in this way to avoid re-calculation, only the last two items will be retained in memory at any time for a single iteration over many elements of `fibs`. – Yitz Jun 12 '12 at 09:20
  • How is the scope of this closure determined exactly? I'm confused about the case of slow fibonacci, where slow_fib n = slow_fib(n-1) + slow_fib(n-2), how come the values previously computed aren't stored. Is it the difference between function scope and global scope? – Anil Vaitla Feb 27 '13 at 19:34
31

Illustration:

module TraceFibs where

import Debug.Trace

fibs :: [Integer]
fibs = 0 : 1 : zipWith tadd fibs (tail fibs)
  where
    tadd x y = let s = x+y
               in trace ("Adding " ++ show x ++ " and " ++ show y
                                   ++ "to obtain " ++ show s)
                        s

Which produces

*TraceFibs> fibs !! 5
Adding 0 and 1 to obtain 1
Adding 1 and 1 to obtain 2
Adding 1 and 2 to obtain 3
Adding 2 and 3 to obtain 5
5
*TraceFibs> fibs !! 5
5
*TraceFibs> fibs !! 6
Adding 3 and 5 to obtain 8
8
*TraceFibs> fibs !! 16
Adding 5 and 8 to obtain 13
Adding 8 and 13 to obtain 21
Adding 13 and 21 to obtain 34
Adding 21 and 34 to obtain 55
Adding 34 and 55 to obtain 89
Adding 55 and 89 to obtain 144
Adding 89 and 144 to obtain 233
Adding 144 and 233 to obtain 377
Adding 233 and 377 to obtain 610
Adding 377 and 610 to obtain 987
987
*TraceFibs>
danr
  • 2,405
  • 23
  • 24
Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
19

When something is evaluated in Haskell, it stays evaluated, as long as it's referenced by the same name1.

In the following code, the list l is only evaluated once (which might be obvious):

let l = [1..10]
print l
print l -- None of the elements of the list are recomputed

Even if something is partially evaluated, that part stays evaluated:

let l = [1..10]
print $ take 5 l -- Evaluates l to [1, 2, 3, 4, 5, _]
print l          -- 1 to 5 is already evaluated; only evaluates 6..10

In your example, when an element of the fibs list is evaluated, it stays evaluated. Since the arguments to zipWith reference the actual fibs list, it means that the zipping expression will use the already partially computed fibs list when computing the next elements in the list. This means that no element is evaluated twice.

1This is of course not strictly required by the language semantics, but in practice this is always the case.

dflemstr
  • 25,947
  • 5
  • 70
  • 105
4

Think of it this way. The variable fib is a pointer to a lazy value. (You can think of a lazy value underneath as a data structure like (not real syntax) Lazy a = IORef (Unevaluated (IO a) | Evaluated a); i.e. it starts out as unevaluated with a thunk; then when it is evaluated it "changes" to something that remembers the value.) Because the recursive expression uses the variable fib, they have a pointer to the same lazy value (they "share" the data structure). The first time someone evaluates fib, it runs the thunk to get the value and that value is remembered. And because the recursive expression points to the same lazy data structure, when they evaluate it, they will see the evaluated value already. As they traverse the lazy "infinite list", there will only be one "partial list" in memory; zipWith will have two pointers to "lists" which are simply pointers to previous members of the same "list", due to the fact that it started with pointers to the same list.

Note that this is not really "memoizing"; it's just a consequence of referring to the same variable. There is generally no "memoizing" of function results (the following will be inefficient):

fibs () = 0 : 1 : zipWith tadd (fibs ()) (tail (fibs ()))
newacct
  • 119,665
  • 29
  • 163
  • 224