
Context:

def fib(n):
    if n < 2: return 1
    return fib(n-1) + fib(n-2)

can be sped up by memoization:

fib_memo = {}
def fib(n):
    if n < 2: return 1
    if n not in fib_memo:
        fib_memo[n] = fib(n-1) + fib(n-2)
    return fib_memo[n]

This memoization technique is widely used across programming languages, but it can't be applied directly in Haskell: Haskell is pure, and we don't want to introduce impurity just to memoize a function. Fortunately, Haskell's lazy evaluation makes it possible to memoize a function without side effects.

The following memoize function takes a function of type Int -> a and returns a memoized version of the same function. The trick is to turn a function into a value because, in Haskell, functions are not memoized but values are.
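
The memoize function itself isn't quoted here. A minimal sketch of the idea, presumably along the lines of the wiki article's version, together with the `fix`-based knot-tying mentioned in the comments below:

import Data.Function (fix)

-- Turn the function into a lazily built list of all of its results and
-- index into that list. The list is a value, so each element is
-- computed at most once. Only defined for non-negative arguments.
memoize :: (Int -> a) -> (Int -> a)
memoize f = (map f [0 ..] !!)

-- Open-recursive fib: recursive calls go through the parameter f,
-- so they can be routed through the memoized version.
fib :: (Int -> Integer) -> Int -> Integer
fib _ 0 = 1
fib _ 1 = 1
fib f n = f (n - 1) + f (n - 2)

-- Tie the knot: fibMemo's own recursive calls hit the memo list.
fibMemo :: Int -> Integer
fibMemo = fix (memoize . fib)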

Questions:

  1. Aren't functions and values the same in functional programming?
  2. How is caching not considered a side effect (in the context of pure functions)?
Meshugah
  • You can sometimes have memoization for free by using corecursion (self-referencing data structures): `fibs = 1 : 1 : zipWith (+) fibs (tail fibs)`. –  Jul 04 '20 at 20:58
  • 2
    I find that page a bit confusing, to be honest. Instead of showing a "basic" form of memoization, it immediately goes to `fibMemo = fix (memoize . fib)` which is rather subtle, relying on how `fix` is defined. IMO, the first step to understand memoization is to understand how `f = \x -> let y = expensive 42 in x+y` and `f = let y = expensive 42 in \x -> x+y` differ operationally (in absence of optimizations). The former computes `expensive 42` at each call, the latter only once. – chi Jul 04 '20 at 21:46
  • "side effect" only refers to _what_ a function does, not _how_ (fast (or slow)) it does it. its "main effect" is calculating and producing the return value; it "side effect" is whatever change in the state of the outside world (outside of the function) it had caused. the function itself is seen as a black box. – Will Ness Sep 24 '20 at 12:35

1 Answer


All functions are values but not all values are functions.
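
For instance, a function can be stored in a data structure like any other value, while a plain number is a value that cannot be applied (a trivial illustration; the names are made up):

double :: Int -> Int
double x = 2 * x

-- a list of values that happen to be functions
fns :: [Int -> Int]
fns = [double, (+ 1), id]

-- a value that is not a function
answer :: Int
answer = 42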

This is really about operational semantics, which are sometimes tricky to talk about in Haskell because Haskell is only defined in terms of its denotational semantics -- that is, what value an expression evaluates to, rather than how that evaluation happens. It's not a side-effect because the "stateful" nature of memoization is still hidden behind the abstraction of purity: while there is some internal state (represented in the partial graph-reduction of the program), there is no way for your program to observe that state in a way that would distinguish it from the non-memoized version. A subtlety here is that these memoization strategies are not actually required to memoize -- all that is guaranteed is the result they will give after some unspecified finite amount of time.

There is no requirement for a Haskell implementation to memoize anything -- it could use pure call-by-name, for example, which doesn't memoize values, instead recomputing everything. Here's an example of call-by-name.

let f x = x * x in f (2 + 2)
= (2 + 2) * (2 + 2)
= 4 * (2 + 2)
= 4 * 4
= 16

Here 2 + 2 is evaluated twice. Most Haskell implementations (optimizations aside) would create a thunk so that it would be computed at most once (which is called call-by-need). But a call-by-name Haskell implementation that evaluated it twice would be technically conforming. Because Haskell is pure, there will be no difference in the result computed. In the real world though, this strategy ends up being much too expensive to be practical.
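
The sharing that call-by-need performs can be made visible with a debugging hook (a sketch using `Debug.Trace.trace`, which prints its message when its second argument is forced; the pure semantics cannot observe this):

import Debug.Trace (trace)

f :: Int -> Int
f x = x * x

main :: IO ()
main = print (f (trace "arg forced" (2 + 2)))
-- Under GHC's call-by-need, "arg forced" is printed once: the argument
-- thunk is shared between the two uses of x. Under call-by-name it
-- would be printed twice.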

As for the choice not to memoize functions, the logic is the same. It is perfectly conforming for a compiler to aggressively memoize every function, using something like optimal evaluation. Haskell's purity means there would be no difference in the result if this evaluation strategy were chosen. But again, in real-world applications, memoizing every function like this ends up taking a lot of memory, and the overhead of an optimal evaluator is too high to give good practical performance.

A Haskell compiler could also choose to memoize some functions but not others in order to maximize performance. This would be great--my understanding is that it is not really known how to do this reliably. It is very hard for a compiler to tell in advance which computations will be cheap and which will be expensive, and which computations will probably be reused and which will not.

So a balance is chosen, in which values, whose evaluated forms are usually smaller than their thunks, are memoized; whereas functions, whose memoized forms are usually larger than their definitions (since they need a whole memo table), are not. And then we get some techniques like those in the article to switch back and forth between these representations, according to our own judgment as programmers.
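
Concretely, the switch from a function to a value that the article exploits looks something like this sketch (using the question's convention that fib 0 = fib 1 = 1):

-- fibs is a value (a lazy list), so its elements are memoized: each
-- one is computed at most once and then shared.
fibs :: [Integer]
fibs = map fib [0 ..]
  where
    fib 0 = 1
    fib 1 = 1
    fib n = fibs !! (n - 1) + fibs !! (n - 2)

-- The memoized function, recovered by indexing into the value.
memoFib :: Int -> Integer
memoFib = (fibs !!)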

luqui