
While writing up examples of memoization and continuation-passing style (CPS) for a functional language, I ended up using Fibonacci for both. However, Fibonacci doesn't really benefit from CPS, as the recursion still runs exponentially many times, while with memoization it's only O(n) the first time and O(1) every following time.

Combining CPS and memoization has a slight benefit for Fibonacci, but are there examples where CPS is the best approach, one that prevents you from running out of stack and improves performance, and where memoization is not a solution?
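
For concreteness, here is roughly the combined version I have in mind, as a sketch in OCaml syntax (an F# version would use a Dictionary as the cache; the Hashtbl and the names are just illustrative):

(* Sketch: Fibonacci with both memoization and CPS.
   A cache hit answers the continuation directly, so repeated
   subproblems are not recomputed. *)
let fib_cps_memo =
  let table = Hashtbl.create 16 in
  let rec go n k = match n with
    | 0 | 1 -> k 1
    | n ->
      (match Hashtbl.find_opt table n with
       | Some v -> k v                        (* cache hit: answer k directly *)
       | None ->
         go (n-1) (fun a ->
           go (n-2) (fun b ->
             let v = a + b in
             Hashtbl.add table n v;           (* cache the result *)
             k v)))
  in
  fun n -> go n (fun x -> x)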

Or: is there any guideline for when to choose one over the other or both?

Abel
  • Can you provide a problem for which CPS and memoization are alternative solutions? – Daniel Feb 08 '13 at 22:51
  • You may find the following article interesting: [Solving the 0-1 knapsack problem using continuation-passing style with memoization in F#](http://zbray.com/2011/11/02/solving-the-0-1-knapsack-problem-using-continuation-passing-style-with-memoization-in-f/) – Phillip Trelford Feb 09 '13 at 13:14

2 Answers


On CPS

While CPS is useful as an intermediate language in a compiler, at the source-language level it is mostly a device to (1) encode sophisticated control flow (not really performance-related) and (2) transform a non-tail call that consumes stack space into a continuation-allocating tail call that consumes heap space instead. For example, when you write (code untested)

(* Direct style: fib (n-1) + fib (n-2) contains two non-tail calls. *)
let rec fib = function
  | 0 | 1 -> 1
  | n -> fib (n-1) + fib (n-2)

(* CPS: k is the continuation; every recursive call is now a tail call. *)
let rec fib_cps n k = match n with
  | 0 | 1 -> k 1
  | n -> fib_cps (n-1) (fun a -> fib_cps (n-2) (fun b -> k (a+b)))

The previous non-tail call fib (n-2), which allocated a new stack frame, is turned into the tail call fib_cps (n-2) (fun b -> k (a+b)), which instead allocates the closure fun b -> k (a+b) (on the heap) to pass as an argument.
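
To get an ordinary result out of the CPS version, you pass an initial continuation, typically the identity function; a small usage sketch (fib' is just a name for the wrapper):

let fib' n = fib_cps n (fun x -> x)  (* identity as the initial continuation *)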

This does not asymptotically reduce the memory usage of your program (some further domain-specific optimizations might, but that's another story). You're just trading stack space for heap space, which is interesting on systems where stack space is severely limited by the OS (not the case for some implementations of ML such as SML/NJ, which allocate their call stack on the heap instead of using the system stack), and potentially performance-degrading because of the additional GC costs and decreased locality.

CPS transformation is unlikely to improve performance much (though details of your implementation and runtime system might make it so). It is a generally applicable transformation that lets you avoid the dreaded stack overflow of recursive functions with a deep call stack.

On Memoization

Memoization is useful to introduce sharing of the subcalls of a recursive function. A recursive function typically solves a "problem" ("compute the Fibonacci of n", etc.) by decomposing it into several strictly simpler "subproblems" (the recursive subcalls), with some base cases for which the problem is solvable right away.

For any recursive function (or recursive formulation of a problem), you can observe the structure of the subproblem space. Which simpler instances of Fib(k) will Fib(n) need to return its result? Which simpler instances will those instances in turn need?

In the general case, this space of subproblems is a graph (generally acyclic, for termination purposes): some nodes have several parents, that is, they are subproblems of several distinct problems. For example, Fib(n-2) is a subproblem of both Fib(n) and Fib(n-1). The amount of node sharing in this graph depends on the particular problem / recursive function. In the case of Fibonacci, almost all nodes are shared between two parents, so there is a lot of sharing.

A direct recursive call without memoization will not be able to observe this sharing. The call structure of a recursive function is a tree, not a graph, and shared subproblems such as Fib(n-2) will be fully visited several times (as many times as there are paths from the starting node to the subproblem node in the graph). Memoization introduces sharing by letting some subcalls return immediately: "we've already computed this node and here is the result". For problems with a lot of sharing, this can dramatically reduce (useless) computation: Fibonacci moves from exponential to linear complexity when memoization is introduced. Note that there are other ways to write the function with linear complexity, without memoization, by using a different subcall structure:

(* fib_pair n returns (fib n, fib (n+1)); each subproblem is computed once. *)
let rec fib_pair = function
  | 0 -> (1,1)
  | n -> let (u,v) = fib_pair (n-1) in (v,u+v)
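
A minimal sketch of a memoized fib, using a Hashtbl as the cache (the table name and structure are illustrative; the first call is O(n), later calls on the same input are O(1)):

let fib_memo =
  let table = Hashtbl.create 16 in
  let rec fib n = match n with
    | 0 | 1 -> 1
    | n ->
      (match Hashtbl.find_opt table n with
       | Some v -> v                          (* shared node: reuse the cached result *)
       | None ->
         let v = fib (n-1) + fib (n-2) in
         Hashtbl.add table n v;               (* record the result for future sharers *)
         v)
  in fib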

The technique of using some form of sharing (usually through a large table storing the results) to avoid useless duplication of subcomputations is well known in the algorithms community; it is called dynamic programming. When you recognize that a problem is amenable to this treatment (you notice the sharing among the subproblems), it can provide large performance benefits.
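
In its bottom-up form, dynamic programming for Fibonacci is just a loop filling a table; a minimal sketch, assuming n >= 0:

let fib_dp n =
  (* table.(i) holds fib i, with the fib 0 = fib 1 = 1 convention used above *)
  let table = Array.make (n + 1) 1 in
  for i = 2 to n do
    table.(i) <- table.(i-1) + table.(i-2)
  done;
  table.(n)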

Does a comparison make sense?

The two seem mostly independent of each other.

There are many problems where memoization is not applicable, because the subproblem graph has no sharing (it is a tree). On the contrary, CPS transformation is applicable to all recursive functions, but it does not by itself lead to performance benefits (other than constant factors depending on the particular implementation and runtime system you're using, and those are likely to make the code slower rather than faster).

Improving performance by inspecting non-tail contexts

There is an optimization technique related to CPS that can improve the performance of recursive functions. It consists in looking at the computation "left to be done" after a recursive call (what would be turned into a closure in a straightforward CPS transform) and finding an alternative, more efficient representation for it that avoids systematic closure allocation. Consider for example:

(* Direct style: the addition 1 + _ happens after the recursive call returns. *)
let rec length = function
  | [] -> 0
  | _::t -> 1 + length t

(* CPS: each step allocates a closure fun a -> k (a + 1). *)
let rec length_cps li k = match li with
  | [] -> k 0
  | _::t -> length_cps t (fun a -> k (a + 1))

Notice that the context of the non-tail call, namely [_ + 1], has a simple structure: it adds an integer. Instead of representing it as a function fun a -> k (a+1), you can simply store the integer to be added, which corresponds to several applications of this function; k becomes an integer instead of a function.

(* k is now an integer accumulator instead of a function. *)
let rec length_acc li k = match li with
  | [] -> k + 0
  | _::t -> length_acc t (k + 1)

This function runs in constant space rather than linear space. By turning the representation of the tail contexts from functions into integers, we have eliminated the allocation that made memory usage linear.

Close inspection of the order in which the additions are performed reveals that they now happen in a different direction: we add the 1's corresponding to the beginning of the list first, while the CPS version added them last. This order reversal is valid because _ + 1 is an associative operation (if you have several nested contexts, foo + 1 + 1 + 1, it is valid to compute them either from the inside, ((foo+1)+1)+1, or from the outside, foo+(1+(1+1))). The optimization above can be used for all such "associative" contexts around a non-tail call.

There are certainly other optimizations based on the same idea (I'm no expert on such optimizations): look at the structure of the continuations involved and represent them in a more efficient form than heap-allocated functions.

This is related to the "defunctionalization" transformation, which changes the representation of continuations from functions to data structures, without changing the memory consumption (a defunctionalized program allocates a data node where the original program would have allocated a closure). It makes it possible to express the tail-recursive CPS version in a first-order language (without first-class functions) and can be more efficient on systems where data structures and pattern matching are more efficient than closure allocation and indirect calls.

(* Defunctionalized continuations for length_cps:
   Linit represents the identity continuation,
   Lcons k represents fun a -> k (a + 1). *)
type length_cont =
  | Linit
  | Lcons of length_cont

let rec length_cps_defun li k = match li with
  | [] -> length_cont_eval 0 k
  | _::t -> length_cps_defun t (Lcons k)
and length_cont_eval acc = function
  | Linit -> acc
  | Lcons k -> length_cont_eval (acc+1) k

let length li = length_cps_defun li Linit

(* Defunctionalized continuations for fib_cps:
   Finit represents the identity continuation,
   Fminus1 (n, k) represents fun a -> fib_cps (n-2) (fun b -> k (a+b)),
   Fminus2 (k, a) represents fun b -> k (a+b). *)
type fib_cont =
  | Finit
  | Fminus1 of int * fib_cont
  | Fminus2 of fib_cont * int

let rec fib_cps_defun n k = match n with
  | 0 | 1 -> fib_cont_eval 1 k
  | n -> fib_cps_defun (n-1) (Fminus1 (n, k))
and fib_cont_eval acc = function
  | Finit -> acc
  | Fminus1 (n, k) -> fib_cps_defun (n-2) (Fminus2 (k, acc))
  | Fminus2 (k, acc') -> fib_cont_eval (acc+acc') k

let fib n = fib_cps_defun n Finit
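
A quick sanity check of the defunctionalized versions (with the fib 0 = fib 1 = 1 convention used above):

let () =
  assert (length [1; 2; 3] = 3);
  assert (fib 10 = 89)  (* 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89 *)
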
gasche
  • I didn't respond earlier (travelling), but now I really had time to go over your extensive answer. Excellent and well-spotted. The explanation on the graph vs tree and the revisiting of nodes is an excellent way of seeing the different use-cases here. Thanks! – Abel Feb 27 '13 at 16:09

One benefit of CPS is error handling: if you need to fail, you just call your failure continuation.
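
A minimal sketch of what that looks like, with explicit success and failure continuations (div_cps is a made-up example, in OCaml syntax; F# is analogous):

(* Double-barreled CPS: ok receives the result, err receives the error. *)
let div_cps x y ok err =
  if y = 0 then err "division by zero"
  else ok (x / y)

let () =
  div_cps 10 2
    (fun v -> Printf.printf "result: %d\n" v)
    (fun msg -> prerr_endline ("error: " ^ msg))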

I think the biggest difference shows up when you are not dealing with pure calculations, which is where memoization shines. If you are instead dealing with IO or other side-effecting operations, the benefits of CPS still apply, but memoization doesn't work.

As to an instance where CPS and memoization are both applicable and CPS is better, I am not sure, since I consider them different pieces of functionality.

Finally, the case for CPS is a bit diminished in F#, since tail recursion already makes the whole stack-overflow issue a non-issue.

Guvante
  • F# doesn't automatically turn every call into a tail-call, so CPS is still necessary to *correctly* implement certain functional design patterns. – Jack P. Feb 08 '13 at 22:28
  • In the Fibonacci example, the recursion has two recursive return points, so it cannot be TCO'd. Neither can tree-traversal algorithms. And a factorial implementation without an accumulator cannot be TCO'd, because the result of the recursive call still needs to be used in a calculation before the function can return. In fact, TCO only applies to a limited set of recursion problems, and that is where CPS comes to the rescue. – Abel Feb 08 '13 at 22:51
  • @Jack P. Doesn't F# automatically optimise every *tail call* into a loop? I agree that it is not easy to convert any general recursive function into a tail recursive one, but when given a tail recursive function, is it incorrect to assume that F# will optimise away the recursion (provided the appropriate compiler settings are turned on)? – Shredderroy Feb 09 '13 at 00:54
  • @Shredderroy : Not always, no – sometimes it emits a `call` op preceded by a `tail` op, at which point it's up to the JIT compiler to optimize away the recursion. – ildjarn Feb 09 '13 at 01:29
  • @Shredderroy Tail calls aren't necessarily optimized into a loop; they simply allow the JIT compiler to discard the current stack frame when making the tail-call to the next function. Since there's no possibility of a stack overflow, that means a function which tail-calls itself (i.e., self-recursion) effectively becomes a loop. The F# compiler can *sometimes* take this a step further and optimize certain self-recursive functions into an explicit loop -- that is, the `tail.call` op is optimized away, instead just using a `br` op back to the beginning of the method body. – Jack P. Feb 09 '13 at 14:21
  • @JackP.: I guess I just don't think of an accumulator as being CPS. Many tail recursions can be simplified down to that, compared to CPS where you need to wrap up your final action. – Guvante Feb 09 '13 at 22:05