When compiling a functional program to machine language, the compiler has to choose how to implement closures. In the following example (Scheme syntax), the function f returns the procedure (lambda (y) (+ x y)), whose machine representation (its closure) has to comprise a code pointer together with the value of x.
(define (f x)
  (lambda (y) (+ x y)))
Closure conversion is the process of choosing a layout for such a machine representation. Two typical strategies are linked lists and flat closures. While flat closures may involve extra copying of values, naive linked lists are generally not safe-for-space (variables are kept alive after their last use).
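For concreteness, here is a rough, hand-written sketch of the two layouts for the closure of (lambda (y) (+ x y)), using vectors as closure records; the names and the encoding are my own and are not the output of any particular compiler:

;; A flat closure copies each free variable into the closure record.
(define (add-code clo y)
  (+ (vector-ref clo 1) y))                    ; x is stored in slot 1

(define (make-flat-closure x)
  (vector add-code x))                         ; layout: #(code x)

;; A linked closure stores a pointer to the enclosing environment instead;
;; x is reached by following that link, which can keep the whole parent
;; frame alive and is therefore not safe-for-space in general.
(define (add-code/linked clo y)
  (+ (vector-ref (vector-ref clo 1) 0) y))     ; follow the link, then fetch x

(define (make-linked-closure parent-frame)
  (vector add-code/linked parent-frame))

;; Calling a converted closure passes the closure record to its own code.
(define (call-closure clo . args)
  (apply (vector-ref clo 0) clo args))

(call-closure (make-flat-closure 1) 2)             ; => 3
(call-closure (make-linked-closure (vector 1)) 2)  ; => 3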
An algorithm that optimizes the flat-closure strategy is described by Keep, Hearn, and Dybvig in their paper Optimizing Closures in O(0) time. This algorithm gives good results when the compilation strategy is stack-based. For CPS-based compilers where the activation records are allocated on the heap (see Compiling with Continuations by Appel), the algorithm by Keep et al. does not seem to give optimal results.
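To illustrate what "activation records on the heap" means here, the running example can be CPS-converted by hand so that every continuation is itself a closure passed as an extra argument (again only a sketch, not the representation used by any of the compilers mentioned):

;; CPS version of the running example: every procedure receives its
;; continuation k explicitly, and the continuations are ordinary closures,
;; so the "frames" live on the heap like any other closure.
(define (f-cps x k)
  (k (lambda (y k2) (k2 (+ x y)))))

;; ((f 1) 2) in CPS, with the final continuation simply returning its value.
(f-cps 1
       (lambda (add-one)
         (add-one 2 (lambda (result) result))))   ; => 3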
On the other hand, Shao and Appel present in their paper Efficient and Safe-for-Space Closure Conversion a very sophisticated closure-conversion algorithm in which closures are represented as something in between flat closures and pure linked lists. The algorithm is safe-for-space and is tailored to a compiler that employs CPS and passes continuations around explicitly. The examples in the paper show that the algorithm gives very good results.
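As I understand the paper, a key idea is that several closures may share a record of free variables when that sharing cannot extend any variable's lifetime. A hand-written illustration of such sharing (my own toy encoding, not the paper's actual representation):

;; Two sibling closures g and h both need x and y, so those live in one
;; shared record; the variable z that only g needs stays in g's own record.
;; Sharing x and y is safe here because both are live as long as either
;; closure is.
(define (make-siblings x y z)
  (let ((shared (vector x y)))
    (define (g-code clo a)
      (let ((shr (vector-ref clo 1)))
        (+ (vector-ref shr 0) (vector-ref shr 1) (vector-ref clo 2) a)))
    (define (h-code clo a)
      (let ((shr (vector-ref clo 1)))
        (* (vector-ref shr 0) (vector-ref shr 1) a)))
    (values (vector g-code shared z)     ; g: #(code shared-record z)
            (vector h-code shared))))    ; h: #(code shared-record)

(call-with-values
  (lambda () (make-siblings 1 2 3))
  (lambda (g h)
    (list ((vector-ref g 0) g 10)        ; => 1 + 2 + 3 + 10 = 16
          ((vector-ref h 0) h 10))))     ; => 1 * 2 * 10 = 20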
Given that the paper by Shao and Appel is about 20 years old, I am wondering whether it still represents the state of the art for closure conversion of functional languages compiled via CPS.
In their paper, they mention that the algorithm has been successfully implemented in a version of SML/NJ (1.03z); however, I have not been able to find an implementation of the algorithm in current versions of SML/NJ.
Was or is the algorithm employed in a practical compiler? Or are there refinements or other known algorithms for closure conversion that are both safe-for-space and efficient when activation records are allocated on the heap (which, in turn, allows for an O(1) implementation of call/cc)? Have these algorithms ever been implemented, or are there reasons that speak against their efficiency?
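For context on the call/cc remark: once the program is in CPS and frames are heap-allocated closures, capturing the current continuation is just handing on the continuation argument that is already in scope, so no stack needs to be copied. A minimal sketch in that setting (again my own toy encoding):

;; call/cc in CPS form: the captured continuation is the heap-allocated
;; closure k itself, wrapped so that invoking it discards the continuation
;; of the invocation site.  Capturing it is a constant-time operation.
(define (call/cc-cps f k)
  (f (lambda (v k-ignored) (k v)) k))

;; Example: (+ 1 (call/cc (lambda (esc) (esc 41)))) in CPS, with the final
;; continuation simply returning its value.
(call/cc-cps
  (lambda (esc k) (esc 41 k))
  (lambda (v) (+ 1 v)))                 ; => 42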
To allow experimentation with the various strategies, it would be nice if answers included references to (open-source) implementations of the algorithms or, at least, to existing compilers that use them. This would also give a hint of how these algorithms fare in practice.