Can't cons cells be implemented efficiently at the library level in Clojure?

Question

Clojure has its own collections and has no need for the traditional Lisp cons cells. But I find the concept interesting, and it is used in some teaching materials (e.g., SICP). I have been wondering whether there is any reason this cons primitive needs to be a primitive. Can't we just implement it (and the traditional functions that operate on it) in a library? I searched, but found no such library already written.

I don't think just having cons cells is enough (it's obviously trivial to implement a pair): you need the syntax of the language to be made of them. — , Apr 23 '18 at 17:09
Since Lisp uses many cons cells and often at a high *consing* rate, one usually wants to have them implemented as efficiently (space and time) as possible. This requires then a runtime implementation at a very low-level - typically in assembler or C. For example the cons cell should not use more than two machine words in memory. — Rainer Joswig, Apr 23 '18 at 17:24
@RainerJoswig: I think this is an answer in a way: they can't be implemented efficiently in the language because 'efficiently' for conses has a very specific meaning. In a similar way, floating-point arithmetic can't be implemented efficiently in software. — , Apr 24 '18 at 08:14

Rainer Joswig · Answer 1 · 2019-05-28T07:51:07.923

Cons cells are an important building block in Lisp for s-expressions. See for example the various publications by McCarthy about Lisp and Symbolic Expressions from 1958 onwards (for example Recursive Functions of Symbolic Expressions). Every list in Lisp is made of cons cells.

It's definitely possible to implement linked lists (and trees, ...) with cons cells as a library. But they are so central to Lisp that it needs them early on and with a very efficient implementation.

In a Lisp system typically there are many cons cells and a high rate of allocating new cons cells (called consing). Thus the implementors of a Lisp may want to optimize their Lisp implementation for:

small size of cons cells -> not more than two machine words, one word for the car and one word for the cdr
fast allocation of new cons cells
efficient garbage collection of cons cells (find no-longer used cons cells very quickly)
storing primitive data (numbers, characters, ...) directly in cons cells -> no pointer overhead
good locality of Lisp data such as cons cell structures (lists, assoc lists, trees, ...), for example by using a generational/copying garbage collector and/or dedicated memory regions for cons cells

Thus Lisp systems use all kinds of tricks to achieve that. For example, a pointer may encode whether it points to a cons cell, so the cons cell itself does not need a type tag. Fixnums have very few tag bits and fit into the CAR or CDR of a cons cell. On the MIT Lisp Machine the system could also omit the CDR part of a cons cell when it was part of a linear list (CDR coding).
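The tagging trick can be mimicked at the library level, though only as a toy. The sketch below packs a small integer and a type tag into one 64-bit word using the low bits. The layout (3 tag bits, tag 0 for fixnums, tag 1 for cons pointers) and all names are assumptions made up for illustration, not the scheme of any particular Lisp runtime.

```clojure
;; Toy model of low-bit tagging in a 64-bit word.
;; Assumed layout: 3 tag bits; tag 0 = fixnum, tag 1 = cons pointer.
(def tag-bits 3)
(def tag-mask (dec (bit-shift-left 1 tag-bits))) ; low three bits set
(def fixnum-tag 0)
(def cons-tag 1)

(defn tag-fixnum
  "Store integer n in the upper bits, leaving the fixnum tag in the low bits."
  [n]
  (bit-or (bit-shift-left n tag-bits) fixnum-tag))

(defn untag-fixnum
  "Recover the integer with an arithmetic right shift (keeps the sign)."
  [word]
  (bit-shift-right word tag-bits))

(defn word-tag
  "Extract the tag from the low bits of a word."
  [word]
  (bit-and word tag-mask))

(defn fixnum-word? [word] (= fixnum-tag (word-tag word)))
(defn cons-word? [word] (= cons-tag (word-tag word)))

(untag-fixnum (tag-fixnum -7)) ; => -7
```

On the JVM this buys nothing, since every object already carries its own header. It only shows why a tagged word lets the cell itself go without a type field.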

To achieve all these optimization goals one usually needs a hand-tuned implementation of a Lisp runtime in assembler and/or C. A Lisp processor or a Lisp VM usually will provide CAR, CDR, CONS, CONSP, ... as machine instructions.

It's like TFB said: similarly, one can implement floating point numbers in a library, but it will not be efficient compared to native floating point numbers and operations supported by a CPU. Lisp implementations provide cons cells at a very low level.

But outside of such a Lisp implementation, it's clearly possible to implement cons cells as a library, with somewhat worse space and time efficiency.

Side note

Maclisp had cons cells with more than two slots, called Hunks.

score 4 · Answer 2 · answered Apr 23 '18 at 15:12

You could implement it yourself. Here is an attempt:

(defprotocol cons-cell
  (car [this])
  (cdr [this]) 
  (rplaca [this v])
  (rplacd [this v]))

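(deftype Cons [^:volatile-mutable car
               ^:volatile-mutable cdr]
  cons-cell
  ;; Mutable fields are private; read them directly by name inside the
  ;; method bodies instead of going through (.car this) / (.cdr this).
  (car [this] car)
  (cdr [this] cdr)
  (rplaca [this value] (set! car value))
  (rplacd [this value] (set! cdr value)))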

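;; Note: this replaces clojure.core/cons in the current namespace
;; (Clojure prints a warning when it is defined).
(defn cons [car cdr]
  (Cons. car cdr))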

Circular list:

(let [head (cons 0 nil)]
  (rplacd head head) 
  head)
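A circular structure like this can only be walked with a bound on the number of steps. The following is a minimal sketch of such a walk. It uses a renamed variant of the cell so that it does not shadow clojure.core/cons; all names in it (MutableCell, mcar, mcdr, set-mcdr!, MCell, take-cells, ring) are hypothetical.

```clojure
;; Renamed variant of a mutable cons cell; names are made up for this sketch.
(defprotocol MutableCell
  (mcar [this])
  (mcdr [this])
  (set-mcdr! [this v]))

(deftype MCell [^:volatile-mutable a
                ^:volatile-mutable d]
  MutableCell
  (mcar [_] a)                  ; mutable fields are read directly by name
  (mcdr [_] d)
  (set-mcdr! [_ v] (set! d v)))

(defn take-cells
  "Collect the cars of at most n cells, stopping early at nil."
  [n cell]
  (loop [i 0, c cell, acc []]
    (if (or (= i n) (nil? c))
      acc
      (recur (inc i) (mcdr c) (conj acc (mcar c))))))

(def ring
  (let [a (MCell. 0 nil)
        b (MCell. 1 a)]
    (set-mcdr! a b)             ; close the loop: a -> b -> a -> ...
    a))

(take-cells 5 ring) ; => [0 1 0 1 0]
```

Without the bound n, the loop would never reach nil and would not terminate.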

amalloy · Accepted Answer · 2019-05-25T09:12:02.357

Of course, you can implement cons cells with no tools other than lambda (called fn in Clojure).

(defn cons' [a d]
  (fn [f] (f a d)))

(defn car' [c]
  (c (fn [a d] a)))

(defn cdr' [c]
  (c (fn [a d] d)))

user> (car' (cdr' (cons' 1 (cons' 2 nil))))
2

This is as space-efficient as you can get in Clojure (a lambda closing over two bindings is just an object with two fields). car and cdr could obviously be more time-efficient if you used a record or something instead; the point I'm making is that yes, of course you can make cons cells, even if you have next to no tools available.
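For comparison, a field-based cell might look like the sketch below. It uses deftype rather than defrecord, because a record carries extra fields (metadata, extension map) on top of the two slots. The names Pair, pair, pcar, pcdr and pair->vec are hypothetical.

```clojure
;; Immutable two-field cell; names are made up for this sketch.
(deftype Pair [a d])

(defn pair [a d] (Pair. a d))

;; The type hints let the compiler emit direct field reads instead of reflection.
(defn pcar [^Pair p] (.-a p))
(defn pcdr [^Pair p] (.-d p))

(defn pair->vec
  "Walk a nil-terminated chain of pairs and collect the cars."
  [p]
  (loop [p p, acc []]
    (if (nil? p)
      acc
      (recur (pcdr p) (conj acc (pcar p))))))

(pair->vec (pair 1 (pair 2 (pair 3 nil)))) ; => [1 2 3]
```

Here car and cdr are plain field accesses, with no extra function invoked per access as in the closure version.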

Why isn't it done, though? We already have better tools available. Clojure's sequence abstraction makes a better list than cons cells do, and vectors are a perfectly fine tuple. There's just no great need for cons cells. Combine that with the fact that anyone who does want them will find it trivially easy to implement anew, and there are no customers for a prospective library solution.
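As a brief illustration of those built-in tools, using only clojure.core functions (the var name xs is arbitrary):

```clojure
;; clojure.core/cons prepends to a seq; first/rest play the roles of car/cdr.
(def xs (cons 1 (cons 2 nil)))   ; => (1 2)
(first xs)                       ; => 1
(first (rest xs))                ; => 2

;; The same functions work on vectors and other collections.
(first (rest [1 2 3]))           ; => 2

;; A vector as a two-element tuple, taken apart by destructuring.
(let [[a d] [:left :right]]
  [a d])                         ; => [:left :right]
```

Note that the built-in cons expects a sequence (or nil) as its second argument, so an improper pair such as (cons 1 2) is rejected. For a plain two-element tuple, a vector is the usual choice.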

I would expect that modelling cons cells as two-item records or vectors is more efficient, since it would not cons a new closure for each cons cell, and accessing the slots would be just a record/structure access instead of calling two functions, with another closure being invoked. — Rainer Joswig, May 24 '19 at 17:57
