
This is from the SICP book, which I am sure many of you are familiar with. It is an early example in the book, but I feel it illustrates an extremely important concept that I am just not able to get my head around yet. Here it is:

(define (cons x y)
  (define (dispatch m)
    (cond ((= m 0) x)
          ((= m 1) y)
          (else (error "Argument not 0 or 1 - CONS" m))))
  dispatch)
(define (car z) (z 0))
(define (cdr z) (z 1))

So here I understand that car and cdr are being defined within the scope of cons, and I get that they call some argument z with 0 and 1 respectively (argument z being some cons). But say I call (cons 3 4): how are the arguments 3 and 4 evaluated, when we immediately go into this inner procedure dispatch, which takes some argument m that we have not specified yet? And, maybe more importantly, what is the point of returning dispatch? I don't really get that part at all. Any help is appreciated, thanks!

Óscar López
Houdini

3 Answers


This is one of the weirder (and possibly one of the more wonderful) examples of exploiting first-class functions in Scheme. Something similar is also in The Little Schemer, which is where I first saw it, and I remember scratching my head for days over it. Let me see if I can explain it in a way that makes sense, but I apologize if it's not clear.

I assume you understand the primitives cons, car, and cdr as they are implemented in Scheme already, but just to remind you: cons constructs a pair, car selects the first component of the pair and returns it, and cdr selects the second component and returns it. Here's a simple example of using these functions:

> (cons 1 2)
(1 . 2)
> (car (cons 1 2))
1
> (cdr (cons 1 2))
2

The version of cons, car, and cdr that you've pasted should behave exactly the same way. I'll try to show you how.

First of all, car and cdr are not defined within the scope of cons. In your snippet of code, all three (cons, car, and cdr) are defined at the top-level. The function dispatch is the only one that is defined inside cons.
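One way to convince yourself of this: because car and cdr live at the top level, they know nothing about cons; they simply call whatever they are given with 0 or 1. A minimal sketch, assuming an R7RS-style Scheme; my-car, my-cdr, and hand-made-pair are hypothetical names chosen so the built-in car and cdr are not shadowed.

```scheme
;; Hypothetical top-level selectors, renamed to avoid shadowing the built-ins.
;; They do not refer to cons at all: they just call z with 0 or 1.
(define (my-car z) (z 0))
(define (my-cdr z) (z 1))

;; Any one-argument procedure that answers the messages 0 and 1
;; can be handed to these selectors as a "pair".
(define hand-made-pair
  (lambda (m)
    (if (= m 0) 'first 'second)))

(my-car hand-made-pair)   ; => first
(my-cdr hand-made-pair)   ; => second
```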

The function cons takes two arguments and returns a function of one argument. What's important about this is that those two arguments are visible to the inner function dispatch, which is what is being returned. I'll get to that in a moment.

As I said in my reminder, cons constructs a pair. This version of cons should do the same thing, but instead it's returning a function! That's ok, we don't really care how the pair is implemented or laid out in memory, so long as we can get at the first and second components.

So with this new function-based pair, we need to be able to call car, pass the pair as an argument, and get back the first component. In the definition of car, this argument is called z. If you were to execute the same REPL session I had above with these new cons, car, and cdr functions, the argument z in car would be bound to the function-based pair, which is what cons returns, which is dispatch. It's confusing, but just think it through carefully and you'll see.

Based on the implementation of car, it takes a function of one argument and applies it to the number 0. So it's applying dispatch to 0, and as you can see from the definition of dispatch, that's what we want. The cond inside there compares m with 0 and 1 and returns either x or y. In this case, it returns x, which is the first argument to cons; in other words, the first component of the pair! So car selects the first component, just as the normal primitive does in Scheme.

If you follow this same logic for cdr, you'll see that it behaves almost the same way, but returns the second argument to cons, y, which is the second component of the pair.
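Following the same logic, every call to cons builds a separate dispatch closure that remembers its own x and y. A minimal sketch, assuming an R7RS-style Scheme; the code is the question's, renamed to my-cons, my-car, and my-cdr (hypothetical names) so the built-ins are left alone.

```scheme
;; Same definitions as in the question, renamed (hypothetical names)
;; so the built-in cons/car/cdr stay untouched.
(define (my-cons x y)
  (define (dispatch m)
    (cond ((= m 0) x)
          ((= m 1) y)
          (else (error "Argument not 0 or 1 - CONS" m))))
  dispatch)
(define (my-car z) (z 0))
(define (my-cdr z) (z 1))

;; Two separate calls create two separate dispatch closures,
;; each remembering the x and y it was created with.
(define p (my-cons 1 2))
(define q (my-cons 3 4))

(my-car p)   ; => 1
(my-car q)   ; => 3
(my-cdr q)   ; => 4
```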

There are a couple of things that might help you understand this better. One is to go back to the description of the substitution model of evaluation in Chapter 1. If you carefully and meticulously follow that substitution model for some very simple example of using these functions, you'll see that they work.
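For instance, here is (car (cons 1 2)) reduced step by step, roughly in the style of the substitution model. This is an informal sketch: the procedures are renamed to my-cons and my-car (hypothetical names) to avoid clobbering the built-ins, and each step is itself a valid Scheme expression that evaluates to 1.

```scheme
;; Hypothetical renamed copies of the question's cons and car.
(define (my-cons x y)
  (define (dispatch m)
    (cond ((= m 0) x)
          ((= m 1) y)
          (else (error "Argument not 0 or 1 - CONS" m))))
  dispatch)
(define (my-car z) (z 0))

;; 1. The starting expression.
(my-car (my-cons 1 2))                               ; => 1

;; 2. Substitute the operand for z in my-car's body, (z 0).
;;    (my-cons 1 2) evaluates to a dispatch closure with x = 1, y = 2.
((my-cons 1 2) 0)                                    ; => 1

;; 3. Apply that closure to 0: substitute m = 0, x = 1, y = 2
;;    into the body of dispatch.
(cond ((= 0 0) 1)
      ((= 0 1) 2)
      (else (error "Argument not 0 or 1 - CONS" 0))) ; => 1

;; 4. The first clause is true, so the result is x.
1
```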

Another way, which is less tedious, is to try playing with the dispatch function directly at the REPL. Below, the variable p is defined to refer to the dispatch function returned by cons.

> (define p (cons 1 2))
#<function> ;; what the REPL prints here will be implementation specific
> (p 0)
1
> (p 1)
2
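For completeness, any message other than 0 or 1 falls through to the else clause of dispatch and signals an error. A minimal sketch, assuming an R7RS Scheme where guard, error-object?, and error-object-message are available; my-cons is a hypothetical renamed copy of the question's cons. (In most implementations a non-numeric message such as 'car fails earlier, inside =, because = expects numbers.)

```scheme
;; Hypothetical renamed copy of the question's cons.
(define (my-cons x y)
  (define (dispatch m)
    (cond ((= m 0) x)
          ((= m 1) y)
          (else (error "Argument not 0 or 1 - CONS" m))))
  dispatch)

(define p (my-cons 1 2))

;; Message 2 reaches the else clause, which signals an error;
;; guard catches it and returns the error's message string.
(guard (e ((error-object? e) (error-object-message e)))
  (p 2))
;; => "Argument not 0 or 1 - CONS"
```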
okonomichiyaki
  • Ok, great answer, and I will definitely look back over that part of the book. But what about `m`? I never see anything passed to dispatch directly. It appears to me that `m` would be the procedure `car` or `cdr`. Is that true? Say for example we evaluate `(car (cons 1 2))`. This will first evaluate the _arguments_ of `car`, so it will look at `(cons 1 2)`. This will drop into the definition of `dispatch`....which is looking for `m`. So how does dispatch know that `m` is the `car` or `cdr` of `cons`? Let me know if that question doesn't make sense, I will try to reword it :) – Houdini Sep 19 '12 at 17:08
  • No, `m` is not a procedure; it will be either `0` or `1`. When we evaluate `(car (cons 1 2))` and drop into the body of `cons`, the inner function `dispatch` is only defined and then returned, not called yet. So at that point we are not concerned with `m`. Then, when `cons` returns, we need to evaluate something like `(car #<procedure>)`. In the body of `car`, the function `dispatch` is bound to `z` and is called with the argument `0`. So `dispatch` never knows about `car` or `cdr`; it only compares its argument `m` to `0` and `1`, so it expects a number. – okonomichiyaki Sep 19 '12 at 17:58
  • Ohhh... nice, I think I get it. It's like a light bulb went on in my head! Wow... so in `(car (cons x y))`, the argument `(cons x y)` is evaluated first, and it returns a procedure called `dispatch`. _Then_ `car` is applied to that procedure. And since `car` is defined to apply its argument to 0, _and_ since `dispatch` is defined within the scope of `cons`, it is able to resolve to the `x` value passed to `cons`. Sound about right? I sure hope so... haha. Still going to be reading back over that chapter though! – Houdini Sep 19 '12 at 18:12
  • Yep, it sounds like you've figured it out! SICP is such a wonderful book; I'm glad to help a fellow programmer enjoy having their mind blown a little bit. – okonomichiyaki Sep 19 '12 at 18:14
  • You helped me out a ton, and hopefully other programmers too. Thanks again! – Houdini Sep 19 '12 at 18:26
  • @Houdini "`m`" is for "message", BTW. – Will Ness Aug 03 '18 at 15:59
  • so what are the applications of such a representation? – X10D Apr 27 '20 at 19:23
  • @X10D Intellectual enlightenment. Understanding the fact that you can represent data as functions and functions as data. – Óscar López Apr 27 '20 at 19:34
  • Seems Church showed that with: https://en.wikipedia.org/wiki/Church_encoding – X10D Apr 27 '20 at 19:52
  • Functions as data, as in passing around a quoted expression and evaluating it later? Data as functions that generate it? – X10D Apr 27 '20 at 19:53
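The point made in the comments, that cons only defines and returns dispatch without calling it, can be checked with a small instrumented variant. This is a hypothetical sketch, assuming an R7RS-style Scheme: calls is an added counter and counting-cons/counting-car are renamed copies; none of this is part of the SICP code.

```scheme
;; Hypothetical counter recording how many times dispatch has run.
(define calls 0)

(define (counting-cons x y)
  (define (dispatch m)
    (set! calls (+ calls 1))   ; record that dispatch was actually applied
    (cond ((= m 0) x)
          ((= m 1) y)
          (else (error "Argument not 0 or 1 - CONS" m))))
  dispatch)                    ; dispatch is returned here, not called
(define (counting-car z) (z 0))

(define p (counting-cons 1 2))
calls              ; => 0, building the pair never calls dispatch
(procedure? p)     ; => #t, the "pair" is just the dispatch procedure
(counting-car p)   ; => 1, only now is dispatch applied, with m = 0
calls              ; => 1
```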

The code in the question shows how to redefine the primitive procedure cons that creates a cons-cell (a pair of two elements: the car and the cdr), using only closures and message-dispatching.

The dispatch procedure acts as a selector for the arguments passed to cons: x and y. If the message 0 is received, then the first argument of cons is returned (the car of the cell). Likewise, if 1 is received, then the second argument of cons is returned (the cdr of the cell). Both arguments are stored inside the closure defined implicitly for the dispatch procedure, a closure that captures x and y and is returned as the product of invoking this procedural implementation of cons.

The next redefinitions of car and cdr build on this: car is implemented as a procedure that passes 0 to a closure as returned in the above definition, and cdr is implemented as a procedure that passes 1 to the closure, in each case ultimately returning the original value that was passed as x and y respectively.

The really nice part of this example is that it shows that the cons-cell, the most basic unit of data in a Lisp system, can be defined as a procedure, thereby blurring the distinction between data and procedure.
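Because these procedural cells behave like ordinary pairs, they can be chained into lists as well. A minimal sketch, assuming an R7RS-style Scheme; my-cons, my-car, my-cdr, and proc-list->list are hypothetical names, and the empty list '() is still used as the terminator.

```scheme
;; Hypothetical renamed copies of the procedural cons, car, and cdr.
(define (my-cons x y)
  (define (dispatch m)
    (cond ((= m 0) x)
          ((= m 1) y)
          (else (error "Argument not 0 or 1 - CONS" m))))
  dispatch)
(define (my-car z) (z 0))
(define (my-cdr z) (z 1))

;; A three-element "list" made entirely of dispatch closures.
(define xs (my-cons 1 (my-cons 2 (my-cons 3 '()))))

;; Walk the closure chain, copying it into a built-in list for printing.
(define (proc-list->list z)
  (if (null? z)
      '()
      (cons (my-car z) (proc-list->list (my-cdr z)))))

(my-car (my-cdr xs))   ; => 2
(proc-list->list xs)   ; => (1 2 3)
```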

Óscar López
  • Thanks! The other answer was a bit more descriptive and gave me a reference from the book, but your answer is definitely helping me to understand this concept as well. Especially the idea that _data_ can actually be represented as a _procedure_. Honestly I didn't understand what that even meant until I looked at this code from the book... whereas we would normally represent something like `cons` as a data structure or variable, with this idea we can actually represent `cons` in terms of a procedure, free from any particular data structure. Pretty cool, albeit confusing. – Houdini Sep 19 '12 at 17:13

This is the "closure/object isomorphism", basically.

The outer function (cons) is a class constructor. It returns an object, which is a function of one argument, where the argument is equivalent to the name of a method. In this case, the methods are getters, so they evaluate to values. You could just as easily have stored more procedures in the object returned by the constructor.

In this case, numbers were chosen as method names, and the sugar procedures car and cdr were defined outside the object itself. You could have used symbols instead:

(define (cons x y)
  (lambda (method)
    (cond ((eq? method 'car) x)
          ((eq? method 'cdr) y)
          (else (error "unknown method")))))

In which case what you have more closely resembles OO:

# (define p (cons 1 2))
# (p 'car)
1
# (p 'cdr)
2
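Picking up the remark that more procedures could be stored in the object: a minimal sketch, assuming R7RS Scheme, in which two extra messages return setter procedures that use set! on the captured x and y. The name mutable-cons and the 'set-car!/'set-cdr! messages are hypothetical; SICP uses a similar idea later in the book when it discusses mutation.

```scheme
(define (mutable-cons x y)
  ;; Setter procedures that close over the same x and y as the getters.
  (define (set-x! v) (set! x v))
  (define (set-y! v) (set! y v))
  (define (dispatch method)
    (cond ((eq? method 'car) x)
          ((eq? method 'cdr) y)
          ((eq? method 'set-car!) set-x!)   ; returns a procedure, not a value
          ((eq? method 'set-cdr!) set-y!)
          (else (error "unknown method" method))))
  dispatch)

(define p (mutable-cons 1 2))
((p 'set-car!) 10)   ; look up the setter method, then call it with 10
(p 'car)             ; => 10
(p 'cdr)             ; => 2
```

The getters and setters share one closure, so a change made through one method is visible through the others, which is what makes the closure behave like an object with state.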
nni