
I'm making my way through the book The Little Schemer to start to learn to think in Lisp. Once you get into it and cover the use of lambdas, the 'remove' procedure is written in the following general form, which returns a remove procedure for an arbitrary test test?:

(define rember-f
  (lambda (test?)
    (lambda (a l)
      (cond
        ((null? l) (quote ()))
        ((test? (car l) a) (cdr l))
        (else (cons (car l)
                ((rember-f test?) a (cdr l))))))))

I understand how this works just fine, but a plain reading of it suggests that at each recursive step it is the procedure rember-f that is called again to generate a new enclosed procedure. This would mean that when you call your returned procedure on a list, it calls rember-f to generate the same procedure anew, and that new procedure is what is then called for the recursion (if that is not clear, see my fix below). I understand that this may be optimized away, but since I don't know whether it is (and also in an attempt to get my head around this syntax anyway), I managed after some experimentation to move the recursion to the procedure itself rather than the enclosing procedure, as follows:

(define rember-f
  (lambda (test?)
    (define retfun
      (lambda (a l)
        (cond
          ((null? l) (quote ()))
          ((test? (car l) a) (cdr l))
          (else (cons (car l) (retfun a (cdr l)))))))
    retfun))

I have verified that this works as expected. The return value is a procedure that removes the first element of a list (arg 2) matching a value (arg 1). It looks to me like this one only calls rember-f once, which guarantees it only generates one enclosed procedure (this time with a name, retfun).

This is actually interesting to me because, unlike the usual tail call optimization, which is about not consuming space on the call stack and so making recursion about as efficient as iteration, in this case the compiler would have to determine that (rember-f test?) re-evaluates the enclosing procedure with its argument unmodified, and so could be replaced by the same return value, namely the anonymous (lambda (a l) ...). It would not surprise me at all to learn that the interpreter / compiler does not catch this.
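For what it's worth, a quick way to observe the difference in a REPL is to add a counter to rember-f (this is just my own sketch, not from the book; call-count is a made-up name for the experiment):

(define call-count 0)

(define rember-f
  (lambda (test?)
    (set! call-count (+ call-count 1))  ; ticks once per (rember-f test?) call
    (lambda (a l)
      (cond
        ((null? l) (quote ()))
        ((test? (car l) a) (cdr l))
        (else (cons (car l)
                ((rember-f test?) a (cdr l))))))))

((rember-f eq?) 'c '(a b c d e f))  ; => (a b d e f)
call-count                          ; => 3: the initial call plus one per element before the match

The retfun version, instrumented the same way, leaves call-count at 1 for the same test.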

Yes, I know that Scheme is a specification and there are many implementations, which get the various functional-programming optimizations right to differing degrees. I am currently learning by experimenting in the guile REPL, but I would be interested in how different implementations compare on this issue.

Does anyone know how Scheme is supposed to behave in this instance?


3 Answers


Both procedures have the same asymptotic time complexity. Let's consider the evaluation of ((rember-f =) 1 '(5 4 3 2 1 0)).

A partial evaluation proceeds as follows:

((rember-f =) 1 '(5 4 3 2 1 0))
((lambda (a l)
      (cond
        ((null? l) (quote ()))
        ((= (car l) a) (cdr l))
        (else (cons (car l)
                ((rember-f =) a (cdr l)))))) 1 '(5 4 3 2 1 0))
(cons 5 ((rember-f =) 1 '(4 3 2 1 0)))
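Continuing the same expansion (eliding the temporary lambda that gets rebuilt at each step), the evaluation proceeds -

(cons 5 (cons 4 ((rember-f =) 1 '(3 2 1 0))))
(cons 5 (cons 4 (cons 3 ((rember-f =) 1 '(2 1 0)))))
(cons 5 (cons 4 (cons 3 (cons 2 ((rember-f =) 1 '(1 0))))))
(cons 5 (cons 4 (cons 3 (cons 2 '(0)))))
'(5 4 3 2 0)

Each step corresponds to one recursive call, and each such call builds one fresh temporary lambda.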

Note that the creation of the temporary lambda procedure takes O(1) time and space, so it doesn't actually add any substantial overhead to the cost of calling the function. At best, factoring out the function will lead to a constant-factor speedup and a constant-factor reduction in memory use.

But how much memory does it really take to make a closure? It turns out it takes very little memory. A closure consists of a pointer to the environment and a pointer to compiled code. Basically, creating the closure requires as much time and space as making a cons cell. So even though it looks like we're using a lot of memory when I show the evaluation, very little memory and very little time is actually used to make and store the lambda.
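To make that concrete, here is a rough model (just a sketch, not how any actual Scheme lays out memory; the helpers make-closure, apply-closure, and rember-body are made-up names) in which a "closure" is literally a cons of the code and the captured environment:

(define (make-closure code env) (cons code env))  ; one cons per closure
(define (closure-code c) (car c))
(define (closure-env c) (cdr c))

(define (apply-closure c . args)
  (apply (closure-code c) (closure-env c) args))

; rember-f's inner lambda, rewritten to take its environment explicitly
(define rember-body
  (lambda (env a l)
    (let ((test? (car env)))
      (cond
        ((null? l) (quote ()))
        ((test? (car l) a) (cdr l))
        (else (cons (car l)
                    (apply-closure (make-closure rember-body env)
                                   a (cdr l))))))))

(apply-closure (make-closure rember-body (list eq?)) 'c '(a b c d e f))
; => (a b d e f), allocating one extra (code . env) pair per recursive call

This mirrors the original rember-f: every recursive call builds one small, constant-size object.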

So essentially, by factoring out the recursive function, you've allocated a single closure (about one cons cell's worth of memory) rather than written code that allocates one such cell per recursive call.

For more information on this, see Lambda is cheap, and Closures are Fast.

Mark Saving
  • each new closure created in O(1) space makes it for O(n) space for the nth closure with all those nested environments holding the same reference to `test?` (and `a` and `l`) completely needlessly -- the problem fully avoided by _naming_ the recursive function and *reusing* it. so it's pretty reckless, the original code. – Will Ness Jul 08 '21 at 07:45
  • @WillNess Even if it added `Theta(n)` space to create the extra closures, the algorithm is already `Theta(n)` so it wouldn't change the asymptotic space complexity of the algorithm, as I stated. Creating an environment with three references (one to `test?`, `a`, and `l`) is just a constant factor more than creating an environment with 1 reference (eg the reference to `l`, which is unavoidable), so that doesn't change the asymptotic complexity. Depending on the model, you might even get rid of references in the environment as soon as they're no longer needed which eliminates even this overhead. – Mark Saving Jul 08 '21 at 17:50
  • Thanks for this. I understand your point. So my optimization is correct, but if the compiler is implemented correctly it should only reduce the "big O factor" and not the actual complexity order. I read this reply when you posted it yesterday and the link you gave, and my takeaway was that it is still linear in space requirements, so is definitely a penalty, just not an order increasing penalty. It seems that is where this thread has landed. – Poisson Aerohead Jul 08 '21 at 20:49
  • @PoissonAerohead Correct – Mark Saving Jul 08 '21 at 20:50
  • @MarkSaving It looks like that link has a full archived book at it too, so I'll give it a look at some point. – Poisson Aerohead Jul 08 '21 at 20:56
  • @MarkSaving why "already linear"? it is O(1) _auxiliary_ space when implemented properly. I was in error re nested environments; still, creating _n_ new O(1) closures takes O(n) auxiliary space (but garbage collectable, yes), creating one closure takes O(1). – Will Ness Jul 08 '21 at 21:20
  • @WillNess this is how I understand it. Under the assumption that creating the closure is in fact O(1) in time (it is definitely O(1) in space), then the original algorithm creates n additional closures and so that adds O(n) in time and space. But the optimized algorithm is already O(n) in time and space (due to the call to cons in the recursion step, it should be linear and not constant in space too I think, but I may be wrong there). This means my observation and optimization are correct but do not lower the Big O order, only the factor. At any rate, I think the original looks wasteful. – Poisson Aerohead Jul 08 '21 at 21:48
  • @PoissonAerohead the space taken by the output is unavoidable, but I make the distinction between that and auxiliary space needed for the algorithm's operations. but you're right, the `cons` makes it so the stack space is O(n), since the recursive call must be evaluated first, and thus the stack must be maintained. (unless the imaginary implementation decides to go the extra mile and perform the [tag:tailrecursion-modulo-cons] optimization, which I know of none that does :)). – Will Ness Jul 09 '21 at 06:44

You are right to be concerned about the additional repeated lambda abstractions. For example you wouldn't write this, would you?

(cond ((> (some-expensive-computation x) 0) ...)
      ((< (some-expensive-computation x) 0) ...)
      (else ...))

Instead we bind the result of some-expensive-computation to an identifier so we can check multiple conditions on the same value -

(let ((result (some-expensive-computation x)))
 (cond ((> result 0) ...)
       ((< result 0) ...)
       (else ...)))

You discovered the essential purpose of so-called "named let" expressions. Here's your program -

(define rember-f
  (lambda (test?)
    (define retfun
      (lambda (a l)
        (cond
          ((null? l) (quote ()))
          ((test? (car l) a) (cdr l))
          (else (cons (car l) (retfun a (cdr l)))))))
    retfun))

And here is its equivalent using a named-let expression. Below we bind the let body to loop, a callable procedure that lets the body recur on itself. Notice how each lambda abstraction is evaluated just once, and the body can be re-entered via loop without creating/evaluating additional lambdas -

(define rember-f
  (lambda (test?)
    (lambda (a l)
      (let loop ; name, "loop", or anything of your choice
       ((l l))  ; bindings, here we shadow l, or could rename it
       (cond
         ((null? l) (quote ()))
         ((test? (car l) a) (cdr l))
         (else (cons (car l) (loop (cdr l))))))))) ; apply "loop" with args

Let's run it -

((rember-f eq?) 'c '(a b c d e f))
'(a b d e f)

The syntax for named-let is -

(let proc-identifier ((arg-identifier initial-expr) ...)
  body ...)

Named let is syntactic sugar for a letrec binding -

(define rember-f
  (lambda (test?)
    (lambda (a l)
      (letrec ((loop (lambda (l)
                       (cond
                         ((null? l) (quote ()))
                         ((test? (car l) a) (cdr l))
                         (else (cons (car l) (loop (cdr l))))))))
        (loop l)))))
((rember-f eq?) 'c '(a b c d e f))
'(a b d e f)

Similarly, you could imagine using a nested define -

(define rember-f
  (lambda (test?)
    (lambda (a l)
      (define (loop l)
        (cond
          ((null? l) (quote ()))
          ((test? (car l) a) (cdr l))
          (else (cons (car l) (loop (cdr l))))))
      (loop l))))
((rember-f eq?) 'c '(a b c d e f))
'(a b d e f)
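Any of these variants accepts an arbitrary test?; for example, with a numeric predicate (just a quick illustration using the same rember-f) -

((rember-f =) 3 '(1 2 3 4 5))
'(1 2 4 5)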

PS, you can write '() in place of (quote ())
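For example, a quick REPL check shows they are the same empty list -

(equal? '() (quote ()))
#t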

Mulan
  • named let is pretty mysterious, and explained well by the use of `letrec`. – Will Ness Jul 08 '21 at 07:43
  • @WillNess thanks for nudging me to improve this answer. I was a little short on time yesterday. – Mulan Jul 08 '21 at 15:08
  • Thanks this is good. I am aware of the apostrophe syntax, the original code was copied from the book and then I just took it and modified it. So it looks like either mine works or the named let works, and from the above talk, it will save both space and time but of the same order of complexity (so not an enormous impact). – Poisson Aerohead Jul 08 '21 at 20:53

to start to learn to think in Lisp

That book is not about thinking in Lisp, but about recursive thinking, one of the models of computation discovered in the 20th century by Gödel, Herbrand, and Rózsa Péter.

Does anyone know how Scheme is supposed to behave in this instance?

After you finish the little lisper you should take up SICP, which will help you understand what kinds of decisions an implementation of a language can make. You are really asking how different implementations act; to understand their implementation decisions, the best next step is to learn from SICP. Take care: unless you are already a computer science graduate, this textbook will take you a few years to master, even if you study it every day. If you are already a graduate, it will take you only about a year.

alinsoar
  • "SICP" is too general an advice. "SICP which describes the environment model" is specific and concrete, and directly relates to the question. – Will Ness Jul 08 '21 at 07:50
  • @WillNess Not only. The last paragraphs of the book explain how to implement these concepts in C/assembler. It depends on the level of detail you want to understand, etc. – alinsoar Jul 08 '21 at 12:45
  • i agree with this but this post is more of a comment than an answer to the question. i'm guilty of making this mistake too :D – Mulan Jul 08 '21 at 15:20
  • Yes, I agree that was a little bit of an abuse of language on my part. I guess I should say: I knew zero Lisp, and the structure of that book gently introduces you to the syntax and to thinking recursively. By the end you are ready to try a real book on computability theory! But additionally, you get used to the syntax, which is very different from what today would be called "normal" languages. I can tell that the book is not about using Scheme or Lisp for a practical application though, yes. – Poisson Aerohead Jul 08 '21 at 20:44
  • @PoissonAerohead to practice it in "practical applications" you can play with Emacs, Autocad, Gimp, Sawfish and many others. – alinsoar Jul 09 '21 at 14:41