Why is Racket implementation so much faster than MIT Scheme?

Question

The following code uses the Euclidean algorithm to calculate gcd(a,b) and integers s, t such that sa+tb=gcd(a,b) (for a Discrete Mathematics course). I coded it in C, and perhaps this will clearly illustrate the algorithm.

gcd.c:

#include <stdio.h>

int gcd_st(int m, int n, int *s, int *t) {
  int a, b, res, tmp;
  a = m>n?m:n;
  b = m>n?n:m;
  if(!b) {
    *s = 1;
    *t = 0;
    return a;
  }
  res = gcd_st(b, a%b, s, t);
  tmp = *t;
  *t = *s - *t*(a/b);
  *s = tmp;
  return res;
}

int main() {
  int st[2];
  for(int i=0; i<100000000; i++)
    gcd_st(42, 56, st, st+1);
  for(int i=0; i<100000000; i++)
    gcd_st(273, 110, st, st+1);

  int res = gcd_st(42, 56, st, st+1);
  printf("%d %d %d\n", res, st[0], st[1]);

  res = gcd_st(273, 110, st, st+1);
  printf("%d %d %d\n", res, st[0], st[1]);
}

Just for fun, I decided to code it in Scheme (Lisp) as well. At first, I tested it on MIT Scheme's implementation, and then using Racket's implementation.

gcd.scm (without first two lines); gcd.rkt (including first two lines):

#!/usr/bin/racket
#lang racket/base

(define (gcd_st m n)
  (let ((a (max m n)) (b (min m n)))
    (if (= b 0) (list a 1 0)
      (let ((res (gcd_st b (remainder a b))))
        (let ((val (list-ref res 0))
          (s (list-ref res 1))
          (t (list-ref res 2)))
            (list val t (- s (* t (quotient a b)))))))))

(define (loop n fn)
  (if (= n 0) 0
      (loop (- n 1) fn)))

(loop 100000000 (lambda () (gcd_st 42 56)))
(loop 100000000 (lambda () (gcd_st 273 110)))

(display "a b: (gcd s t)\n42 56: ")
(display (gcd_st 42 56))
(display "\n273 110: ")
(display (gcd_st 273 110))
(display "\n")

Both programs run 10^8 iterations on two sample cases and produce the same output. However, the two Scheme implementations (which share the same code/algorithm) differ greatly in performance. The Racket implementation is also a great deal quicker than the C implementation, which in turn is much faster than the MIT-Scheme implementation.

The time difference is so drastic I thought maybe Racket was optimizing out the entire loop, since the result is never used, but the time still does seem to scale linearly with loop iterations. Is it possible that it is doing some introspection and optimizing out some of the code in the loop?

$ time ./gcd.rkt  # Racket
0
0
a b: (gcd s t)
42 56: (14 1 -1)
273 110: (1 27 -67)

real  0m0.590s
user  0m0.565s
sys 0m0.023s

$ time scheme --quiet <gcd.scm  # MIT-Scheme
a b: (gcd s t)
42 56: (14 1 -1)
273 110: (1 27 -67)

real  0m59.250s
user  0m58.886s
sys 0m0.129s

$ time ./gcd.out  # C 
14 1 -1
1 27 -67

real  0m7.987s
user  0m7.967s
sys 0m0.000s

Why is the Racket implementation so much quicker?

=====

Update: If anyone's wondering, here are the results using the corrected loop function taking the answer into account:

loop:

(define (loop n fn)
    (fn)
    (if (= n 1) 0
        (loop (- n 1) fn)))

Racket (still slightly outperforms C, even including its setup time):

real    0m7.544s
user    0m7.472s
sys 0m0.050s

MIT Scheme

real    9m59.392s
user    9m57.568s
sys 0m0.113s

The question still holds about the large difference between the Scheme implementations (still large), however. I'll ask this separately to ignore confusion with the previous error.

shouldn't the tag [tag:C] be also included? you also didn't include the compilation command for C. — Will Ness, Feb 02 '20 at 05:43
You may want Racket's `time` form to get timing for just running the function, without startup time etc: https://docs.racket-lang.org/reference/time.html#(form._((lib._racket%2Fprivate%2Fmore-scheme..rkt)._time)) — LiberalArtist, Feb 02 '20 at 07:20
The Racket/Scheme is also not especially optimized. For one idea, consider `quotient/remainder` and multiple values: https://docs.racket-lang.org/reference/generic-numbers.html#(def._((quote._~23~25kernel)._quotient%2Fremainder)) — LiberalArtist, Feb 02 '20 at 07:24
LiberalArtist, The `quotient` and `remainder` aren't even being called in the loop though. The problem here is it's not being called. See my answer for details. Also, while it's true that Racket is not especially optimized, keep in mind in this post, it's out performing MIT Scheme, not slower. So Racket not being well optimized can't be the answer. I'm actually still wondering why MIT Scheme is so much slower than Racket when simply counting down from 100 million in a loop. — mmachenry, Feb 02 '20 at 08:00
Thanks @LiberalArtist for the suggestions. I guess im wondering what mmachenry says, as this should still be true in the corrected code. But that’s for a separate question. — Jonathan Lam, Feb 02 '20 at 14:12
@WillNess It was standard GCC compilation (no extra flags), but now that the Racket/C performance difference has been resolved, I guess the point is moot now. — Jonathan Lam, Feb 02 '20 at 14:51

score 7 · Accepted Answer · answered Feb 02 '20 at 06:28

You are not actually invoking your thunk that calls the computation within your implementation of loop. This is why it's so much faster than the C implementation. You're not actually computing anything.

I'm not sure why exactly MIT Scheme is so slow for this. Just counting down from 100 million seems like it should be lightning fast like it is in Racket.

To actually compute the gcd redundantly, throw away the result, and measure the time, implement loop like this:

(define (loop n fn)
  (if (= n 0) 0
      (begin
        (fn)
        (loop (- n 1) fn))))

Oh my. That’s what sleep deprivation gets. Could’ve sworn I had that in their earlier, but I must’ve gotten too carried away with all of the other tidbits and blamed the rest on scheme implementations. Thank you. — Jonathan Lam, Feb 02 '20 at 14:08

Why is Racket implementation so much faster than MIT Scheme?

1 Answers1