
I find that simple things like function calls and loops, even just a loop incrementing a counter, take far more time in Python and Ruby than in Chicken Scheme, Racket, or SBCL.

Why is this so? I often hear people say that slowness is the price you pay for dynamic languages, but Lisps are very dynamic and are not ridiculously slow (they are usually less than 5 times slower than C; Ruby and Python can go into the double digits). Besides, idiomatic Lisp makes heavy use of recursion (and not always tail recursion), the stack may be a linked list of continuations in the heap, and so on, all of which seem like things that should make Lisp slower than imperative-style Python and Ruby.

Racket and SBCL are JITted, but Chicken Scheme is either statically compiled, or uses a non-optimizing interpreter, both of which should be badly suited to dynamic languages and slow. Yet even using the naive csi interpreter for Chicken Scheme (which doesn't even do bytecode compilation!), I get speeds far beyond Python and Ruby.

Why exactly are Python and Ruby so ridiculously slow compared to the similarly dynamic Lisps? Is it because they are object oriented and need huge vtables and type hierarchies?

Example: factorial function. Python:

def factorial(n):
    if n == 0:
        return 1
    else:
        return n*factorial(n-1)

for x in xrange(10000000):
    i = factorial(10)

Racket:

#lang racket

(define (factorial n)
  (cond
   [(zero? n) 1]
   [else (* n (factorial (sub1 n)))]))

(define q 0)

(for ([i 10000000])
  (set! q (factorial 10)))

Timing results:

ithisa@miyasa /scratch> time racket factorial.rkt
racket factorial.rkt  1.00s user 0.03s system 99% cpu 1.032 total
ithisa@miyasa /scratch> time python factorial.py
python factorial.py  13.66s user 0.01s system 100% cpu 13.653 total
  • SBCL does not use a JIT. It is strictly AOT. – Rainer Joswig Nov 09 '13 at 16:22
  • SBCL has a REPL interpreter...me the common lisp n00b always used it as an interpreter... – ithisa Nov 09 '13 at 19:15
  • @user54609: SBCL uses a compiler. Always and everywhere. In the REPL also. SBCL does not use an interpreter, by default. Every expression you enter at the REPL gets compiled before it runs. – Rainer Joswig Nov 09 '13 at 19:18
  • Yes, so the REPL is technically a JIT compiler... – ithisa Nov 09 '13 at 20:48
  • @user54609: no. The REPL just calls EVAL, which calls the native code compiler. The code is fully compiled from source code to native code BEFORE runtime. It's just an *incremental compiler*, which can compile individual expressions. There is no byte code compilation, no byte code to native code JIT compilation, no runtime analysis, no code cache, ... It is an incremental native code compiler. – Rainer Joswig Nov 09 '13 at 21:10
  • Because people have spent 55 years making Lisp fast, but only 20.5 years making Ruby fast. And because people have spent millions of dollars making Lisp fast. – Jörg W Mittag Nov 10 '13 at 03:10
  • Lisps are not that dynamic, in comparison with the duck-typed OO languages. – SK-logic Nov 10 '13 at 14:58
  • In what way is Lisp "not that dynamic"? – ithisa Nov 10 '13 at 17:45
  • @user54609, Lisp does not depend on dynamic method dispatch. It has some polymorphic functions (notoriously, the numeric tower), but the majority of call targets are decidable at compile time, unlike in languages like Python, where each call has to be resolved at runtime. – SK-logic Nov 11 '13 at 11:02
  • The benchmark doesn't do justice to a skilled python user's output. They would use pypy, numba, or cython if the code wasn't fast enough. I got 18 seconds for that code so looked at cython within my jupyter notebook. A naive version sped it up around 4 times to 5 seconds. A more sophisticated application reduced it to 0.33 seconds, faster than lisp and often approaching C. https://nbviewer.jupyter.org/github/john9631/Jupyter-Notebooks/blob/master/CythonLispComparisons.ipynb – John 9631 May 08 '17 at 03:41

4 Answers

28

Natively compiled Lisp systems are usually quite a bit faster than non-natively compiled Lisp, Ruby or Python implementations.

Definitions:

  • natively compiled -> compiles to machine code
  • compiled -> compiles to machine code or some other target (like byte code, JVM instructions, C code, ...)
  • interpreted Lisp -> runs s-expressions directly without compilation
  • interpreted Python -> runs compiled Python in a byte-code interpreter. The default Python implementation is not really interpreted; it compiles to a byte-code instruction set, and that byte code is then interpreted. Byte-code interpreters are typically slower than execution of native code.

But keep in mind the following:

  • SBCL uses a native code compiler. It does not use a byte code machine or something like a JIT compiler from byte code to native code. SBCL compiles all code from source code to native code, before runtime. The compiler is incremental and can compile individual expressions. Thus it is used also by the EVAL function and from the Read-Eval-Print-Loop.
  • SBCL uses an optimizing compiler which makes use of type declarations and type inference. The compiler generates native code.
  • Common Lisp allows various optimizations which make the code less dynamic or not dynamic (inlining, early binding, no type checks, code specialized for declared types, tail-call optimizations, ...). Code which makes use of these advanced features can look complicated - especially when the compiler needs to be told about these things.
  • Without these optimizations compiled Lisp code is still faster than interpreted code, but slower than optimized compiled code.
  • Common Lisp provides CLOS, the Common Lisp Object System. CLOS code usually is slower than non-CLOS - where this comparison makes sense. A dynamic functional language tends to be faster than a dynamic object-oriented language.
  • If a language implementation uses a highly optimized runtime, for example for bignum arithmetic operations, a slow language implementation can be faster than an optimizing compiler. Some languages have many complex primitives implemented in C. Those tend to be fast, while the rest of the language can be very slow.
  • There are also implementations of Python which generate and run machine code, like the JIT compiler from PyPy. Ruby also has a JIT compiler since Ruby 2.6.

Also, some operations may look similar but be different underneath. Is a for loop that increments an integer variable really the same as a for loop which iterates over a range object?
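
As a small illustration of that closing point (my own sketch, not part of the original answer; Python 3, so `range` stands in for the question's `xrange`): CPython's `dis` module shows that every iteration of the question's loop performs a dynamic name lookup and a generic call opcode for `factorial`, with nothing resolved before runtime.

```python
import dis

def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)

def loop():
    for x in range(10):
        i = factorial(10)

# The disassembly shows FOR_ITER driving the range iterator, plus a
# dynamic load-and-call pair for factorial on every pass; none of it
# is resolved at compile time, unlike optimized native Lisp output.
dis.dis(loop)
```

Exact opcode names vary across CPython versions, but some form of FOR_ITER and a CALL opcode per `factorial(10)` invocation appear in all of them.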

Rainer Joswig
  • Compiled Lisp is faster than interpreted Python. Similarly, compiled Python (Cython) is faster than interpreted CPython. As noted in the first post, using Cython the fib code required 0.33 seconds instead of 18 seconds for 10,000,000 iterations (on my machine). – John 9631 May 08 '17 at 05:10
  • @John9631: 'interpreted Python' is actually compiled: https://docs.python.org/devguide/compiler.html – Rainer Joswig May 08 '17 at 05:27
14

Method dispatch in Ruby/Python/etc is expensive, and Ruby/Python/etc programs compute primarily by calling methods. Even for loops in Ruby are just syntactic sugar for a method call to each.
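
To illustrate this answer's point with a sketch of my own: even a plain Python for-loop is sugar for repeated dynamic calls through the iterator protocol, roughly analogous to Ruby dispatching a block to `each`.

```python
# A for-loop over items desugars to an iter() call followed by one
# dynamic next() dispatch per element, ending at StopIteration.
items = [1, 2, 3]

total = 0
it = iter(items)           # calls items.__iter__()
while True:
    try:
        x = next(it)       # calls it.__next__() -- one dispatch per element
    except StopIteration:
        break
    total += x

assert total == sum(items)
```

Each of those `__next__` dispatches goes through the interpreter's generic call machinery, which is part of the per-iteration cost this answer describes.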

Alex D
3

I don't know about your Racket installation, but the Racket I just apt-get install'd uses JIT compilation if run without flags. Running with --no-jit gives a time much closer to the Python time (racket: 3s, racket --no-jit: 37s, python: 74s). Also, assignment at module scope is slower than local assignment in Python, for language-design reasons (a very liberal module system); moving the code into a function brings Python down to 60s. The remaining gap can probably be explained by some combination of coincidence, different optimization focus (function calls have to be crazy fast in Lisp, while Python people care less), and quality of implementation (ref-counting versus proper GC, stack VM versus register VM), rather than as a fundamental consequence of the respective language designs.
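
The module-scope point can be made concrete with a sketch (mine, assuming CPython and its `dis` module): names assigned at module scope compile to dict-based STORE_NAME operations, while the same code inside a function compiles to array-indexed STORE_FAST slots, which is one reason moving the loop into a function helps.

```python
import dis

# Compile the question's loop shape as module-level code...
module_code = compile("q = 0\nfor i in range(3):\n    q = i\n",
                      "<module>", "exec")

# ...and the same shape as a function body.
def local_version():
    q = 0
    for i in range(3):
        q = i

module_ops = {ins.opname for ins in dis.get_instructions(module_code)}
local_ops = {ins.opname for ins in dis.get_instructions(local_version)}

# Module-level assignment goes through the mutable globals dict...
assert "STORE_NAME" in module_ops
# ...while inside a function it compiles to an indexed local slot.
assert "STORE_FAST" in local_ops
```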

  • Hmm. Why does the JIT make such a large difference? In most of my programs `--no-jit` does not make a difference; though admittedly the majority are I/O bound. – ithisa Nov 09 '13 at 19:16
  • "value comprehensive frame objects" is bogus here, since the code doesn't have tail calls. – Eli Barzilay Nov 09 '13 at 20:06
  • @EliBarzilay I don't know how extensive call stack manipulation is available in Racket, but Python frame objects [contain an awful lot of stuff](http://docs.python.org/2/library/inspect.html#types-and-members) and are maintained all the time while code is running. It's one reason function calls are relatively slow. –  Nov 09 '13 at 20:15
  • @user54609 I don't know, as I know neither your programs nor the internals of Racket's JIT. I for one wonder why the JIT *doesn't* make a large difference in *your* programs: A well-implemented JIT compiler *should* improve performance significantly for most code. –  Nov 09 '13 at 20:17
  • @delnan: Racket stack frames contain stuff too, even more, you can put your own [user defined information](http://docs.racket-lang.org/reference/contmarks.html) on it; in any case, the stack in this case has only 10 frames, so the difference is not that big. In addition, some of the difference is due to design decisions: allow arbitrary `return`s from functions and you need to decorate function frames as the point to return from, etc. Whether such "features" are worth the cost or not is a subjective debate. – Eli Barzilay Nov 09 '13 at 20:52
  • @EliBarzilay You have a point, I removed the part about frames. But that the stack is only 10 frames large is not important. There are a lot of function *calls* in OP's code, and function calls are optimized a lot in Lisps (necessarily) while their performance in Python is not considered quite as important. Frames are a bad example for this, the complicated dance the interpreter has to do to match arguments and keyword arguments with the function's named parameters, varargs, and `**kwds` is a more likely source of slowness. –  Nov 09 '13 at 20:55
  • @delnan: Yeah, these kind of things are likely to be a much bigger part of the cost, but you should also consider that Racket has all of these gadgets too. Another source of cost is if you use only methods and therefore pay the price of dispatch on every call. When you use a CL, for example, you can see significant performance hits if you switch from functions to generic functions. – Eli Barzilay Nov 10 '13 at 00:14
-5

I think that it is not Python itself that is slow, but rather the Python interpreter moving through the code at a slower rate. If you tried to compile the code with a tool such as py2exe then it might be quicker than Lisp. You'd have to try it, but I think it just has a slow interpreter.

Jake Nixon
  • Python is defined by the CPython interpreter though. – ithisa Nov 09 '13 at 15:15
  • Also, I tried pypy but it just optimizes the whole loop away because it does not have any visible side effects. – ithisa Nov 09 '13 at 15:18
  • @user54609, then try `q += factorial(10)` and `print q` at the end – finnw Nov 09 '13 at 15:55
  • I strongly doubt that py2exe will eliminate the difference. Interpretation overhead is measurable, but nowhere near that huge, and a Lisp interpreter would also have interpretation overhead (though the exact value would of course be different). Actually I don't think py2exe removes the interpretation at all, it just bundles your Python code and the CPython code with some C code that does the equivalent of `exec(open(module).read())`. Try Cython or Nuitka instead to measure the interpretation overhead, those compile Python(-ish) code to C API calls. –  Nov 09 '13 at 16:01