I've heard that C++ is much faster than Python, and I wanted to test that using pybind11.

I created two analogous classes, one in C++ (Foo) and one in Python (Bar). I then use pybind11 to expose the C++ class to Python.

The main idea of the class is that it holds a custom function and evaluates it for every integer from 0 up to x. Here I set x to one million to see the impact on runtime.

main.cpp

#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include <pybind11/operators.h>
#include <pybind11/stl.h>

using namespace std;

class Foo {
    private:
        function<double(int)> func;
        double result[1000000];
    public:
        Foo(function<double(int)> f): func(f) {}

        void calculate() {
            for (int t=0; t<1000000; t++) {
                result[t] = func(t);
            }
        }
};


PYBIND11_MODULE(cmake_example, m) {
    pybind11::class_<Foo>(m, "Foo")
        .def(pybind11::init<function<double(int)>>())
        .def("calculate", &Foo::calculate);

    m.attr("__version__") = "0.0.1";
}

mytest.py

import time
import cmake_example


class Bar:
    def __init__(self, func):
        self.func = func
        self.result = [None for _ in range(1_000_000)]

    def calculate(self):
        for t in range(1_000_000):
            self.result[t] = self.func(t)


def myfunction(t):
    return t * t


if __name__ == "__main__":
    start = time.time()
    foo = cmake_example.Foo(myfunction)
    foo.calculate()
    print("cpp:", time.time() - start)

    start = time.time()
    bar = Bar(myfunction)
    bar.calculate()
    print("pyt:", time.time() - start)

and the result is:

cpp: 0.4998281002044678
pyt: 0.39750099182128906

I'm wondering - would other data types solve the issue? Why is C++ slower in that case than Python?

I expected the C++ code to be faster.

  • I'm far from an expert here, but to hazard a guess, I'd imagine you're kneecapping C++ by calling it from Python to begin with. Also, note that Python's interpreter itself is written in (likely well-optimized) C, so for simple functions, the things that slow Python down, like garbage collection, don't come into play as much. I'd say the way you test overall is like comparing apples to oranges that have been put through a blender together – Douglas B Aug 01 '23 at 15:14
  • One thing that trips up many folks wondering why X is faster than Y is failing to compile their C++ code with optimization on. It may or may not make a difference in this case, since most of the time is probably spent in the transitions between C++ and Python. – Retired Ninja Aug 01 '23 at 15:41
  • @RetiredNinja, how do you "compile C++ code with optimization on"? – zchmielewska Aug 01 '23 at 16:24
  • Implement a "proper" extension module instead of passing through the abstraction layers of pybind. – molbdnilo Aug 01 '23 at 17:06
  • @molbdnilo, what is a "proper" extension module? – zchmielewska Aug 01 '23 at 17:09
  • @zchmielewska Depends on the compiler you are using. The options are usually well documented. – Retired Ninja Aug 01 '23 at 17:32
  • @zchmielewska *how do you "compile C++ code with optimization on"?* -- Any question that asks "why is my C++ code slower than..." **must** be accompanied by the compiler, and the compiler options used to build the program. Those options include the optimization settings. If you are running a "debug" or unoptimized build, then the timing measures you are reporting are meaningless. Build the C++ module with optimizations turned on, and *then* run your benchmarks. Too many times, questions concerning C++ speed are closed by the original poster, all due to not running an optimized build. – PaulMcKenzie Aug 01 '23 at 18:48

1 Answer

You are calling a Python function a million times in both cases, and that's the expensive part. In the Python case, the bytecode interpreter already has all of its data in Python objects. In the C++ case, each call must go through an intermediate shim that builds and destroys the Python equivalents of the C++ objects. That intermediate shim is what is killing performance.
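To get a rough feel for how much of the runtime is just the cost of calling a Python function, here is a minimal pure-Python sketch. It is not the pybind11 benchmark from the question: `with_call`, `inline`, and the smaller `N` are names and values assumed for illustration, and the timings will vary by machine.

```python
import timeit

N = 100_000  # assumed value; smaller than the question's 1_000_000 so it runs quickly


def myfunction(t):
    return t * t


def with_call(n):
    # One Python-level function call per element, as in Bar.calculate().
    result = [None] * n
    for t in range(n):
        result[t] = myfunction(t)
    return result


def inline(n):
    # Same arithmetic, but without the per-element function call.
    result = [None] * n
    for t in range(n):
        result[t] = t * t
    return result


if __name__ == "__main__":
    call_time = timeit.timeit(lambda: with_call(N), number=5)
    inline_time = timeit.timeit(lambda: inline(N), number=5)
    print(f"with call: {call_time:.4f} s")
    print(f"inline:    {inline_time:.4f} s")
```

The gap between the two timings approximates the per-call cost that both Foo and Bar pay a million times. The C++ version adds argument and return-value conversion on top of it.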

Implement myfunction in C++ and call it natively in the C++ case, then you'll see it fly. And that's the trick to optimizing Python with C++. You have to pick parts of the code that execute often but can be implemented without fiddling around with Python data types, except at the edges of the code.
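The same principle can be glimpsed without leaving Python, by swapping the Python-level callback for a callable that is itself implemented in C. The sketch below is only an analogy, not a pybind11 example: it assumes the callback is exactly `t * t`, so that the standard library's `operator.mul` can stand in for a natively implemented function, and the helper names are hypothetical.

```python
import operator
import timeit

N = 100_000  # assumed value, kept small so the sketch runs quickly


def myfunction(t):
    return t * t


def via_python_function(n):
    # Every element goes through a Python-level function call.
    return list(map(myfunction, range(n)))


def via_native_callable(n):
    # operator.mul is implemented in C in CPython, so no Python frame
    # is created per element; map feeds it pairs (t, t).
    r = range(n)
    return list(map(operator.mul, r, r))


if __name__ == "__main__":
    py_time = timeit.timeit(lambda: via_python_function(N), number=5)
    native_time = timeit.timeit(lambda: via_native_callable(N), number=5)
    print(f"python callback: {py_time:.4f} s")
    print(f"native callable: {native_time:.4f} s")
```

Python objects are still created for every result here, so this shows only part of the gain that a fully native loop over C++ doubles could achieve.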

You could also try Python compilers like Cython or mypyc (the compiler that ships with mypy), which may give you better performance.

tdelaney
  • Yes, indeed, a function in C++ would make a big difference (runtime=0) but that's the part I need to have in Python. – zchmielewska Aug 01 '23 at 16:27
  • You are essentially "optimizing" the for loop in C++ and, as said in the answer, suffer additional overhead because you call a Python function from C++ code. To make it worse, you can even speed up the Python code with a list comprehension `result = [myfunction(i) for i in range(1_000_000)]`. On my system this is 35% faster than the for loop. Even faster would be `list(map(myfunction, range(1000000)))`, with almost 49% speedup. Just creating the result list consumes ~40% of the remaining runtime, and I guess the rest is function call overhead because the function is just one multiplication. – Jens Aug 01 '23 at 21:01
  • @jens - Interesting numbers and worth updating the code. But as `myfunction` gets more complicated and expensive to run in real-world code, whether the loop is in C++, the original Python, or your updated Python will make little overall difference. – tdelaney Aug 01 '23 at 21:52
  • @tdelaney Yes, this is what I wanted to point out with these tests. In the benchmark code, a lot of the time is spent just on loop logic and construction of the result list. The only thing the C++ code does is implement a loop calling a Python function, so you optimize a very small part of the code, which looks almost good right now because of the way the benchmark is constructed. – Jens Aug 02 '23 at 15:22
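The three pure-Python variants discussed in the comments above can be compared with a small script like the following. This is a sketch under assumptions: the helper names and the reduced `N` are chosen for illustration, and the percentages quoted in the comments are from the commenter's system and may not reproduce elsewhere.

```python
import timeit

N = 100_000  # assumed value; the question uses 1_000_000


def myfunction(t):
    return t * t


def for_loop(n):
    # Mirrors Bar: preallocate the list, then fill it by index.
    result = [None for _ in range(n)]
    for t in range(n):
        result[t] = myfunction(t)
    return result


def comprehension(n):
    # List comprehension: builds the list in a single pass.
    return [myfunction(i) for i in range(n)]


def mapped(n):
    # map() drives the iteration from C; list() collects the results.
    return list(map(myfunction, range(n)))


if __name__ == "__main__":
    for name, fn in (("for loop", for_loop),
                     ("comprehension", comprehension),
                     ("map", mapped)):
        elapsed = timeit.timeit(lambda: fn(N), number=5)
        print(f"{name:>13}: {elapsed:.4f} s")
```

All three still call `myfunction` once per element, so the differences between them come only from loop and list-construction overhead.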