
I have a matrix class in C++ and want to iterate over all pixels, performing calculations that depend on the pixel position. The logic of the calculation is defined in Python and passed to C++ as a callback function (using pybind11).

I have set up a simplified example that just calculates the sum of all pixel values with different approaches and compares their performance. You can find the whole code here on GitLab:

mymatrix.h

#include <functional>
#include <vector>

using Callback = std::function<void(float)>;

class MyMatrix
{
    int cols, rows;
    std::vector<float> data;

public:
    MyMatrix();
    float getSum() const;
    float getValue(int i, int j) const;
    void calculate(Callback cb) const;
};

mymatrix.cpp

MyMatrix::MyMatrix()
{
    cols = 1000;
    rows = 1000;
    data.reserve(cols * rows);
    // Fill the matrix row by row (row-major layout) with ones.
    for(int i=0; i<rows; ++i)
    {
        for(int j=0; j<cols; ++j)
        {
            data.push_back(1);
        }
    }
}
float MyMatrix::getSum() const
{
    float rv = 0;
    for(int i=0; i<rows; ++i)
    {
        for(int j=0; j<cols; ++j)
        {
            // Element (i, j) is stored at offset i*cols + j.
            rv += data[i*cols+j];
        }
    }
    return rv;
}
float MyMatrix::getValue(int i, int j) const
{
    return data[i*cols+j];
}
void MyMatrix::calculate(Callback cb) const
{
    // Invoke the callback once per pixel.
    for(int i=0; i<rows; ++i)
    {
        for(int j=0; j<cols; ++j)
        {
            cb(data[i*cols+j]);
        }
    }
}

main.py

import time
import module_name  # the pybind11 extension module

class MyCallbackClass:
    def __init__(self):
        self.summ = 0
    def callback_py(self, ival):
        self.summ += ival

callbackObj = MyCallbackClass()
def callback_fun(ival):
    callback_fun.sum += ival
callback_fun.sum = 0

def callback_fun_do_nothing(ival):
    return


my_matrix = module_name.PyMatrix()

t0 = time.time()
getSum_value = my_matrix.getSum()
t1 = time.time()
py_sum = 0
for i in range(1000):
    for j in range(1000):
        py_sum += my_matrix.getValue(i, j)
t2 = time.time()
my_matrix.calculate(callbackObj.callback_py)
t3 = time.time()
my_matrix.calculate(callback_fun)
t4 = time.time()
my_matrix.calculate(callback_fun_do_nothing)
t5 = time.time()

On my laptop I get the result below:

description                              calculated value   time spent [sec]
pure cpp (calling getSum())              1000000.0          0.0020
sum in python by calling getValue()      1000000.0          0.8790
using callback object class              1000000.0          0.2910
using callback function                  1000000.0          0.2270
using callback function doing nothing    -                  0.1330

The getValue() approach is more than 400x slower than pure C++. The callback function is faster, but it is still ~100x slower than pure C++.
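
To put these numbers in perspective, here is a quick back-of-the-envelope calculation. It uses only the timings reported above; the dictionary keys and the helper names `slowdown` and `per_pixel_ns` are labels chosen for illustration, not part of the original code.

```python
# Timings copied from the measurements above, in seconds.
timings = {
    "getSum": 0.0020,
    "getValue": 0.8790,
    "callback_object": 0.2910,
    "callback_function": 0.2270,
    "callback_noop": 0.1330,
}
n_pixels = 1000 * 1000  # 1000 x 1000 matrix


def slowdown(name):
    """Slowdown factor relative to the pure C++ getSum()."""
    return timings[name] / timings["getSum"]


def per_pixel_ns(name):
    """Average time spent per pixel, in nanoseconds."""
    return timings[name] / n_pixels * 1e9


for name in timings:
    print(f"{name:18s} {slowdown(name):7.1f}x  {per_pixel_ns(name):7.1f} ns per pixel")
```

By this estimate the do-nothing callback alone costs roughly 133 ns per pixel, so more than half of the callback-function time appears to be call overhead rather than the summation itself.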

When I want to perform calculations based on the pixel position and value, is there any approach that is faster than the example with the callback function? The restriction is that the code for the calculation is only known on the Python side, after the C++ code has been compiled. So it is not an option to just implement the calculation in C++.

zboson
  • Iterating over all pixels is never fast. You have to make 1,000,000 function calls (and possibly 1,000,000 C/Python/C transitions). Your best bet by far is to turn the image into a numpy array and use numpy's matrix methods. – Tim Roberts Aug 05 '23 at 18:39
  • The requirements "loop body needs to be implemented in Python" and "it has to be fast" are not really compatible. You make things fast by avoiding the interpreter for all the heavy lifting. – Dan Mašek Aug 06 '23 at 17:12
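
A minimal sketch of what the comments suggest, assuming the pixel data can be handed to Python as a NumPy array (for example through pybind11's buffer protocol; that binding is not shown in the question, so a plain `np.ones` array stands in for it here). The function `position_dependent_sum` and its weighting formula are hypothetical, chosen only to illustrate a calculation that depends on both position and value:

```python
import numpy as np

rows, cols = 1000, 1000
# Stand-in for the matrix data; assumed to be exposed from C++ as a 2-D array.
pixels = np.ones((rows, cols), dtype=np.float32)


def position_dependent_sum(values):
    """Sum value * (i + j) over all pixels without a Python-level loop."""
    # i and j are integer index grids with the same shape as `values`.
    i, j = np.indices(values.shape)
    # Accumulate in float64 to limit rounding error in the large sum.
    return float(np.sum(values * (i + j), dtype=np.float64))


# Equivalent of getSum(), computed in a single vectorized call.
plain_sum = float(pixels.sum(dtype=np.float64))
# Example of a calculation that uses the pixel position.
weighted_sum = position_dependent_sum(pixels)
print(plain_sum, weighted_sum)
```

The calculation is still defined in Python after the C++ code has been compiled, but the interpreter is entered once per array operation rather than once per pixel.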
