I have a matrix class in C++ and want to iterate over all pixels, performing calculations in Python that depend on the pixel position. I use a callback function to pass the calculation logic in from Python (using pybind11).
I have set up a simplified example that just calculates the sum of all pixel values with different approaches and compares their performance. You can find the whole code here on GitLab:
mymatrix.h
#include <functional>
#include <vector>

using Callback = std::function<void(float)>;

class MyMatrix
{
    int cols, rows;
    std::vector<float> data;  // flat buffer holding rows * cols values

public:
    MyMatrix();
    float getSum() const;
    float getValue(int i, int j) const;
    void calculate(Callback cb) const;
};
mymatrix.cpp
#include "mymatrix.h"

MyMatrix::MyMatrix()
{
    cols = 1000;
    rows = 1000;
    data.reserve(cols * rows);
    // Fill the matrix with ones so that the expected sum is rows * cols.
    for(int i=0; i<rows; ++i)
    {
        for(int j=0; j<cols; ++j)
        {
            data.push_back(1);
        }
    }
}

float MyMatrix::getSum() const
{
    float rv = 0;
    for(int i=0; i<rows; ++i)
    {
        for(int j=0; j<cols; ++j)
        {
            rv += data[i*cols+j];  // row-major index of element (i, j)
        }
    }
    return rv;
}

float MyMatrix::getValue(int i, int j) const
{
    return data[i*cols+j];
}

void MyMatrix::calculate(Callback cb) const
{
    // Invoke the callback once per element, in row-major order.
    for(int i=0; i<rows; ++i)
    {
        for(int j=0; j<cols; ++j)
        {
            cb(data[i*cols+j]);
        }
    }
}
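For reference, a flat row-major buffer usually maps position (i, j) to the index i * cols + j. The following is a small Python sketch (not part of the project; `flat_index` is a hypothetical helper) that checks this mapping touches every element of a rows x cols buffer exactly once:

```python
def flat_index(i, j, cols):
    """Row-major index of element (i, j) in a flat buffer with `cols` columns."""
    return i * cols + j


rows, cols = 4, 3  # small illustrative sizes, not the 1000 x 1000 used above
visited = [flat_index(i, j, cols) for i in range(rows) for j in range(cols)]

# Every index 0 .. rows*cols-1 appears exactly once, in order.
print(visited == list(range(rows * cols)))  # True
```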
main.py
import time

import module_name  # the compiled pybind11 extension module

class MyCallbackClass:
    def __init__(self):
        self.summ = 0

    def callback_py(self, ival):
        self.summ += ival

callbackObj = MyCallbackClass()

def callback_fun(ival):
    callback_fun.sum += ival
callback_fun.sum = 0

def callback_fun_do_nothing(ival):
    return

my_matrix = module_name.PyMatrix()

# pure C++ sum
t0 = time.time()
getSum_value = my_matrix.getSum()
t1 = time.time()

# sum in Python, one getValue() call per pixel
py_sum = 0
for i in range(1000):
    for j in range(1000):
        py_sum += my_matrix.getValue(i, j)
t2 = time.time()

# callback: bound method of an object
my_matrix.calculate(callbackObj.callback_py)
t3 = time.time()

# callback: plain function
my_matrix.calculate(callback_fun)
t4 = time.time()

# callback: plain function that does nothing
my_matrix.calculate(callback_fun_do_nothing)
t5 = time.time()
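The timestamps above are turned into per-approach durations by subtracting neighbours (t1 - t0, t2 - t1, and so on). A minimal sketch of the same measurement with a small helper, using a pure-Python stand-in for the matrix (`FakeMatrix` and `timed` are hypothetical names, not part of the module):

```python
import time


class FakeMatrix:
    """Pure-Python stand-in for the bound C++ matrix (assumption, for illustration)."""

    def __init__(self, rows=100, cols=100):
        self.rows, self.cols = rows, cols
        self.data = [1.0] * (rows * cols)

    def calculate(self, cb):
        # Same contract as MyMatrix::calculate: one callback call per element.
        for value in self.data:
            cb(value)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


acc = {"sum": 0.0}


def add_to_sum(value):
    acc["sum"] += value


_, elapsed = timed(FakeMatrix().calculate, add_to_sum)
print(f"sum={acc['sum']}, time spent: {elapsed:.4f} s")
```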
On my laptop I get the result below:
| Description | Calculated value | Time spent [sec] |
|---|---|---|
| pure C++ (calling getSum()) | 1000000.0 | 0.0020 |
| sum in Python by calling getValue() | 1000000.0 | 0.8790 |
| using callback object class | 1000000.0 | 0.2910 |
| using callback function | 1000000.0 | 0.2270 |
| using callback function doing nothing | - | 0.1330 |
The getValue() approach is more than 400x slower than pure C++. The callback function is faster, but it is still ~100x slower than pure C++.
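These ratios follow directly from the table. A quick check of the arithmetic, using only the timings reported above; the per-call figure assumes the callback is invoked once per pixel, i.e. 1000 x 1000 times:

```python
# Timings in seconds, copied from the table above.
timings = {
    "getSum": 0.0020,
    "getValue": 0.8790,
    "callback_object": 0.2910,
    "callback_function": 0.2270,
    "callback_do_nothing": 0.1330,
}

baseline = timings["getSum"]
slowdown = {name: t / baseline for name, t in timings.items()}

n_calls = 1000 * 1000  # one callback invocation per pixel
per_call_ns = timings["callback_do_nothing"] / n_calls * 1e9

for name, factor in slowdown.items():
    print(f"{name}: {factor:.1f}x")
print(f"empty callback: about {per_call_ns:.0f} ns per call")
```

So even a callback that does nothing costs on the order of 130 ns per pixel, which accounts for more than half of the time measured for the summing callback function.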
When I want to perform calculations based on the pixel position and value, is there any approach that is faster than the example with the callback function? The restriction is that the code for the calculation is only known on the Python side, after the C++ code has been compiled, so implementing the calculation directly in C++ is not an option.
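One direction that might be worth measuring is to cross the C++/Python boundary less often, for example by invoking the callback once per row with the row index and all of its values instead of once per pixel. The sketch below is pure Python and only illustrates the call pattern; `calculate_per_pixel` and `calculate_per_row` are hypothetical stand-ins, not existing functions of the module, and no performance claim about pybind11 is implied:

```python
rows, cols = 100, 100
data = [1.0] * (rows * cols)  # flat row-major buffer, like MyMatrix::data


def calculate_per_pixel(cb):
    """One callback call per element: rows * cols boundary crossings."""
    for i in range(rows):
        for j in range(cols):
            cb(i, j, data[i * cols + j])


def calculate_per_row(cb):
    """One callback call per row: only `rows` boundary crossings."""
    for i in range(rows):
        cb(i, data[i * cols:(i + 1) * cols])


calls = {"pixel": 0, "row": 0}
sums = {"pixel": 0.0, "row": 0.0}


def on_pixel(i, j, value):
    calls["pixel"] += 1
    sums["pixel"] += value


def on_row(i, values):
    calls["row"] += 1
    # The position of each value is (i, j), with j taken from enumerate(values).
    sums["row"] += sum(value for j, value in enumerate(values))


calculate_per_pixel(on_pixel)
calculate_per_row(on_row)
print(calls)  # {'pixel': 10000, 'row': 100}
print(sums["pixel"] == sums["row"])  # True
```

Both variants see every pixel together with its position, but the per-row variant makes 100 calls instead of 10000 in this small example.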