I'm trying to transfer some code I've previously written in python into C++, and I'm currently testing xtensor to see if it can be faster than numpy for doing what I need it to.
One of my functions takes a square matrix d and a scalar alpha, and performs the elementwise operation alpha/(alpha+d)
. Background: this function is used to test which value of alpha
is 'best', so it is in a loop where d
is always the same, but alpha
varies.
All of the following time scales are an average of 100 instances of running the function.
In numpy, it takes around 0.27 seconds to do this, and the code is as follows:
def kfun(d,alpha):
k = alpha /(d+alpha)
return k
but xtensor takes about 0.36 seconds, and the code looks like this:
xt::xtensor<double,2> xk(xt::xtensor<double,2> d, double alpha){
return alpha/(alpha+d);
}
I've also attempted the following version using std::vector
but this something I do not want to use in long run, even though it only took 0.22 seconds.
std::vector<std::vector<double>> kloops(std::vector<std::vector<double>> d, double alpha, int d_size){
for (int i = 0; i<d_size; i++){
for (int j = 0; j<d_size; j++){
d[i][j] = alpha/(alpha + d[i][j]);
}
}
return d;
}
I've noticed that the operator/
in xtensor uses "lazy broadcasting", is there maybe a way to make it immediate?
EDIT:
In Python, the function is called as follows, and timed using the "time" package
t0 = time.time()
for i in range(100):
kk = k(dsquared,alpha_squared)
print(time.time()-t0)
In C++ I call the function has follows, and is timed using chronos:
//d is saved as a 1D npy file, an artefact from old code
auto sd2 = xt::load_npy<double>("/path/to/d.npy");
shape = {7084, 7084};
xt::xtensor<double, 2> xd2(shape);
for (int i = 0; i<7084;i++){
for (int j=0; j<7084;j++){
xd2(i,j) = (sd2(i*7084+j));
}
}
auto start = std::chrono::steady_clock::now();
for (int i = 0;i<10;i++){
matrix<double> kk = kfun(xd2,4000*4000,7084);
}
auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed_seconds = end-start;
std::cout << "k takes: " << elapsed_seconds.count() << "\n";
If you wish to run this code, I'd suggest using xd2
as a symmetric 7084x7084 random matrix with zeros on the diagonal.
The output of the function, a matrix called k
, then goes on to be used in other functions, but I still need d
to be unchanged as it will be reused later.
END EDIT
To run my C++ code I use the following line in the terminal:
cd "/path/to/src/" && g++ -mavx2 -ffast-math -DXTENSOR_USE_XSIMD -O3 ccode.cpp -o ccode -I/path/to/xtensorinclude && "/path/to/src/"ccode
Thanks in advance!