I am trying to estimate how good is Python performance comparing to C++.
Here is my Python code:
a=np.random.rand(1000,1000) #type is automaically float64
b=np.random.rand(1000,1000)
c=np.empty((1000,1000),dtype='float64')
%timeit a.dot(b,out=c)
#15.5 ms ± 560 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
And here is my C++ code that I compile with Xcode in release regime:
#include <iostream>
#include <Dense>
#include <time.h>
using namespace Eigen;
using namespace std;
int main(int argc, const char * argv[]) {
//RNG generator
unsigned int seed = clock();
srand(seed);
int Msize=1000, Nloops=10;
MatrixXd m1=MatrixXd::Random(Msize,Msize);
MatrixXd m2=MatrixXd::Random(Msize,Msize);
MatrixXd m3=MatrixXd::Random(Msize,Msize);
cout << "Starting matrix multiplication test with " << Msize <<
"matrices" << endl;
clock_t start=clock();
for (int i=0; i<Nloops; i++)
m3=m1*m2;
start = clock() - start;
cout << "time elapsed for 1 multiplication: " << start / ((double)
CLOCKS_PER_SEC * (double) Nloops) << " seconds" <<endl;
return 0;
}
And the result is:
Starting matrix multiplication test with 1000matrices
time elapsed for 1 multiplication: 0.148856 seconds
Program ended with exit code: 0
Which means that C++ program is 10 times slower.
Alternatively, I've tried to compile cpp code in MAC terminal:
g++ -std=c++11 -I/usr/local/Cellar/eigen/3.3.5/include/eigen3/eigen main.cpp -o my_exec -O3
./my_exec
Starting matrix multiplication test with 1000matrices
time elapsed for 1 multiplication: 0.150432 seconds
I am aware of very similar question, however, it looks like there the issue was in matrix definitions. In my example I've used default eigen functions to create matrices from uniform distribution.
Thanks, Mikhail
Edit: I found out, that while numpy uses multithreading, Eigen does not use multiple threads by default (checked by Eigen::nbThreads()
function).
As suggested, I used -march=native
option which reduced computation time by a factor of 3. Taking into account 8 threads available on my MAC, I can believe that with multithreading numpy runs 3 times faster.