
I have a simple project to build a native wrapper for the Eigen library.

However, I do not see any speedup compared with compiling the same library without MKL.

Can anyone help?

I define EIGEN_USE_MKL_ALL and link the additional libraries mkl_core.lib, mkl_intel_thread.lib, mkl_intel_lp64.lib and libiomp5md.lib, as in the following answer.

Ben238
  • What's your CPU? What's your test case? – Homer512 Mar 27 '23 at 17:32
  • Hi @Homer512, I am running my little benchmark on an Intel processor. It is just a 500x500 dense matrix on which I perform a full LU decomposition and then a solve. I have one library set up to compile without MKL, while the other one uses it. The latter does not run any faster, even though I expected it to. – Ben238 Mar 27 '23 at 17:53
  • Eigen doesn't dispatch full-pivoting LU to MKL or LAPACK. I haven't checked whether those libraries offer it. If you switch to partialPivLu, you should see a speedup. On my system MKL is a factor of 5 faster in this test setup. – Homer512 Mar 27 '23 at 19:39
  • @Homer512 please put your comment as answer and I will accept it. – Ben238 Mar 28 '23 at 10:14

1 Answer


Eigen doesn't dispatch full-pivoting LU (fullPivLu) to MKL or LAPACK. I haven't checked whether those libraries offer it. If you switch to partialPivLu, you should see a speedup: on my system, MKL is a factor of 4-5 faster in this test setup.
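To make the difference between the two strategies concrete, here is a minimal, self-contained sketch in plain C++ (no Eigen, no MKL). The `Mat` type and the `lu_factor`/`lu_solve` helpers are hypothetical names for illustration, not Eigen's or LAPACK's implementation. Partial pivoting searches only the current column for a pivot and swaps rows, which is the strategy LAPACK's `getrf` routines use; full pivoting searches the whole trailing submatrix at every step and swaps both rows and columns. The loops are unblocked and unvectorized, so this says nothing about MKL's speed; it only shows the algorithmic difference.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Dense n x n matrix, row-major in a flat vector (illustration only).
struct Mat {
  std::size_t n;
  std::vector<double> a;
  double& operator()(std::size_t i, std::size_t j) { return a[i * n + j]; }
  double operator()(std::size_t i, std::size_t j) const { return a[i * n + j]; }
};

// In-place factorization P*A*Q = L*U (unit lower L and upper U share m).
// full == false: partial pivoting, search column k only, so Q stays identity.
// full == true : full (complete) pivoting, search the whole trailing block.
// row[k] / col[k] hold the original row / column now at position k.
void lu_factor(Mat& m, std::vector<std::size_t>& row,
               std::vector<std::size_t>& col, bool full) {
  const std::size_t n = m.n;
  row.resize(n);
  col.resize(n);
  std::iota(row.begin(), row.end(), std::size_t{0});
  std::iota(col.begin(), col.end(), std::size_t{0});
  for (std::size_t k = 0; k < n; ++k) {
    // Pivot search: O(n) entries for partial, O(n^2) entries for full.
    std::size_t pr = k, pc = k;
    const std::size_t jend = full ? n : k + 1;
    for (std::size_t i = k; i < n; ++i)
      for (std::size_t j = k; j < jend; ++j)
        if (std::abs(m(i, j)) > std::abs(m(pr, pc))) { pr = i; pc = j; }
    if (pr != k) {  // row interchange (both strategies)
      for (std::size_t j = 0; j < n; ++j) std::swap(m(k, j), m(pr, j));
      std::swap(row[k], row[pr]);
    }
    if (pc != k) {  // column interchange (full pivoting only)
      for (std::size_t i = 0; i < n; ++i) std::swap(m(i, k), m(i, pc));
      std::swap(col[k], col[pc]);
    }
    if (m(k, k) == 0.0) continue;  // singular: nothing left to eliminate
    for (std::size_t i = k + 1; i < n; ++i) {  // update the trailing block
      m(i, k) /= m(k, k);
      for (std::size_t j = k + 1; j < n; ++j) m(i, j) -= m(i, k) * m(k, j);
    }
  }
}

// Solves A*x = b from the factorization: L*U*y = P*b, then x = Q*y.
std::vector<double> lu_solve(const Mat& m, const std::vector<std::size_t>& row,
                             const std::vector<std::size_t>& col,
                             const std::vector<double>& b) {
  const std::size_t n = m.n;
  assert(b.size() == n);
  std::vector<double> y(n);
  for (std::size_t i = 0; i < n; ++i) y[i] = b[row[i]];  // apply P
  for (std::size_t i = 0; i < n; ++i)                     // forward, unit L
    for (std::size_t j = 0; j < i; ++j) y[i] -= m(i, j) * y[j];
  for (std::size_t i = n; i-- > 0;) {                     // backward, U
    for (std::size_t j = i + 1; j < n; ++j) y[i] -= m(i, j) * y[j];
    y[i] /= m(i, i);
  }
  std::vector<double> x(n);
  for (std::size_t k = 0; k < n; ++k) x[col[k]] = y[k];   // undo Q
  return x;
}
```

Factor a copy of the matrix with `lu_factor(m, row, col, full)` and then call `lu_solve(m, row, col, b)`. With `full == false`, `col` stays the identity permutation, which is exactly the form a row-pivoted LAPACK factorization produces.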

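#include <Eigen/Dense>

// LU_METHOD is passed on the command line (-DLU_METHOD=partialPivLu or
// -DLU_METHOD=fullPivLu); fall back to partial pivoting if it is missing.
#ifndef LU_METHOD
#define LU_METHOD partialPivLu
#endif

int main()
{
  const Eigen::MatrixXd A = Eigen::MatrixXd::Random(500, 500);
  const Eigen::VectorXd b = Eigen::VectorXd::Random(500);
  const int rep = 200;
  Eigen::VectorXd c;
  // Factorize and solve repeatedly so the LU cost dominates the runtime.
  for(int i = 0; i < rep; ++i)
    c = A.LU_METHOD().solve(b);
}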
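`time ./a.out` measures the whole process, including start-up and the random matrix generation. If you want to time only the factorization and solve, a small std::chrono helper could wrap the loop body instead. This is a sketch under that assumption; `median_ms` is a hypothetical name, not part of Eigen or MKL.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls `work` `reps` times, timing each call with
// std::chrono::steady_clock, and returns the median duration in milliseconds.
// Only the callable is timed, not process start-up or matrix generation.
template <class F>
double median_ms(F&& work, int reps) {
  assert(reps > 0);
  std::vector<double> samples;
  samples.reserve(static_cast<std::size_t>(reps));
  for (int r = 0; r < reps; ++r) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
  }
  // Middle element after a partial sort (upper median when reps is even).
  const auto mid = samples.begin() + reps / 2;
  std::nth_element(samples.begin(), mid, samples.end());
  return *mid;
}
```

In the benchmark above, this would be called as something like `median_ms([&] { c = A.partialPivLu().solve(b); }, 200)`, and again with `fullPivLu()`. The median is less sensitive than the total to one-off outliers such as the first call paying for thread-pool start-up.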

Partial Piv:

g++ -O3 -I/usr/include/eigen3 -march=native -fopenmp \
    -DLU_METHOD=partialPivLu -DEIGEN_USE_MKL_ALL -I${MKLROOT}/include \
    -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 \
    -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl test.cpp
time ./a.out

> real    0m0,221s
> user    0m1,513s
> sys     0m0,019s

g++ -O3 -I/usr/include/eigen3 -march=native -fopenmp \
    -DLU_METHOD=partialPivLu test.cpp
time ./a.out

> real    0m0,995s
> user    0m5,719s
> sys     0m3,626s

Full Piv:

g++ -O3 -I/usr/include/eigen3 -march=native -fopenmp \
    -DLU_METHOD=fullPivLu -DEIGEN_USE_MKL_ALL -I${MKLROOT}/include \
    -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 \
    -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl test.cpp
time ./a.out

> real    0m4,445s
> user    0m4,438s
> sys     0m0,006s

g++ -O3 -I/usr/include/eigen3 -march=native -fopenmp \
    -DLU_METHOD=fullPivLu test.cpp
time ./a.out

> real    0m4,510s
> user    0m4,505s
> sys     0m0,005s

Tested on an Intel i7-11800H. Inspecting the assembly confirms that the full-pivoting build makes no calls to LAPACK functions, while the partial-pivoting build calls LAPACKE_dgetrf.
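Working the numbers out from the timings above: the speedup is the ratio of the non-MKL to the MKL wall-clock (real) time, and user/real gives a rough indication of how many cores were busy on average.

```latex
\begin{aligned}
\text{speedup (partial piv)} &= \frac{0.995\,\text{s}}{0.221\,\text{s}} \approx 4.5 \\
\text{speedup (full piv)} &= \frac{4.510\,\text{s}}{4.445\,\text{s}} \approx 1.01 \\
\frac{\text{user}}{\text{real}}\ \text{(MKL, partial piv)} &= \frac{1.513\,\text{s}}{0.221\,\text{s}} \approx 6.8 \\
\frac{\text{user}}{\text{real}}\ \text{(MKL, full piv)} &= \frac{4.438\,\text{s}}{4.445\,\text{s}} \approx 1.0
\end{aligned}
```

The first ratio is the factor of 4-5 quoted above. The user/real ratios suggest that the MKL partial-pivoting run kept several cores busy, while the full-pivoting run used roughly one core whether or not MKL was linked (4.505 s user vs. 4.510 s real without MKL).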

Homer512