1

I have a 4-core CPU and I am trying to optimize my code to reduce the calculation time for products of 2000x2000 Eigen matrices. Since I am using OpenMP, I was expecting to reach 400% CPU usage. But for some reason I am stuck at 200%.

I am using Ubuntu 14.04. My code is written in C++. It uses the Eigen matrix library with OpenMP and the MKL. I compile my code with ICC using the following arguments (this is an extract of my .pro file, since I use Qt):

INCLUDEPATH += /opt/intel/mkl/include
LIBS += -L/opt/intel/mkl/lib/intel64 \
    -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core \
    -L/opt/intel/lib/intel64 \
    -liomp5 -lpthread -lm
DEFINES += NDEBUG
DEFINES += EIGEN_USE_MKL_ALL
QMAKE_CXXFLAGS_RELEASE += -fast -march=corei7 -qopenmp -static

How can I reach 400% of CPU usage? Thanks.


PS: EDIT What part of my code could be useful?

int nthreads = omp_get_num_threads();
cout << endl << nthreads << " thread(s) available for computation" << endl;
cout << Eigen::nbThreads() << " thread(s) used by Eigen" << endl;

This, for instance, displays 1 thread available and 4 used by Eigen. Is this normal?

Arkhan
  • No such thing as 400% of CPU usage. I think you mean 100% usage of all cores of your CPU, which isn't really possible unless you circumvent your operating system's scheduler or use a real-time operating system. – RamblingMad Jul 31 '15 at 16:32
  • Well, @CoffeeandCode, you can get pretty close. If there is nothing else to do, you should be able to use up pretty well all of the available cores. – Martin James Jul 31 '15 at 16:37
  • @MartinJames your computer is doing a lot that you can't see even when you're doing nothing. My Ubuntu 15.04 uses at least 1% of all cores at idle. – RamblingMad Jul 31 '15 at 16:41
  • @CoffeeandCode: 400% is how GNU/Linux `top` would show the CPU usage for a program with 4 threads running full time. I think it's a clear way to say you want the `user` (CPU) time for your program to be as close as possible to 4x the `real` (wall-clock) time. The thread-level parallelism can be there, even if other processes keep it from being fully exploited. – Peter Cordes Jul 31 '15 at 17:28
  • @CoffeeandCode yes, what I meant is that I want `top` to show 400% CPU usage (i.e. 100% of all 4 cores). I did succeed in reaching 390% and more with another program and I would like to do the same here. – Arkhan Jul 31 '15 at 18:07
  • @CoffeeandCode if I stop Firefox (and all its polling for updates in the various tabs), qBitTorrent uploads, etc., the Task Manager shows 0% usage. If it was 1%, who cares - it's not very much, is it? The OP's app could have the other 99%. – Martin James Jul 31 '15 at 20:34
  • @Arkhan If you remove the `EIGEN_USE_MKL_ALL` flag, what happens? Do you get a) the same thing; b) slower; c) faster; d) higher CPU usage; e) lower CPU usage; f) a combination of the above? – Avi Ginsburg Aug 02 '15 at 12:04
  • @Arkhan Regarding the `int nthreads = omp_get_num_threads();` lines: yes, that is normal. In a sequential region there is only one thread. See [here](https://gcc.gnu.org/onlinedocs/libgomp/omp_005fget_005fnum_005fthreads.html). – Avi Ginsburg Aug 02 '15 at 12:06
  • @Arkhan You can check the version of Eigen by looking at the following defines: `EIGEN_WORLD_VERSION`, `EIGEN_MAJOR_VERSION` and `EIGEN_MINOR_VERSION`. The latest version as of this writing would be `3`, `2`, and `5` respectively. The defines are in the `./Eigen/src/Core/util/Macros.h` file (at least, they are now, they might have been moved at some point). – Avi Ginsburg Aug 02 '15 at 12:16
  • @AviGinsburg I) My Eigen version is 3.2.4. II) When I remove `EIGEN_USE_MKL_ALL`, the calculation runs 4 times slower, but `top` shows 400% CPU usage... This means that `Eigen`+`MKL`+`OpenMP` is much more optimized but uses only 2 cores, whereas `Eigen`+`OpenMP` has poor performance but uses all 4 cores. How can I combine the advantages of both? – Arkhan Aug 02 '15 at 13:28

2 Answers

3

There is no problem with `top` showing "only" 200% CPU usage (instead of 400%).

In fact, my CPU has only 2 physical cores, but hyperthreading allows each of them to present 2 logical cores (4 logical cores in total). That is why `top` sometimes shows 400% CPU usage when hyperthreading is used.

But Eigen+OpenMP+MKL does not use hyperthreading and does its own optimization (better than regular hyperthreading). The 200% CPU usage reflects the fact that both physical cores, not the logical cores, are used at 100% of their capacity.

Thus, Eigen+OpenMP+MKL is indeed much more efficient than Eigen+OpenMP. Thanks for your help.

Arkhan
  • Try setting the number of threads Eigen is allowed and try without `MKL`. You may see a speed improvement over the same version, but with hyperthreading. Or turn off hyperthreading on the machine altogether. – Avi Ginsburg Aug 02 '15 at 16:55
2

Since you have not provided a code snippet or other details, here are my observations based on your question:

According to Eigen's documentation on using MKL routines, the following points have to be kept in mind:

  • For matrix-matrix multiplication, your matrices must use one of the supported data types. For any other data type, Eigen falls back to its normal implementation regardless of any MKL settings.
  • If you mix complex and real data types, MKL optimizations will not happen.
  • You have to define the `EIGEN_USE_MKL_ALL` macro before including any Eigen header files in order to actually use MKL.
  • Are your matrices dynamic? If not, MKL optimizations will not take place.
  • Even if you request MKL optimizations, they might not be applied. Eigen checks the overhead involved in passing values to and from MKL routines, and if that overhead is larger than the cost of computing without MKL, the MKL routines are not used. That is why Eigen mentions that MKL substitutions take place only for large enough, dense objects.
  • Finally, if your Eigen version is older than 3.1, MKL substitutions will not take place.

It would be better for you to check these points first and, if you are convinced they are all satisfied, then provide a code snippet.

Moreover, performance is not always correlated with 400% CPU usage. Modern compilers carry out a lot of compiler-level optimizations, which vary from one compiler version to another. So I would not be hell-bent on using CPU usage alone as a benchmark for how well my program is running.

Ujjwal Aryan
  • All my matrices are dynamic matrices of float. They are all very large (at least 500x500) and dense. So they are supposed to be eligible for MKL optimizations, right? But how can I check whether MKL optimization is actually used? And how can I find out which version of Eigen I am using? (I am quite sure I am using the latest one, though.) – Arkhan Jul 31 '15 at 18:44
  • I just noticed in my task manager that 2 of my cores are fully used (100%) while the other two are not used at all (0%)... Why? – Arkhan Aug 01 '15 at 09:31