OpenMP memory access and performance degradation

Question

I have a program that uses OpenMP to get a large speedup on a dual-CPU server with 32 cores in total. The input parameters I'm using don't allow it to fully load all the CPUs.

Today a couple of cores were 100% loaded by another program. When I launched my program, it was terribly slow, even though the CPU load was as high as usual (~2500%). When I removed the parallel directives, I noticed some performance improvement.

Could this be due to limited memory bandwidth? How could I investigate the issue further and eventually improve my code?

It could be a memory issue, or it could be because the "other program" is loading some of the CPUs. Either way, show your code and actually measure how much memory and CPU your program uses and how long it takes to run! — alexbuisson, Jul 29 '13 at 16:47

score 2 · Accepted Answer · answered Jul 29 '13 at 22:52

Memory access is not necessarily what degrades performance. With static scheduling (often the default), each loop is divided into chunks that are assigned to threads up front. If one of those threads runs on a core that is already busy with another process, it gets only part of that core and finishes its chunk late, while all the other threads wait for it at the implicit barrier at the end of the loop. This can dramatically slow down your program. If you are running in an environment where you are not guaranteed to be the only user of the resources, you may get better performance with dynamic scheduling.

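If your loops are declared with schedule(runtime), you can switch them to dynamic scheduling without recompiling by running your program with

OMP_SCHEDULE=dynamic  ./my_program

and see if it helps. Note that OMP_SCHEDULE only affects loops with schedule(runtime); a loop with no schedule clause uses the implementation's default schedule (typically static), so either add schedule(runtime) to it or request schedule(dynamic) directly.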

Thanks for the reply, that sounds interesting. (Un)fortunately, the server is almost idle today. I will try it as soon as I have the opportunity. — DarioP, Jul 30 '13 at 09:17
