I am getting some strange results while testing OpenMP. As a test case, I sum two vectors of floats, a problem which should be perfectly parallelizable.

For sufficiently large vectors on my quad-core CPU with Hyper-Threading (4 physical cores, so 4x2 = 8 logical threads), I get an almost perfect speedup of a factor of two going from single-thread to dual-thread execution. The same holds going from 4 threads to 8 threads: a relative speedup of a factor of 2.

However, I get almost no speedup going from 2 to 4 threads. I could understand it if this happened in the transition from 4 to 8 threads, since Hyper-Threading, which runs two logical threads on one physical core, might be imperfect. But at this intermediate stage it seems strange to me.

I would be grateful for any ideas!

  • 3
    Difficult to answer without code, version of OpenMP, .... – Gerriet May 09 '17 at 14:34
  • 2
    Most OpenMP implementations allow one to control how (and if) the threads are placed and bound to the CPUs on the system. When those mechanisms are not activated, the OS is free to move the threads wherever and whenever the scheduler deems fit. Without knowing which case is yours, one could only make wild guesses what the reason might be. – Hristo Iliev May 09 '17 at 15:09
