I use a simulation written in Python/NumPy/Cython. Since I need to average over many simulation runs, I use the multiprocessing module to run the individual simulation instances in batches.
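Roughly, the batching looks like the following sketch (run_simulation and N_RUNS are just placeholders here; the real worker calls into the Cython-backed simulation and takes per-run parameters):

    import multiprocessing as mp
    import numpy as np

    def run_simulation(seed):
        # stand-in for the actual Cython simulation of one run
        rng = np.random.default_rng(seed)
        return rng.standard_normal(1000).mean()

    if __name__ == "__main__":
        N_RUNS = 48
        # one worker process per logical CPU reported by the OS
        with mp.Pool(processes=mp.cpu_count()) as pool:
            results = pool.map(run_simulation, range(N_RUNS))
        print(np.mean(results))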
At the office I have an i7-920 workstation with HT. At home I have an i5-560 without. I thought I could run twice as many instances of the simulation in each batch at the office and cut my running time in half. Surprisingly, the run time of each individual instance doubled compared to the time it takes on my home workstation. That is, running 3 simulation instances in parallel at home takes, say, 8 minutes, while running 6 instances at the office takes about 15 minutes. Using `cat /proc/cpuinfo` I verified 'siblings' = 8 and 'cpu cores' = 4, so HT is enabled.
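For completeness, this is the sort of check I did, plus how one could cap the pool at the physical core count to test whether HT is to blame (a sketch; parsing /proc/cpuinfo like this only works on Linux):

    def core_counts():
        """Return (logical, physical) core counts from /proc/cpuinfo."""
        siblings = cores = None
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("siblings"):
                    siblings = int(line.split(":")[1])
                elif line.startswith("cpu cores"):
                    cores = int(line.split(":")[1])
        return siblings, cores

    logical, physical = core_counts()
    print(f"logical (siblings): {logical}, physical (cpu cores): {physical}")
    # If HT is the culprit, using Pool(processes=physical) instead of
    # Pool(processes=logical) should bring the per-instance time back down.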
I am not aware of any "conservation of total runtime" law (though from a scientific point of view it could be quite interesting :) ), and I'm hoping someone here might shed some light on this conundrum.