I use a simulation written in Python/NumPy/Cython. Since I need to average over many simulation runs, I use the multiprocessing module to run the individual simulation instances in batches.
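Roughly, the batching looks like the following sketch (run_simulation and N_RUNS are just placeholders here; the real worker calls into the Cython-backed simulation and takes per-run parameters):

    import multiprocessing as mp
    import numpy as np

    def run_simulation(seed):
        # stand-in for the actual Cython simulation of one run
        rng = np.random.default_rng(seed)
        return rng.standard_normal(1000).mean()

    if __name__ == "__main__":
        N_RUNS = 48
        # one worker process per logical CPU reported by the OS
        with mp.Pool(processes=mp.cpu_count()) as pool:
            results = pool.map(run_simulation, range(N_RUNS))
        print(np.mean(results))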
At the office I have an i7-920 workstation with HT. At home I have an i5-560 without. I thought I could run twice as many instances of the simulation in each batch at the office and cut my running time in half. Surprisingly, the run time of each individual instance doubled compared to the time it takes on my home workstation. That is, running 3 simulation instances in parallel at home takes, say, 8 minutes, while running 6 instances at the office takes about 15 minutes. Using `cat /proc/cpuinfo` I verified 'siblings' = 8 and 'cpu cores' = 4, so HT is enabled.
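For completeness, this is the sort of check I did, plus how one could cap the pool at the physical core count to test whether HT is to blame (a sketch; parsing /proc/cpuinfo like this only works on Linux):

    def core_counts():
        """Return (logical, physical) core counts from /proc/cpuinfo."""
        siblings = cores = None
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("siblings"):
                    siblings = int(line.split(":")[1])
                elif line.startswith("cpu cores"):
                    cores = int(line.split(":")[1])
        return siblings, cores

    logical, physical = core_counts()
    print(f"logical (siblings): {logical}, physical (cpu cores): {physical}")
    # If HT is the culprit, using Pool(processes=physical) instead of
    # Pool(processes=logical) should bring the per-instance time back down.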
I am not aware of any "conservation of total runtime" law (though from a scientific point of view it could be quite interesting :) ), and I'm hoping someone here might shed some light on this conundrum.