I'm running a large set of simulations on a quad-core Xeon E5520 with Hyper-Threading enabled. My software automatically detects 8 (virtual) cores and launches 8 simulations to run in parallel. However, htop and system-monitor show each of the 8 cores loaded at only ~50%.
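For reference, the detection-and-launch logic is roughly equivalent to the following sketch (the `runSimulation` worker and its body are placeholders, not my actual code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LaunchSims {
    public static void main(String[] args) throws InterruptedException {
        // Logical (virtual) cores as seen by the JVM -- 8 on this
        // quad-core HT machine, since each physical core exposes
        // two hardware threads.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Detected logical cores: " + cores);

        // One simulation per logical core.
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < cores; i++) {
            final int id = i;
            pool.submit(() -> runSimulation(id)); // hypothetical worker
        }
        pool.shutdown();
    }

    static void runSimulation(int id) {
        // placeholder for the CPU-bound simulation work
    }
}
```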
Is this intended behavior? In a way it makes sense, since the total load is 400%, i.e. 100% per physical core. But shouldn't I get a bit more than that? Isn't that the whole point of HT: using SMT to run a second thread on the otherwise idle execution units, so total throughput ends up higher?
I should mention that the load is very consistent: 50% on each core, all the time. The simulations are run by Java, in a single JVM. GC is not the issue; I'm well below the JVM heap limit. The simulations are not memory-bound either: there is plenty of RAM to go around and no swapping. They do write a lot of data to disk, but large buffers are in place (a 128MB write buffer per thread), and the disk activity shown by gkrellm consists of frequent bursts of ~90MB/s rather than a sustained load, so I can't believe this is the bottleneck.
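To clarify the buffering setup, each writer thread does something like the sketch below (file name and write sizes are illustrative, not my actual code); data accumulates in the 128MB buffer and only hits the disk in bursts when the buffer fills or is flushed, which matches the bursty pattern gkrellm shows:

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedSimWriter {
    // 128 MB buffer per writer thread, as described above.
    static final int BUFFER_SIZE = 128 * 1024 * 1024;

    public static void main(String[] args) throws IOException {
        try (OutputStream out = new BufferedOutputStream(
                new FileOutputStream("sim-output.bin"), BUFFER_SIZE)) {
            // Small writes land in the in-memory buffer...
            out.write(new byte[4096]);
        } // ...and close() flushes them to disk in one burst.
    }
}
```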
Could anyone shed some light on this?