
I have a question that I tried to find an answer to, but all the information I found only confused me more, and unfortunately I couldn't get a clear answer.

So, let's say I have a computer with hyperthreading turned OFF.

What is the optimal number of threads to use in a program I wrote?

I understand that if my program is NOT 100% CPU-bound (it does I/O), the optimal number of threads will be MORE than one per core: some threads will be waiting on I/O, so having more of them (but not too many, because of context-switching overhead) will be better for this kind of program.
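The intuition above is often written as a rule of thumb: threads ≈ cores × target utilization × (1 + wait time / compute time). The sketch below is only that heuristic, not a guarantee; the parameter names (`wait_time`, `compute_time`, `target_utilization`) are hypothetical and the ratio has to be measured for your own program. Note that with zero wait time it reduces to exactly one thread per core.

```python
import math

def estimate_thread_count(cores, wait_time, compute_time, target_utilization=1.0):
    """Rule-of-thumb pool size: cores * utilization * (1 + wait/compute).

    wait_time and compute_time are per-task averages in the same unit
    (hypothetical inputs you would have to measure yourself).
    """
    if cores < 1 or compute_time <= 0:
        raise ValueError("need at least one core and a positive compute time")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    n = cores * target_utilization * (1 + wait_time / compute_time)
    return max(1, math.floor(n))

# Purely CPU-bound (no waiting): one thread per core.
print(estimate_thread_count(4, wait_time=0, compute_time=10))
# Tasks that wait 90% of the time: many more threads than cores.
print(estimate_thread_count(4, wait_time=90, compute_time=10))
```

The numbers are a starting point for benchmarking, not a final answer, because the formula ignores context-switch and cache costs.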

BUT, if my program is 100% CPU-bound, is one thread per core optimal? I'm confused because having more threads might mean getting a bigger time slice for each thread, which could improve performance.

Thanks!

maor levi

1 Answer


For purely CPU-bound workloads without hyper-threading, the answer is always 1 thread per core.
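A minimal sketch of "one worker per core" for CPU-bound work: split the job into as many equal chunks as there are cores and give each chunk to one worker. `os.cpu_count()` reports logical CPUs, which equals the physical core count only when hyper-threading is off, as assumed in the question. The function names are hypothetical. One caveat specific to CPython: the GIL stops pure-Python threads from computing in parallel, so for real speedup you would swap `ThreadPoolExecutor` for `ProcessPoolExecutor` (same `map` interface); the sizing rule is the same either way.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def cpu_bound_pool_size():
    # Logical CPU count; equals physical cores when hyper-threading is OFF.
    # os.cpu_count() may return None, so fall back to a single worker.
    return os.cpu_count() or 1

def split_into_chunks(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most 1."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def sum_of_squares(chunk):
    # Stand-in for a CPU-bound task (hypothetical example workload).
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(items, workers=None):
    # One worker per core, one chunk per worker: no extra threads to time-slice.
    workers = workers or cpu_bound_pool_size()
    chunks = split_into_chunks(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))

print(parallel_sum_of_squares(list(range(1_000))))
```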

With HT turned on, it can be better to run fewer than one thread per logical (HT) core, because the two threads sharing a physical core compete for the same cache. But usually, even here, one thread per logical core is best.

With I/O-bound workloads it's much more complicated, but that does not apply here.

since having more threads, meaning maybe getting a bigger slice time for each thread

I'm not sure I follow the reasoning. The OS hands out time slices to threads in an approximately round-robin way. Time slices are 4-40 ms, and their size does not change with the number of threads.

Ideally, when the number of threads is exactly right, there are almost no context switches. The more threads you add, the more context switches there will be.
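To make the last two paragraphs concrete, here is a toy simulation of CPU-bound threads sharing a single core under a fixed-quantum round-robin scheduler. It is an idealized model, not how any real OS scheduler works: the quantum stays fixed, every switch to a different thread costs a constant `switch_cost` (a hypothetical number), and cache effects, which make extra threads even more expensive in practice, are ignored. Splitting the same total work across more threads never finishes it sooner; it only adds switches.

```python
from collections import deque

def simulate_round_robin(work, quantum, switch_cost):
    """Simulate CPU-bound threads sharing ONE core with round-robin scheduling.

    work: per-thread CPU demand, in the same time unit as quantum.
    Returns (finish_time, context_switches).
    """
    ready = deque(enumerate(work))
    clock = 0
    switches = 0
    running = None
    while ready:
        tid, remaining = ready.popleft()
        if running is not None and running != tid:
            # Switching to a different thread costs time but does no useful work.
            switches += 1
            clock += switch_cost
        ran = min(quantum, remaining)  # slice length does not grow with thread count
        clock += ran
        remaining -= ran
        running = tid
        if remaining > 0:
            ready.append((tid, remaining))
    return clock, switches

# The same 100 units of work on one core, split over 1, 2 and 4 threads.
for threads in (1, 2, 4):
    print(threads, simulate_round_robin([100 // threads] * threads, quantum=10, switch_cost=1))
```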

usr
  • OK, thanks! The bigger time slice was a misunderstanding on my part. You say that with HT on, 1 thread per core is usually best? I would actually expect it to be 2 (since 2 threads can run together). Otherwise, if the shared-cache problem is that big, why use HT at all? – maor levi Apr 12 '16 at 12:54
  • If you have 4x2 cores, run 8 threads (usually). The shared cache works fine, it's just that it needs to accommodate the data of two threads now. This can cause no issues at all or it can lead to degradation. I think degradation is a rare edge case. The safest way is, of course, to try both approaches and benchmark. You can even run an automated 5 second benchmark on customer machines. – usr Apr 12 '16 at 12:59
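The "try both and benchmark" advice can be automated with a small harness like the one below. It is a sketch under assumptions: `workload` is a hypothetical callable that runs your job with `n` threads, and the candidate list (for example 4 and 8 on a 4x2-core machine) is up to you. Taking the minimum of several runs reduces noise from other processes. In CPython, a CPU-bound `workload` would need processes or GIL-releasing code to show any difference between counts.

```python
import time

def benchmark_thread_counts(workload, candidates, repeats=3):
    """Time workload(n) for each candidate n and keep the best (minimum) time."""
    results = {}
    for n in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            workload(n)
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results

def pick_thread_count(workload, candidates, repeats=3):
    # Return the candidate thread count with the fastest best-of-N time.
    results = benchmark_thread_counts(workload, candidates, repeats)
    return min(results, key=results.get)
```

For a short check on a customer machine, keep `repeats` and the workload size small enough that the whole run fits your time budget.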