2

So I have been looking into some of the technologies that implement multiple threads per core (like Intel's Hyper-Threading), and I am wondering what the extent of parallelism in these technologies is. Is it true parallelism, or just more effective concurrency? The logical processors still share the same execution units and core resources, so it basically seems like the technology is just virtualizing their usage, and I am unsure how true parallelism could occur. And if that is the case, what is the benefit? You can already achieve concurrency through effective thread context switching.

PandaRaid
  • 618
  • 1
  • 9
  • 22

2 Answers

2

I'm no expert, but from what I've read (Long Duration Spin-wait Loops on Hyper-Threading Technology Enabled Intel Processors):

Each physical processor has two logical processors. The logical processors each have their own independent architectural state, but share nearly all other resources on the physical processor, such as caches, execution units, branch predictor, control logic and buses.

So, basically, if one logical processor is using a physical unit (e.g., the FPU, or floating-point unit), the other logical processor is allowed to use another resource (e.g., the ALU, or arithmetic logic unit).
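To make that concrete, here is a toy Python model (my own sketch, not Intel's actual design): two instruction streams share one FPU and one ALU, and each unit can accept one micro-op per cycle. When the streams need different units they overlap almost completely; when they fight over the same unit, the second logical processor gains nothing.

```python
# Toy model of two logical processors sharing one FPU and one ALU.
# Illustrative sketch only: each unit accepts one micro-op per cycle.

def cycles_serial(stream_a, stream_b):
    """Run the threads one after the other: cycles simply add up."""
    return len(stream_a) + len(stream_b)

def cycles_smt(stream_a, stream_b):
    """Each cycle, every logical processor issues its next op if the
    execution unit it needs has not been claimed this cycle."""
    a, b = list(stream_a), list(stream_b)
    cycles = 0
    while a or b:
        busy = set()
        for stream in (a, b):
            if stream and stream[0] not in busy:
                busy.add(stream.pop(0))
        cycles += 1
    return cycles

fp_thread  = ["FPU"] * 8   # floating-point heavy thread
int_thread = ["ALU"] * 8   # integer heavy thread

print(cycles_serial(fp_thread, int_thread))   # 16
print(cycles_smt(fp_thread, int_thread))      # 8: different units overlap
print(cycles_smt(fp_thread, ["FPU"] * 8))     # 16: same unit, no overlap
```

The mixed pair finishes in half the cycles because each cycle issues one FP op and one integer op side by side; two FP-heavy threads serialize on the single FPU and gain nothing.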

From what I've read, you can expect a performance increase of 15-20% in the best-case scenario. I don't have any actual numbers, but don't expect the same level of improvement you'd get from adding another physical processor.

dcastro
  • 66,540
  • 21
  • 145
  • 155
  • Thanks for the response. I don't really understand why that would be useful, with dynamic scheduling and out of order execution I would imagine that most of the resources present would be utilized. Furthermore, I would expect the overhead incurred for monitoring resource utilization would marginalize the benefits. – PandaRaid Mar 01 '14 at 20:38
  • @PandaRaid I admit I don't know much about the details, but it seems modern processors really do have a huge surplus of resources, so much that it became perfectly feasible to have 2 threads running on the same CPU simultaneously. Here's another good explanation: http://www.tomshardware.co.uk/answers/id-1694436/hyperthreading-work.html#. – dcastro Mar 01 '14 at 20:55
  • @PandaRaid "I would imagine that most of the resources present would be utilized" - not if there's more than one resource of a given kind (e.g., multiples ALUs) – dcastro Mar 01 '14 at 21:08
  • Even if there are multiple functional units, they are still being utilized. Most modern processors use out of order execution, dynamic scheduling and multiple issue which requires either pipelined or duplicated hardware. – PandaRaid Mar 01 '14 at 21:41
  • 1
    Don't forget that these days a computer's RAM system is quite a bit slower than its CPU. L1, L2, L3 caches help, but sometimes a thread needs to access a piece of memory that is not present in any of the caches, and there is nothing for the thread to do except wait until the required data is fetched from RAM, which can take several thousand cycles. With hyper threading (and another thread that's ready to run) the CPU can simply put that thread aside and let another one run until the data gets to the CPU -- without hyper threading, the CPU would be idle during that period. – Jeremy Friesner Mar 03 '14 at 05:49
  • @JeremyFriesner How would that be any different from a thread context switch? The processor becomes idle while thread waits for data, so it is preempted until the data arrives and then it resumes operation. Or out of order execution, where the operands are not ready so it lets the next instruction execute while the data is being buffered. – PandaRaid Mar 03 '14 at 14:56
  • No difference, it's just another reason why the CPU might do a thread context switch. (Unless by "thread context switch" you mean the OS-level version of a context switch, which is a different thing that is not handled directly/entirely within the CPU) – Jeremy Friesner Mar 03 '14 at 16:16
  • Exactly, context switching is a problem handled by both modern OSes and hardware. So I don't get the effectiveness of adding an extra "logical" processor per core for concurrent thread execution when that is something that is already addressed. It seems like added overhead for no benefit; in fact, a lot of things I have read make it seem as though there's a performance penalty in many cases. – PandaRaid Mar 03 '14 at 16:46
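To put a rough number on the stall-hiding argument in the comments above, here is a toy Python simulation (my own sketch; the single-issue core and the 100-cycle miss latency are illustrative assumptions, not measured figures). A miss parks a thread; with a second hardware thread ready, the core keeps issuing during part of that window instead of sitting idle.

```python
# Toy single-issue core: 'c' is a 1-cycle compute op, 'm' issues a load
# that then parks the thread for MISS_LATENCY cycles. A hardware thread
# switch costs nothing in this model. Illustrative sketch only.

MISS_LATENCY = 100  # cycles to fetch from RAM (illustrative number)

def run(threads):
    """Return (total_cycles, busy_cycles) for a list of op lists."""
    work = [list(t) for t in threads]
    ready_at = [0] * len(work)   # cycle at which each thread may issue again
    cycle = busy = 0
    while any(work):
        for i, ops in enumerate(work):
            if ops and ready_at[i] <= cycle:
                op = ops.pop(0)
                if op == "m":
                    ready_at[i] = cycle + MISS_LATENCY  # park on the miss
                busy += 1
                break  # one issue slot per cycle
        cycle += 1
    return cycle, busy

pattern = ["c"] * 10 + ["m"] + ["c"] * 10   # compute, one miss, compute

print(run([pattern]))           # (120, 21): ~17% utilization, core mostly idle
print(run([pattern, pattern]))  # (131, 42): ~32% utilization, stall partly hidden
```

The two threads together finish in 131 cycles rather than the 240 a back-to-back serial run would take; the remaining idle gap is where both threads happen to be parked on their misses at the same time.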
1

So there are a lot of factors that determine the benefits of Hyper-Threading. First off, since the logical processors share resources, there is obviously no true parallelism, but there is some increase in concurrency depending on the type of processor.

There are three types of hardware multithreading. Fine-grained multithreading switches threads in round-robin fashion, with the goal of increased throughput at the cost of increased individual-thread latency; switching is done on a cycle-by-cycle basis. Coarse-grained multithreading is more like a context switch: the processor switches threads when a stall, such as a memory fetch, occurs. Finally, in simultaneous multithreading (SMT), instructions from multiple threads are issued in the same clock cycle, meaning there is data from multiple threads in the reorder buffer and pipeline at the same time. They are depicted as follows.

[Figure: pipeline occupancy under fine-grained, coarse-grained, and simultaneous multithreading. Shades correspond to threads of execution.]

Hyper-Threading corresponds to the SMT case in this diagram. As shown, the effectiveness of the design depends primarily on one thing: how busy the pipeline is. In dynamically scheduled processors, where the goal is to keep the pipeline and execution units as busy as possible, the gains see diminishing returns of around 0 to 5 percent from what I have seen. In statically scheduled processors, where the pipeline has many stalls, the benefits are much more prevalent, with gains of around 20 to 40% depending on how well the compiler can reorder the instructions.
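The throughput-versus-latency tradeoff of the fine-grained design described above can be sketched with a toy Python simulation (my own illustration; the four threads of 100 one-cycle ops are arbitrary numbers). With no stalls, cycle-by-cycle round-robin delivers exactly the same total throughput as running the threads back to back, but every individual thread now takes nearly four times as long to finish.

```python
# Toy model of fine-grained round-robin on one core: n threads of
# `length` one-cycle ops each, switching threads every cycle.
# Illustrative sketch only.

def serial_finish(n_threads, length):
    """Finish cycle of each thread when run back to back."""
    return [length * (i + 1) for i in range(n_threads)]

def round_robin_finish(n_threads, length):
    """Finish cycle of each thread under cycle-by-cycle round-robin."""
    remaining = [length] * n_threads
    finish = [0] * n_threads
    cycle = 0
    while any(remaining):
        for i in range(n_threads):
            if remaining[i]:
                remaining[i] -= 1
                cycle += 1              # one op issued this cycle
                if remaining[i] == 0:
                    finish[i] = cycle   # record when thread i completes
    return finish

print(serial_finish(4, 100))       # [100, 200, 300, 400]
print(round_robin_finish(4, 100))  # [397, 398, 399, 400]
```

Total throughput is identical (400 cycles either way, since nothing stalls), but the first thread's latency grows from 100 cycles to 397; the win for any of these multithreaded designs comes only when stalls would otherwise leave the pipeline empty.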

PandaRaid
  • 618
  • 1
  • 9
  • 22