Why do the pipeline constraints of coarse-grained and fine-grained multithreading differ?

Question

In "Computer Organization and Design: The Hardware/Software Interface, Sixth Edition" (RISC-V Edition) by David A. Patterson and John L. Hennessy, Section 6.4 says this about "coarse-grained multithreading":

This change relieves the need to have thread switching be extremely fast and is much less likely to slow down the execution of an individual thread, since instructions from other threads will only be issued when a thread encounters a costly stall.

Because a processor with coarse-grained multithreading issues instructions from a single thread, when a stall occurs, the pipeline must be emptied or frozen. The new thread that begins executing after the stall must fill the pipeline before instructions are able to complete.

But for "fine-grained multithreading", it doesn't mention any change to the pipeline when switching threads:

This interleaving is often done in a round-robin fashion, skipping any threads that are stalled at that clock cycle.

Q: Since the book says:

A thread includes the program counter, the register state, and the stack.

and both kinds of multithreading switch threads when a stall is encountered, why must coarse-grained multithreading empty the pipeline and then refill it (because the pipeline's instructions come from only a single thread), while fine-grained multithreading need not?

Peter Cordes · Accepted Answer · 2023-07-21T18:40:37.627


I think the point is that if you're going to have two sets of register state, page tables, FP exception state, etc. that can be active at once, you might as well do fine-grained multithreading.

So it wouldn't be a good tradeoff to make a coarse-grained multithreading CPU that paid most of the cost to support fine-grained multithreading. In this paragraph at least, that looks like an unstated assumption, but perhaps they discuss it elsewhere.

The benefit of doing only coarse-grained multithreading this way is that you don't need to support having instructions from different contexts in the pipeline at once. That simplifies things: state such as FP exception flags and the rounding mode doesn't need to be tracked per instruction.

Architectural state for the thread being swapped out can get saved to special storage that's only accessed by the hardware context-switching logic, instead of needing extra tag bits on a bunch of pipeline structures and a RAT (register alias table) with twice as many entries.

(As Dr. Bandwidth comments, fine-grained multithreading is usually only used in CPUs with out-of-order exec and register renaming.)
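To make the contrast concrete, here is a minimal toy simulation, not a model of any real CPU. It assumes a simple in-order pipeline where every slot is tagged with a thread ID, and it models a "costly stall" as happening after every `burst` instructions of a thread. The coarse-grained variant waits until the pipeline has drained before letting the next thread issue (a real design might squash the stalled instructions instead). The function name `simulate` and all of its parameters are made up for this illustration.

```python
from collections import deque

def simulate(depth, n_threads, instrs_per_thread, fine_grained, burst=4):
    """Toy in-order pipeline. Returns (total_cycles, max_threads_in_flight)."""
    pipe = deque([None] * depth, maxlen=depth)  # slot = thread id, None = bubble
    remaining = [instrs_per_thread] * n_threads
    active = 0     # thread to consider first
    issued = 0     # coarse-grained only: instructions since the last switch
    cycles = 0
    max_mixed = 0

    def next_ready(start):
        # Round-robin search for a thread that still has work, or None.
        for k in range(n_threads):
            t = (start + k) % n_threads
            if remaining[t]:
                return t
        return None

    while True:
        pipe.appendleft(None)  # advance one stage; the oldest slot retires
        busy = any(s is not None for s in pipe)
        if not busy and not any(remaining):
            break
        cycles += 1
        slot = None
        if fine_grained:
            # Any ready thread may issue; the tag in the slot identifies it.
            t = next_ready(active)
            if t is not None:
                slot = t
                active = (t + 1) % n_threads
        else:
            # Only one context may be in flight: drain before switching.
            stalled = issued == burst or not remaining[active]
            if stalled and not busy:
                t = next_ready((active + 1) % n_threads)
                if t is not None:
                    active, issued, stalled = t, 0, False
            if not stalled:
                slot = active
                issued += 1
        if slot is not None:
            remaining[slot] -= 1
            pipe[0] = slot
        max_mixed = max(max_mixed, len({s for s in pipe if s is not None}))
    return cycles, max_mixed

fine = simulate(depth=5, n_threads=2, instrs_per_thread=8, fine_grained=True)
coarse = simulate(depth=5, n_threads=2, instrs_per_thread=8, fine_grained=False)
```

In this sketch the fine-grained run has instructions from both threads in flight at once, which is exactly what needs the per-slot tag, while the coarse-grained run never mixes threads but pays `depth - 1` bubble cycles at each switch.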

edited Jul 21 '23 at 18:40

answered Jul 20 '23 at 23:46

Peter Cordes


Thanks for the quick reply. Do you mean that if the processing unit has only **one** set of register files, page tables, etc., then fine-grained multithreading would also need to **empty** the pipeline? – zg c Jul 20 '23 at 23:50

@zgc: In that case you can't do fine-grained multithreading. Because yes, you couldn't have instructions from different logical cores in the pipeline at once, if the pipeline can't track which context they need to execute in. And flushing the pipeline every instruction totally defeats pipelining. – Peter Cordes Jul 20 '23 at 23:53
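A rough back-of-the-envelope sketch of why that defeats pipelining, assuming an idealized pipeline that issues one instruction per cycle and must fully drain before every thread switch (the helper name `cycles_with_drains` is hypothetical):

```python
import math

def cycles_with_drains(n_instr, depth, instrs_per_switch):
    """Cycles to run n_instr instructions when the pipeline is drained
    after every `instrs_per_switch` instructions (idealized, no other stalls)."""
    bursts = math.ceil(n_instr / instrs_per_switch)
    # Each burst of b instructions takes b + (depth - 1) cycles to finish.
    return n_instr + bursts * (depth - 1)

no_switch = cycles_with_drains(1000, depth=5, instrs_per_switch=1000)  # 1004
every_instr = cycles_with_drains(1000, depth=5, instrs_per_switch=1)   # 5000
```

Under these assumptions, switching on every instruction costs `depth` cycles per instruction, the same as an unpipelined machine, whereas draining only on rare costly stalls keeps the overhead small.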

Fine-grain multithreading almost always uses register renaming rather than multiple sets of register files. The hardware register renamer will ensure that each thread context only accesses its own register values. Fine-grain multi-threading in a pipelined processor brings some additional complexities that one might not immediately think of. For example, every floating-point instruction has to carry its own floating-point control state (rounding mode, interrupt mode) through each stage of the pipeline. – John D McCalpin Jul 21 '23 at 16:37

@JohnDMcCalpin: Right, yes, multiple register *files* sounded wrong while I was writing it, except for CPUs with a retirement-register-file like P6-family. Rephrased to what I really meant. And good point about FP rounding mode needing to be tracked per-instruction with fine-grained multithreading, that's a great example of the kind of thing that could be handled by draining the pipeline on change to avoid that if you weren't doing SMT. – Peter Cordes Jul 21 '23 at 18:42
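As a software analogy only (this is not how hardware implements it), the idea of control state travelling with each instruction can be sketched with Python's `decimal` module. The `FpOp` record and `execute` helper are hypothetical names; the rounding mode is stored in the operation itself rather than in one global setting, so interleaved operations from two threads each get their own mode.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, localcontext

@dataclass
class FpOp:
    thread: int
    rounding: str   # control state carried along with the instruction
    a: Decimal
    b: Decimal

def execute(op, prec=3):
    # Apply the rounding mode tagged on this operation, not a global one.
    with localcontext() as ctx:
        ctx.prec = prec
        ctx.rounding = op.rounding
        return op.a / op.b

# Interleaved operations from two threads with different rounding modes.
ops = [
    FpOp(0, ROUND_FLOOR, Decimal(1), Decimal(3)),
    FpOp(1, ROUND_CEILING, Decimal(1), Decimal(3)),
]
results = [execute(op) for op in ops]  # [Decimal('0.333'), Decimal('0.334')]
```

With a single shared mode instead, the only safe way to change it would be to wait until no operation that depends on the old mode is still in flight, which is the "drain the pipeline on change" option mentioned above.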
