
I am reading Agner Fog's materials and I have some doubts:

The pre-decoders and decoders can handle 16 bytes or 4 instructions per clock cycle

  1. What are pre-decoders, in the context of decoders?
  2. The author mentions a cache for macroinstructions. I cannot see why it would be useful; after all, we already have an instruction cache. What is the loopback buffer?

  3. What are micro-op fusion and macro-op fusion?

Peter Cordes
Gilgamesz

1 Answer

  1. "The pre-decoder will find and mark the instruction boundaries, decode any prefixes and check for certain properties (e.g. branches)." (Source) (Another article)

  2. The L1 instruction cache is the main cache for macro-instructions. A loop buffer holds a short sequence of recently fetched instructions (on the order of a few dozen bytes), which helps tight loops: the loop body is replayed from the buffer, saving latency and power compared to fetching it from the L1 cache again on every iteration.

  3. "The register renaming (RAT) and retirement (RRF) stages in the pipeline are bottlenecks with a maximum throughput of 3 μops per clock cycle. In order to get more through these bottlenecks, the designers have joined some operations together that were split in two μops in previous processors. They call this μop fusion. The fused operations share a single μop in most of the pipeline and a single entry in the reorder buffer (ROB). But this single ROB entry represents two operations that have to be done by two different execution units. The fused ROB entry is dispatched to two different execution ports but is retired as a single unit." (Source)

    Macro-op fusion recognizes a pair of adjacent macro-instructions and decodes them into a single micro-op. The most common example is that on newer Intel CPUs, a CMP followed by a conditional jump (Jcc, e.g. JNE) fuses into one micro-op. An unconditional JMP does not take part in this.
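
    To make the macro-op fusion idea more concrete, here is a toy model in Python. It is only a sketch under loud assumptions: every instruction is treated as exactly one micro-op, only CMP/TEST followed by a conditional jump is considered fusible, and the function name `macro_fuse` and the mnemonic sets are made up for illustration. Real CPUs have more detailed rules that differ between microarchitectures.

```python
# Toy model of macro-op fusion. Assumptions (not taken from any CPU manual):
# - each instruction is a bare mnemonic and decodes to exactly one micro-op
# - only CMP/TEST immediately followed by a conditional jump can fuse
FUSIBLE_FIRST = {"cmp", "test"}
CONDITIONAL_JUMPS = {"je", "jne", "jl", "jge", "jb", "jae"}


def macro_fuse(instructions):
    """Return the micro-op list after pairing CMP/TEST with a following Jcc."""
    uops = []
    i = 0
    while i < len(instructions):
        cur = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if cur in FUSIBLE_FIRST and nxt in CONDITIONAL_JUMPS:
            # The pair travels through the pipeline as a single micro-op.
            uops.append(f"{cur}+{nxt}")
            i += 2
        else:
            uops.append(cur)
            i += 1
    return uops


loop_body = ["add", "cmp", "jne"]   # typical end of a counted loop
print(macro_fuse(loop_body))        # three instructions, two micro-ops
print(macro_fuse(["cmp", "jmp"]))   # unconditional jump: nothing fuses
```

    In this model the three-instruction loop tail costs two micro-op slots instead of three, which is the whole point of fusion: more instructions fit through the 4-wide part of the pipeline per clock.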

Nayuki
  • Regarding point 3: that quote describes Pentium M, which has neither a uop cache nor a loop buffer. I think the OP is reading the Sandybridge section, because that's where I told him to start. Core2 and later have a 4-wide out-of-order pipeline, so they can rename/issue and retire 4 fused-domain uops per clock. More useful: sections 9.5 and 9.6 on page 124, `Micro-op fusion` and `Macro-op fusion`. – Peter Cordes Apr 11 '16 at 23:05
  • Thanks, but what is the cache for micro-ops (not the loop buffer)? – Gilgamesz Apr 14 '16 at 12:56