Input for branch predictor unit?

Question

I am looking at slide 13 here:

http://research.engineering.wustl.edu/~songtian/pdf/intel-haswell.pdf

(It should show a large block diagram for Haswell)

At the top there is a block called "Branch Predictors", with two arrows coming out of it. I am a little unsure about the correct ordering of the processes here. For a start, the "Branch Predictors" block doesn't appear to have any input?!

Could somebody try and explain (using the diagram) how the Branch predictor interfaces with the other elements?

In addition, which elements from the diagram would the Branch Target (Predictor) Buffer interface with? Would it be the same two that the orange and purple arrows point to from the Branch Predictors block?

It has 1.4 billion transistors; I'm pretty sure that the diagram doesn't show the entire connectivity. It's reasonable to assume that the branch predictor would get its inputs from the execution unit (for correct branch resolution updates and addresses for indirect branches), and from the decode unit for direct addresses. — Leeor, Feb 09 '14 at 07:07
Thank you for your second sentence. How would the Branch Target Buffer fit in with the diagram? Which units would it interact with? — user997112, Feb 09 '14 at 19:24
Whoever made the presentation did not give proper attribution to the image sources (some are recognizably from IDF presentations). The one discussed is actually from Figure 5 in David Kanter's ["Intel's Haswell CPU Microarchitecture"](http://www.realworldtech.com/haswell-cpu/) ([page 6](http://www.realworldtech.com/haswell-cpu/6/) has Figure 5). — , Feb 09 '14 at 21:37

score 3 · Answer 1 · answered Feb 09 '14 at 22:38

Intel is not especially forthcoming on details of its branch predictor. Quoting Agner Fog's The microarchitecture of Intel, AMD and VIA CPUs (2013-09-04 edition): "The branch predictor appears to have been redesigned in the Haswell, but very little is known about its construction."

It is most likely that either a global history string (e.g., one bit indicating taken/not-taken for each of the last N branches) or possibly a path history (similar to a global history string but typically using a hash of instruction addresses) is used to address one or more branch prediction tables, likely in combination with the instruction address. This is probably something vaguely similar to, but more sophisticated than, a gshare predictor. (One might consider this history as part of the branch predictor rather than as an input.)
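To make the gshare comparison concrete, here is a minimal sketch of a textbook gshare predictor. It is not Haswell's design (which is undisclosed); the table size, the 2-bit counters, the class name `GsharePredictor` and the helper `run_loop_branch` are all assumptions chosen for illustration.

```python
class GsharePredictor:
    """Toy gshare: branch address XOR global history indexes 2-bit counters."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                          # one taken/not-taken bit per recent branch
        self.counters = [1] * (1 << index_bits)   # 2-bit counters, start weakly not-taken

    def _index(self, pc):
        # The instruction address and the global history are hashed together.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2   # True means "predict taken"

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the resolved outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def run_loop_branch(predictor, pc, trip_count, repeats):
    """Feed a loop-closing branch (taken trip_count-1 times, then not taken once)."""
    mispredicts = 0
    for _ in range(repeats):
        for i in range(trip_count):
            taken = i < trip_count - 1
            if predictor.predict(pc) != taken:
                mispredicts += 1
            predictor.update(pc, taken)
    return mispredicts
```

With a short fixed trip count (say 4) and 10 history bits, the history differs on the final iteration, so after a warm-up pass the loop exit is predicted correctly in this toy model. A plain per-address 2-bit counter could not do that, which is the point of mixing history into the index.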

The instruction address is also likely used to index a branch target buffer (likely with another table for indirect calls and jumps, which would likely use some global history information). The instruction address is also likely to be used to predict the nature of any branches (branch identification), so that appropriate target predictors are used. Branch identification is particularly important for cases using a specialized predictor (such as function return targets).
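As a rough illustration of how branch identification selects a target predictor, the sketch below pairs a direct-mapped branch target buffer with a return stack. Everything here is hypothetical: the names `BranchTargetBuffer`, `ReturnStack` and `predict_next_fetch`, the sizes, and the three branch kinds are assumptions, and direction prediction is left out for brevity.

```python
class BranchTargetBuffer:
    """Direct-mapped toy BTB: branch address -> (kind, last seen target)."""

    def __init__(self, entries=256):
        self.entries = entries
        self.table = [None] * entries       # each slot holds (tag, kind, target)

    def lookup(self, pc):
        slot = self.table[pc % self.entries]
        if slot is not None and slot[0] == pc:
            return slot[1], slot[2]
        return None                         # miss: fetch assumes no branch here

    def install(self, pc, kind, target):
        self.table[pc % self.entries] = (pc, kind, target)


class ReturnStack:
    """Small stack of predicted return addresses."""

    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []

    def push(self, return_address):
        if len(self.stack) == self.depth:
            self.stack.pop(0)               # drop the oldest entry when full
        self.stack.append(return_address)

    def pop(self):
        return self.stack.pop() if self.stack else None


def predict_next_fetch(pc, btb, ras, fallthrough):
    """Pick the next fetch address using the kind recorded in the BTB."""
    hit = btb.lookup(pc)
    if hit is None:
        return fallthrough
    kind, target = hit
    if kind == "call":
        ras.push(fallthrough)               # remember where the call returns to
        return target
    if kind == "return":
        predicted = ras.pop()
        return predicted if predicted is not None else target
    return target                           # "jump": taken to the stored target
```

In this sketch a return uses the stack rather than the stored BTB target, since the last seen target of a return is often wrong when the function has several callers.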

For any misprediction of branch type, target, or branch direction, the correct information derived later in the pipeline is communicated to the predictor. (It may be helpful to also confirm correct predictions.) For ordinary branches and jumps, the target can be calculated in the front-end (before branch condition evaluation) to correct target mispredictions for taken cases. Similarly, branch misidentification can be fixed after instruction decode. On a misprediction of branch direction, or of the target for indirect control flow, the correct information can be provided from later in the pipeline.
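The correction paths described above can be summarized in a small sketch. The `Branch` record, the kind labels and the function `correction_point` are hypothetical names introduced only for illustration, and the two stage labels are a simplification of a real pipeline.

```python
from collections import namedtuple

# kind is one of "none", "cond", "jump" (direct) or "indirect" (assumed labels)
Branch = namedtuple("Branch", "kind taken target")


def correction_point(predicted, actual):
    """Return where a misprediction can first be detected, or None if correct."""
    if predicted.kind != actual.kind:
        return "front-end"      # misidentification is visible once decoded
    if (actual.kind in ("cond", "jump") and predicted.taken
            and predicted.target != actual.target):
        return "front-end"      # a direct target is computable from the instruction
    if predicted.taken != actual.taken:
        return "back-end"       # direction needs the condition to be evaluated
    if actual.kind == "indirect" and predicted.target != actual.target:
        return "back-end"       # indirect target needs the register or memory value
    return None
```

For example, `correction_point(Branch("cond", True, 0x40), Branch("cond", True, 0x80))` reports a front-end fix, while a wrong direction on the same branch is only caught in the back-end.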

I think Intel disclosed the existence of a return stack buffer (RSB) - see http://www.realworldtech.com/nehalem/4/ . The rest is indeed vague, but there is an interesting (even if a little old) review here - http://www.ece.uah.edu/~milenka/docs/milenkovic_WDDD02.pdf — Leeor, Feb 10 '14 at 10:03
@Leeor The referenced work by Agner Fog is also a good source for x86 branch predictor information (and provides some general information about branch prediction). Note: this answer is community wiki, so you can make any improvements that seem appropriate. — , Feb 10 '14 at 17:54
My tests indicate that it is probably using a path history hash or something similar. Basically if you have a nested loop like: `outer: mov rax, rcx; inner: dec rax; jnz inner; nop ...; dec rbx; jnz outer`, you find that prediction for the inner loop exit varies from 100% to 0% successful depending on the number of nops between the inner and outer jumps. The global history should be identical in both cases, and I don't think the BTB can explain it (since it should have at least two ways), so it seems to me that the address of the jump is used in the history hash. — BeeOnRope, Mar 22 '17 at 20:43
