
I have always wondered where and how branch prediction data is stored. Is there a limit? Is it kept only for recent branches? I am mostly interested in Intel architectures, but anything I can learn about any architecture is appreciated.

johnnycrash



It is stored somewhere inside the processor. Exactly how it is organized depends on the processor.

In a very simple case, you might use 4096 bits of branch prediction data. For every branch, you take the low 12 bits of the branch's address, which gives 4096 different values, and use that as the index into the table. Since each entry holds only one bit, you simply store whether the branch was taken the last time it executed.

The advantage is that this is very cheap. The disadvantage is that two branches exactly 4096 bytes apart (or any multiple of 4096 bytes apart) share the same entry. So if your code executes these two branches all the time, and one is always taken while the other is never taken, the prediction will be quite bad, because each branch keeps overwriting the other's entry.
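A minimal sketch of the one-bit scheme just described, assuming the 4096-entry table and low-12-bit indexing from the text. All names (`one_bit_predict`, `one_bit_update`, `one_bit_alias_demo`) and the branch addresses are hypothetical; real hardware packs the bits instead of using one byte per entry. The demo alternates two branches 4096 bytes apart, one always taken and one never taken, to show the aliasing problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 1-bit predictor: 4096 entries indexed by the low 12 address bits. */
#define PRED_INDEX_BITS 12
#define PRED_ENTRIES (1u << PRED_INDEX_BITS) /* 4096 */

/* One byte per entry for readability; hardware would pack 4096 single bits. */
static uint8_t one_bit_table[PRED_ENTRIES];

static unsigned one_bit_index(uint64_t branch_addr) {
    return (unsigned)(branch_addr & (PRED_ENTRIES - 1)); /* keep the low 12 bits */
}

/* Predict "taken" if this entry recorded "taken" last time. */
static bool one_bit_predict(uint64_t branch_addr) {
    return one_bit_table[one_bit_index(branch_addr)] != 0;
}

/* Overwrite the entry with the actual outcome. */
static void one_bit_update(uint64_t branch_addr, bool taken) {
    one_bit_table[one_bit_index(branch_addr)] = taken ? 1 : 0;
}

/* Two branches 4096 bytes apart share one entry; one is always taken,
   the other never. Returns the number of mispredictions over `rounds`
   executions of each branch, alternating between them. */
static int one_bit_alias_demo(int rounds) {
    const uint64_t a = 0x401000;  /* hypothetical branch address */
    const uint64_t b = a + 4096;  /* same low 12 bits as a */
    assert(one_bit_index(a) == one_bit_index(b));
    memset(one_bit_table, 0, sizeof one_bit_table);
    int mispredicts = 0;
    for (int i = 0; i < rounds; i++) {
        if (!one_bit_predict(a)) mispredicts++; /* a is always taken */
        one_bit_update(a, true);
        if (one_bit_predict(b)) mispredicts++;  /* b is never taken */
        one_bit_update(b, false);
    }
    return mispredicts;
}
```

With `one_bit_alias_demo(1000)` every one of the 2000 predictions is wrong, because each branch always sees the outcome the other branch just wrote.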

Some processors use two bits per entry, meaning "strongly taken", "weakly taken", "weakly not taken" and "strongly not taken". Every time the branch is taken, the state moves one step towards "strongly taken"; every time it is not taken, it moves one step towards "strongly not taken". This works better for branches that usually go the same way with rare exceptions, because a single exception no longer flips the prediction.
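A sketch of such a two-bit saturating counter, assuming the four states are encoded as 0 to 3 and that "taken" is predicted in the two "taken" states. The names (`counter_predict`, `counter_update`, `counter_loop_demo`) are hypothetical. The demo uses a loop-closing branch, which is taken on every iteration except the last: exactly the "usually taken with rare exceptions" pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the four states as 0..3. */
enum {
    STRONG_NOT_TAKEN = 0,
    WEAK_NOT_TAKEN   = 1,
    WEAK_TAKEN       = 2,
    STRONG_TAKEN     = 3
};

/* Predict taken in either of the two "taken" states. */
static bool counter_predict(uint8_t state) {
    return state >= WEAK_TAKEN;
}

/* Move one step toward STRONG_TAKEN or STRONG_NOT_TAKEN, saturating at the ends. */
static uint8_t counter_update(uint8_t state, bool taken) {
    assert(state <= STRONG_TAKEN);
    if (taken) return state < STRONG_TAKEN ? (uint8_t)(state + 1) : state;
    return state > STRONG_NOT_TAKEN ? (uint8_t)(state - 1) : state;
}

/* Loop-closing branch: taken `trip` - 1 times, then not taken once.
   Runs the whole loop `runs` times and counts mispredictions. */
static int counter_loop_demo(int trip, int runs) {
    uint8_t state = STRONG_TAKEN; /* assume an already warmed-up entry */
    int mispredicts = 0;
    for (int r = 0; r < runs; r++) {
        for (int i = 0; i < trip; i++) {
            bool taken = (i != trip - 1);
            if (counter_predict(state) != taken) mispredicts++;
            state = counter_update(state, taken);
        }
    }
    return mispredicts;
}
```

Here `counter_loop_demo(10, 5)` mispredicts only the loop exit, once per run. A one-bit entry would mispredict twice per run: at the exit, and again on the first iteration of the next run, because the exit flipped the stored bit.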

Some processors don't just use the low 12 (or more) bits of the branch address; they also mix in whether, say, the last four branches were taken. Say you have this code:

if (x >= 0) { ... } 
if (x <= 0) { ... }

where x is rarely 0 but is randomly positive or negative. Then the first branch is hard to predict, but the second is almost never taken when the first one was taken, and almost always taken when the first one was not taken. By mixing in this history, the second branch uses up at least two entries in the branch prediction table (at least one for each outcome of the first branch), but its prediction will be highly accurate, even though on its own it looks randomly taken or not taken.
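A sketch of mixing history into the index, under these assumptions: the index concatenates the low 8 address bits with the last 4 outcomes (a gselect-style scheme, chosen here instead of XOR so the two example branches cannot collide), each entry is a two-bit counter, and "taken" means the `if` condition is true, as in the text. All names, addresses and the LCG used to pick random signs of x are hypothetical. `hist_bits = 0` turns the history off for comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HIST_ENTRIES 4096u /* 12-bit index: 8 address bits + 4 history bits */

static uint8_t hist_table[HIST_ENTRIES]; /* 2-bit counters, values 0..3 */
static unsigned global_history;          /* last 4 outcomes, newest in bit 0 */

/* Concatenate the low 8 address bits with up to 4 history bits. */
static unsigned hist_index(uint64_t addr, unsigned hist_bits) {
    unsigned hist = global_history & ((1u << hist_bits) - 1u);
    return (unsigned)(((addr & 0xFFu) << 4) | hist);
}

/* Predict with the 2-bit counter, train it, and shift the outcome into the history.
   Returns true if the prediction was correct. */
static bool hist_step(uint64_t addr, bool taken, unsigned hist_bits) {
    uint8_t *c = &hist_table[hist_index(addr, hist_bits)];
    bool correct = ((*c >= 2) == taken);
    if (taken && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
    global_history = ((global_history << 1) | (taken ? 1u : 0u)) & 0xFu;
    return correct;
}

typedef struct { int first; int second; } mispredicts_t;

/* Simulates the two if-statements from the text for `iters` random signs of x. */
static mispredicts_t history_demo(int iters, unsigned hist_bits) {
    const uint64_t br1 = 0x400100, br2 = 0x400108; /* hypothetical addresses */
    uint32_t seed = 12345u;                          /* deterministic LCG */
    mispredicts_t m = {0, 0};
    memset(hist_table, 1, sizeof hist_table);        /* start "weakly not taken" */
    global_history = 0;
    for (int i = 0; i < iters; i++) {
        seed = seed * 1664525u + 1013904223u;
        bool x_positive = (seed >> 31) != 0;          /* x is never 0 here */
        if (!hist_step(br1, x_positive, hist_bits)) m.first++;   /* x >= 0 */
        if (!hist_step(br2, !x_positive, hist_bits)) m.second++; /* x <= 0 */
    }
    return m;
}
```

With history enabled, every entry the second branch reaches has the first branch's current outcome in bit 0, so each entry settles on a fixed answer after at most one miss. Without history, the second branch shares one entry and behaves like a coin flip.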

You always have the problem that the same entry in the branch prediction table will be used by more than one branch; you just live with that. (Doing anything clever to handle this would take far too much storage. We use only 1 or 2 bits per entry precisely so that we can have large tables with very little storage.)

gnasher729
  • I was wondering why they don't store it in a few unused bits of the opcode, in the code itself. I guess that would be really slow, since it would have to go back to RAM. – johnnycrash Sep 03 '14 at 17:44
  • @johnnycrash, store what? The branch outcome is not fixed for a single branch location in the program (a unique IP/PC). The best example is a loop branch: it is taken only until the point where it is no longer taken. Some branch predictors hold multiple entries for such branches in different history contexts. If, on the other hand, you mean changing these bits at runtime, look up self-modifying code and think again. – Leeor Sep 10 '14 at 19:24
  • @Leeor Since this would only be a hint, there would not be a self-modifying code problem. (In addition, the front-end would be doing the modification.) There would be a problem if multiple programs/threads were running the same code, in which case an I-cache refill could bring in a prediction based on another thread's history. This would also mean more dirty cache blocks to write back. –  Oct 20 '14 at 21:58

Branch predictor metadata is stored on-chip, in the branch predictor tables. Some research works propose storing it in the cache hierarchy (an approach called predictor virtualization), but I don't think this has been implemented in any real processor yet.

Since you want to learn more, see my survey paper for details on the architectures of several branch predictors.

user984260