
I've read that one of the penalties for integer overflow checking is pollution of the branch history table.

I was wondering whether this is really necessary. Suppose the CPU statically predicts a forward branch as not taken, and the branch is indeed not taken. Can't the CPU leave it out of the branch history table? That way the table won't be polluted, and the branch will still be predicted correctly next time.

Does anyone know if this is actually done by some CPUs? And if not, is there a reason why it's a bad idea?
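To make the idea concrete, here is a minimal sketch of the policy I have in mind, written as a toy software model rather than a description of any real CPU. The class name `LazyAllocPredictor`, the table capacity, the 2-bit counters, and the LRU eviction are all assumptions made for illustration. The model also assumes the branch target is already known when predicting, which sidesteps the question of how early in the pipeline the prediction is needed.

```python
from collections import OrderedDict


class LazyAllocPredictor:
    """Toy model: use the static guess until it is wrong once, then start tracking."""

    def __init__(self, capacity=4):
        self.capacity = capacity        # hypothetical table size
        self.table = OrderedDict()      # branch address -> 2-bit counter (0..3)

    @staticmethod
    def static_guess(pc, target):
        # Backward taken, forward not taken.
        return target < pc

    def predict(self, pc, target):
        if pc in self.table:
            return self.table[pc] >= 2
        return self.static_guess(pc, target)

    def update(self, pc, target, taken):
        if pc in self.table:
            # Already tracked: update the saturating counter on every execution.
            ctr = self.table[pc]
            self.table[pc] = min(3, ctr + 1) if taken else max(0, ctr - 1)
            self.table.move_to_end(pc)
        elif taken != self.static_guess(pc, target):
            # Static guess was wrong: evict the oldest entry if full, then allocate.
            if len(self.table) >= self.capacity:
                self.table.popitem(last=False)
            self.table[pc] = 2 if taken else 1
        # Otherwise the static guess was right and no entry is allocated.


def demo():
    """Run a never-taken forward branch (an overflow check) many times."""
    p = LazyAllocPredictor()
    mispredicts = 0
    for _ in range(1000):
        if p.predict(0x100, 0x200):     # predicted taken, but it never is
            mispredicts += 1
        p.update(0x100, 0x200, taken=False)
    return mispredicts, len(p.table)
```

In this model the overflow-check branch never mispredicts and never occupies a table entry. Whether real hardware can afford to work this way is exactly what I am asking.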

Ilya Lesokhin
  • How could the CPU predict a branch in the first place if there was no branch history table ? – Pac0 Sep 29 '17 at 16:30
  • I mean, the point is, if it starts to be wrong more often than predicted, then maybe it's because it should change its prediction. But if it doesn't remember that it was wrong more often than it was right (if your proposal is implemented), it can't switch to the new, better choice! – Pac0 Sep 29 '17 at 16:32
  • I'm assuming that the CPU's static prediction (its behavior for branches not in the branch history table) is backward taken, forward not taken. If the branch was indeed not taken, you don't insert it into the table. If it is taken, the CPU inserts it into the table and updates it on every hit. – Ilya Lesokhin Sep 29 '17 at 16:36
  • @Pac0 and Ilya: I think this is actually correct for CPUs that use static prediction. Until static prediction is wrong once, you don't even track its history. The first time it's taken, evict another entry and start tracking. You won't have the history of all the not-taken cases, but the predictor will pretty soon figure out a strongly not-taken pattern if that's still true. (This of course doesn't work for taken backward branches; the prediction is needed well ahead of decode to avoid bubbles, so even unconditional branches need to be predicted.) – Peter Cordes Sep 29 '17 at 17:00
  • But not all CPUs work the way you're imagining. Supposedly, modern Intel CPUs just map every branch address to a BTB entry and use it, regardless of aliasing. If two branches alias each other, then the prediction will be dominated by whichever one runs more often. But [maybe there is static prediction in Intel CPUs after all, according to Matt Godbolt's research](https://xania.org/201602/bpu-part-one). I might write this up as an answer in a while. – Peter Cordes Sep 29 '17 at 17:02
  • Can you elaborate on what you mean by "map every branch address to a BTB entry"? Do you map every address to a BTB entry, even if there is no branch at it, and consequently sometimes predict a branch where there is none in the code? Or are you doing the prediction during the decode stage? – Ilya Lesokhin Sep 29 '17 at 18:36
  • @Peter to be fair, Matt's numbers seem to show that Arrandale (Nehalem uarch) may use static prediction, but it seems to support the no-static-prediction conventional wisdom for IB and Haswell. – BeeOnRope Sep 29 '17 at 19:50
  • @Ilya - I was just wondering the same thing. How does the predictor even know what IPs correspond to branches? It certainly isn't doing a prediction for every instruction, including non-branches. So perhaps the BTB-with-confirmation is used: you look up the IP in the BTB and check that your "hit" is valid (that is, that your BTB entry is tagged with your IP). With a TAGE style direction predictor, however, you can't reasonably do the same confirmation for the branch direction guess. – BeeOnRope Sep 29 '17 at 19:54
  • Ilya (and @bee): Hmm, good point. The let-everything-alias design makes sense for the branch *history* buffer (predicting the *direction* of conditional branches only). The BTB proper is probably separate. Note that fetch can predict at the granularity of fetch blocks (i.e. given a block, which block do I fetch next). But later in the pipe, a finer-grained prediction is needed. Maybe never-taken conditional branches end up not using a BTB entry as long as their BHB keeps predicting not-taken? I'd like to see a solid answer to this question. (maybe update it with some of this) – Peter Cordes Sep 29 '17 at 19:57
  • @PeterCordes - yup the BTB is definitely separate, not least because what each "holds" is very different. The BTB holds a target address, probably requiring 30+ bits, while the direction table needs only a single (not) taken bit. The way they are indexed is totally different too: the direction table "wants" to be indexed by a combination of the IP, and some global and/or local branch history. The branch target however is fixed, so it "only" wants to be indexed by the IP, and probably looks more like a traditional cache. I'm not considering here indirect branches, which work like a hybrid. – BeeOnRope Sep 29 '17 at 20:18

0 Answers