Why is the Branch Target Buffer designed as a cache?

Question

The BHT is not a cache and it doesn't need to be because it is okay if a mistake is made when accessing it. The BTB, however, is designed as a cache because it always has to return either a hit or a miss. Why can't the BTB make a mistake?

score 4 · Answer 1 · answered Oct 30 '18 at 05:57

The BTB can make mistakes, and mis-speculation will be detected when the jump instruction actually executes (or decodes, for direct branches), resulting in a bubble as the front-end re-steers to fetch from the right location.

For indirect branches under out-of-order speculative execution, the core may instead have to roll back to the last known-good state, just like recovering from a mispredicted direction on a conditional branch.

score 4 · Answer 2 · 2018-10-31T14:08:37.240

A BTB can present a false hit and this is currently exploited by some implementations through the use of partial tags. (Similarly, one could have one entry per set be completely untagged; in a direct-mapped BTB, no entries would be tagged. A traditional set-associative design, having (partial) tags for each way, gives "free" miss detection as part of way selection.) As Peter Cordes' answer notes, this mistake can be detected and corrected later in the pipeline.
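To make the false-hit case concrete, here is a minimal sketch of a set-associative BTB that stores only a few tag bits per entry. The class name, the sizes (16 sets, 2 ways, 4 tag bits), and the addresses are all made up for illustration and do not describe any particular implementation.

```python
class PartialTagBTB:
    """Toy set-associative BTB with truncated tags (all sizes are illustrative)."""

    def __init__(self, num_sets=16, ways=2, tag_bits=4):
        self.num_sets = num_sets
        self.ways = ways
        self.tag_mask = (1 << tag_bits) - 1
        # Each set holds up to `ways` (partial_tag, target) pairs, newest first.
        self.sets = [[] for _ in range(num_sets)]

    def _index_and_tag(self, pc):
        index = pc % self.num_sets
        # Only the low `tag_bits` bits of the full tag are kept.
        partial_tag = (pc // self.num_sets) & self.tag_mask
        return index, partial_tag

    def lookup(self, pc):
        """Return the predicted target, or None on a detected miss."""
        index, tag = self._index_and_tag(pc)
        for stored_tag, target in self.sets[index]:
            if stored_tag == tag:
                return target
        return None

    def update(self, pc, target):
        index, tag = self._index_and_tag(pc)
        entries = [e for e in self.sets[index] if e[0] != tag]
        entries.insert(0, (tag, target))
        self.sets[index] = entries[: self.ways]


btb = PartialTagBTB()
branch_pc = 0x1230
alias_pc = branch_pc + 16 * 16  # same set, same 4-bit partial tag
other_pc = branch_pc + 16       # same set, different partial tag

btb.update(branch_pc, 0x4000)
print(btb.lookup(branch_pc))  # true hit
print(btb.lookup(alias_pc))   # false hit: returns branch_pc's target
print(btb.lookup(other_pc))   # miss, detected by the tag compare
```

The tag compare that picks the way is also what reports the miss for `other_pc`; the false hit for `alias_pc` is only caught later in the pipeline.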

Recognizing a BTB miss does allow the throttling of speculation. If the BTB is used to prefetch the instruction stream past an instruction cache miss, avoiding cache-polluting and bandwidth-wasting misspeculation can have a performance impact. When performance is limited by power or thermal considerations, avoiding misspeculation, even when it would be detected and corrected quickly, can save some power/heat generation and so potentially improve performance.

With a two-level BTB, a hit indication could allow the L2 BTB not to be accessed for that branch. Aside from energy efficiency, the L2 BTB may have been designed to provide lower bandwidth or to be shared with another closely coupled fetch engine (so bandwidth unused by one fetch engine could be used by another).
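As a rough sketch of the two-level case, the toy model below counts how often the L2 BTB is probed. The class, the FIFO fill policy, and the capacity of two entries are assumptions chosen to keep the example short; real designs differ.

```python
class TwoLevelBTB:
    """Toy two-level BTB; an L1 hit indication suppresses the L2 access."""

    def __init__(self, l1_capacity=2):
        self.l1 = {}  # small and fast; dict order gives FIFO eviction
        self.l2 = {}  # larger, possibly lower-bandwidth or shared
        self.l1_capacity = l1_capacity
        self.l2_accesses = 0

    def _fill_l1(self, pc, target):
        if pc not in self.l1 and len(self.l1) >= self.l1_capacity:
            oldest = next(iter(self.l1))
            del self.l1[oldest]
        self.l1[pc] = target

    def lookup(self, pc):
        if pc in self.l1:
            return self.l1[pc]  # L1 hit: L2 is left alone
        self.l2_accesses += 1   # only an L1 miss spends L2 bandwidth
        target = self.l2.get(pc)
        if target is not None:
            self._fill_l1(pc, target)
        return target

    def update(self, pc, target):
        self.l2[pc] = target
        self._fill_l1(pc, target)


two_level = TwoLevelBTB()
two_level.update(0x100, 0x200)
for _ in range(5):
    two_level.lookup(0x100)   # hits in L1 every time
two_level.lookup(0x300)       # misses both levels
print(two_level.l2_accesses)
```

Without a hit/miss indication from the L1 BTB, every one of those lookups would have to probe the L2 as well.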

In addition, an indication of a BTB miss can be used to improve branch direction prediction. A miss indicates that the branch was likely not taken in recent history (whether not recently executed or not taken during recent execution); the branch direction predictor may choose to override a taken prediction (with the target calculated at decode) or may choose to treat the prediction as low confidence (e.g., using dynamic predication or giving priority to fetch from other threads). The former effectively filters never-taken branches out of the predictor (which is then allowed to have a destructive alias that predicts taken); both uses of a miss indication exploit the fact that old branch information is less likely to be accurate.

A BTB can also provide a simple method of branch identification. A BTB miss predicts that the fetch does not contain a potentially taken branch (filtering out non-branches and never-taken branches). This avoids the branch direction predictor having to predict not taken for non-branch instructions (or redirecting fetch after instruction decode on a BTB false hit when the branch direction predictor predicts taken). This adds non-branches to the filtering to avoid destructive aliasing. (A separate branch identifier could be used to filter non-branch instructions and to distinguish unconditional, indirect, and return instructions, which might use different target predictors and might not need direction prediction.)
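The filtering effect can be sketched in a few lines. Here the BTB is just a dict from fetch address to target and the direction predictor is a tiny table of 2-bit counters indexed by address; the function name, table size, and fixed 4-byte fall-through are assumptions for the example only.

```python
def next_fetch_pc(pc, btb, counters, fetch_bytes=4):
    """Pick the next fetch address, using a BTB miss as a 'no taken branch here' hint."""
    fall_through = pc + fetch_bytes
    target = btb.get(pc)
    if target is None:
        # BTB miss: predict that no potentially taken branch is fetched,
        # so the (possibly aliased) direction counter is not consulted.
        return fall_through
    counter = counters[pc % len(counters)]
    return target if counter >= 2 else fall_through


counters = [3, 0, 0, 0]      # entry 0 is "strongly taken"
btb = {0x40: 0x100}          # only the branch at 0x40 has an entry

print(hex(next_fetch_pc(0x40, btb, counters)))  # BTB hit, counter says taken
print(hex(next_fetch_pc(0x80, btb, counters)))  # aliases to counter 0, but BTB miss
```

The instruction at `0x80` maps to the same "strongly taken" counter as the branch at `0x40`, yet the miss keeps fetch on the fall-through path, so the destructive alias does no harm.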

If the BTB provides a per-address direction prediction or other information used for direction prediction, a miss indication could allow the direction predictor to use other methods to provide such information (e.g., static branch prediction). A static prediction may not be particularly accurate, but it is likely to be more accurate than a "random" prediction with a taken bias (since never-taken branches might never enter the BTB and replacement might be based on least recently taken); a "static" predictor could also exploit the fact that there was a BTB miss. If an agree predictor is used, a per-address bias is needed. (In an agree predictor, a static bias is xored with the prediction to reduce destructive aliasing: biased-taken branches that are taken update the predictor in the same way as biased-not-taken branches that are not taken.)
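A minimal sketch of the agree idea, under some loud assumptions: the bias bit is set from the first observed outcome, it is kept in a dict standing in for per-address storage such as a BTB entry, a missing bias defaults to not taken, and the shared table has only four 2-bit counters. None of these choices come from a specific design.

```python
class AgreePredictor:
    """Toy agree predictor: shared counters track agreement with a per-address bias."""

    def __init__(self, table_size=4):
        self.agree = [2] * table_size  # 2-bit counters, start at "weakly agree"
        self.bias = {}                 # per-address bias bit (hypothetical storage)

    def predict(self, pc):
        bias = self.bias.get(pc, False)  # assumption: no bias stored means not taken
        agrees = self.agree[pc % len(self.agree)] >= 2
        # Taken when the bias is taken and we agree, or not taken and we disagree.
        return bias == agrees

    def update(self, pc, taken):
        bias = self.bias.setdefault(pc, taken)  # first outcome sets the bias
        i = pc % len(self.agree)
        if taken == bias:
            self.agree[i] = min(3, self.agree[i] + 1)
        else:
            self.agree[i] = max(0, self.agree[i] - 1)


pred = AgreePredictor()
always_taken_pc = 0x10      # both addresses map to counter 0
never_taken_pc = 0x14
for _ in range(4):
    pred.update(always_taken_pc, True)
    pred.update(never_taken_pc, False)

print(pred.predict(always_taken_pc), pred.predict(never_taken_pc))
```

Both branches push the shared counter toward "agree", whereas a plain taken/not-taken counter shared between them would be pulled in opposite directions on every update.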

An L1 BTB might also be integrated with the L1 instruction cache, particularly for branch address relative targets, such that not only is miss detection free (the tags for all ways are present) but the BTB provided target may not even be a prediction (avoiding the need to recalculate the target). This would require additional prediction resources for indirect branches (and an L2 BTB might be used to support prefetching under instruction cache misses) but can avoid significant redundant storage (as such branch instructions already store the offset).

Even though BTB miss determination is not necessary, it can be useful.
