Why did Intel remove the 16-byte branch target alignment Coding Rule from the Optimization Reference Manual?

Question

Previous versions of the Intel® 64 and IA-32 Architectures Optimization Reference Manual have contained this Coding Rule:

Assembly/Compiler Coding Rule 12. (M impact, H generality)
All branch targets should be 16-byte aligned.

The May 2020 version does not have this rule. Why was it removed?

Aligning to 16 bytes comes at a cost, as discussed here on the Linux Kernel Mailing List. But it is a long-standing rule, and ARM makes the same recommendation for its microarchitectures:

Consider aligning subroutine entry points and branch targets to quadword (16 byte) boundaries.

AMD's Software Optimization Guide for AMD Family 17h Processors says:

Having 16 byte aligned branch targets gets maximum picker throughput and avoids end-of-cacheline short op cache (OC) entries.
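
To make the padding cost mentioned above concrete, here is a minimal sketch of how many filler bytes (e.g. NOPs) an assembler would have to emit to put each branch target on an alignment boundary. The function names and the example offsets are hypothetical, made up for illustration; they are not taken from any of the manuals quoted here.

```python
def padding_to_align(address: int, alignment: int = 16) -> int:
    """Filler bytes needed so that `address` lands on the next `alignment` boundary."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # -address mod alignment is the distance up to the next boundary (0 if already aligned).
    return -address % alignment


def total_padding(target_offsets, alignment: int = 16) -> int:
    """Total filler if every branch target is aligned in turn.

    Padding inserted before an earlier target shifts all later targets,
    so the running shift is carried along.
    """
    shift = 0
    total = 0
    for offset in sorted(target_offsets):
        pad = padding_to_align(offset + shift, alignment)
        total += pad
        shift += pad
    return total


if __name__ == "__main__":
    # Hypothetical branch-target offsets inside one small function.
    targets = [5, 20, 33]
    print(total_padding(targets, 16))
    print(total_padding(targets, 32))
```

In the worst case each target costs `alignment - 1` bytes, which is why moving from 16-byte to 32-byte boundaries inside a function roughly doubles the potential padding.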

I'd guess that the uop cache makes front-end issues usually not a big deal, and code-density is more important. 32-byte boundaries are what matters for the uop cache (with up to 3x 6-uop "lines"), and that would be too much padding *inside* functions, especially for non-loop branch targets. — Peter Cordes, Oct 10 '20 at 00:36
I think that, as with branch prediction (always-not-taken, backwards-taken/forwards-not-taken, BTB, ...), which Intel has basically taken off the table, branch target alignment is now off the table too. Programmers should not try to outguess Intel's transistors. It's pretty much the same with prefetch. LLVM unfortunately hasn't taken the hint and still generates branch hints, with no switch to turn them off. Branch alignment no longer being necessary is new, so it's understandable that LLVM hasn't added a switch for that yet. — Olsonist, Oct 10 '20 at 16:53

