From the BTB's point of view, both versions are the same. In both versions (if compiled without optimization) there is only one conditional jump, the one generated for the i<LOOPS test, so there is only one branch in the code and only one BTB entry is used. You can see the resulting assembly using Matt Godbolt's Compiler Explorer.
There would be a difference between
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0)
            do_something();
    }
and
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0)
            do_something();
        if (i % 3 == 0)
            do_something_different();
    }
The first version would need 2 BTB entries (one for the for and one for the if), while the second would need 3 (one for the for and one for each of the two ifs).
However, as Matt Godbolt found out, the BTB has 4096 entries, so I would not worry too much about running out of them.