Before I begin, yes, I'm aware of the compiler built-ins __builtin_expect and __builtin_unpredictable (Clang). They do solve the issue to some extent, but my question is about something neither completely solves.
As a very simple example, suppose we have the following code.
    void foo(unsigned int value);  /* declared elsewhere */

    void highly_contrived_example(unsigned int *numbers, unsigned int count) {
        unsigned int *const end = numbers + count;
        for (unsigned int *iterator = numbers; iterator != end; ++iterator)
            foo(*iterator % 2 == 0 ? 420 : 69);
    }
Nothing complicated at all: it just calls foo() with 420 whenever the current number is even, and with 69 when it isn't.
Suppose, however, that it is known ahead of time that the data is guaranteed to look a certain way. For example, if it were always random, then a conditional select (csel on ARM, cmov on x86, etc.) would possibly be better than a branch.⁰ If it always came in highly predictable patterns (e.g. a lengthy stream of evens followed by a lengthy stream of odds, and so on), then a branch would be better.⁰ __builtin_expect would not really solve the issue if the number of evens and odds were about equal, and I'm not sure whether the absence of __builtin_unpredictable would influence branchiness (plus, it's Clang-only).
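To make the two code shapes concrete, here is a minimal sketch of the same selection written once with a bit mask (no data-dependent branch in the source) and once with an explicit if/else. The helper names select_branchless and select_branchy are hypothetical and only for illustration. Note that nothing here forces the compiler's hand: an optimizer is free to turn either version into a branch or a conditional select, so the generated assembly still has to be inspected.

```c
/* Hypothetical helper: builds an all-ones or all-zeros mask from the
   parity bit, so the source itself contains no conditional. */
static unsigned int select_branchless(unsigned int n) {
    unsigned int mask = 0u - (n & 1u);      /* odd: all ones, even: zero */
    return (69u & mask) | (420u & ~mask);   /* odd: 69, even: 420 */
}

/* Hypothetical helper: the same result written as an explicit branch.
   The compiler may still lower this to csel/cmov. */
static unsigned int select_branchy(unsigned int n) {
    if (n % 2 == 0)
        return 420u;
    return 69u;
}
```

Both helpers return the same value for every input; only the shape handed to the compiler differs.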
My current "solution" is to lie to the compiler. In the predictable case, I use __builtin_expect with a high probability for whichever side, to nudge the compiler into generating a branch (for simple cases like this, all it seems to do is change the ordering of the comparison to suit the expected probability). In the unpredictable case, I use __builtin_unpredictable to nudge it into not generating a branch, if possible.¹ Either that or inline assembly. That's always fun to use.
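The workaround described above could look roughly like the following sketch. The macro names HINT_LIKELY and HINT_UNPREDICTABLE and the two pick_* helpers are hypothetical. The sketch assumes a GCC-compatible compiler for __builtin_expect and falls back to the plain condition where __builtin_unpredictable is not available. Both built-ins are hints only, so whether a branch or a conditional select comes out is still up to the optimizer.

```c
/* Compilers without __has_builtin: treat every query as "not available". */
#ifndef __has_builtin
#  define __has_builtin(x) 0
#endif

/* Hypothetical wrapper: claim the condition is almost always true. */
#if defined(__GNUC__) || __has_builtin(__builtin_expect)
#  define HINT_LIKELY(cond) __builtin_expect(!!(cond), 1)
#else
#  define HINT_LIKELY(cond) (!!(cond))
#endif

/* Hypothetical wrapper: claim the condition cannot be predicted.
   Falls back to the bare condition where the built-in is missing. */
#if __has_builtin(__builtin_unpredictable)
#  define HINT_UNPREDICTABLE(cond) __builtin_unpredictable(!!(cond))
#else
#  define HINT_UNPREDICTABLE(cond) (!!(cond))
#endif

/* Predictable data: "lie" that evens dominate to encourage a branch. */
static unsigned int pick_predictable(unsigned int n) {
    if (HINT_LIKELY(n % 2 == 0))
        return 420u;
    return 69u;
}

/* Random data: ask for a conditional select where the compiler agrees. */
static unsigned int pick_unpredictable(unsigned int n) {
    return HINT_UNPREDICTABLE(n % 2 == 0) ? 420u : 69u;
}
```

The hints never change the computed value, so both helpers return 420 for even inputs and 69 for odd ones regardless of which built-ins the compiler supports.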
⁰ Although I have not actually done any benchmarks, I'm aware that even using a branch may not necessarily be faster than a conditional select for the given example. The example is only for illustrative purposes, and may not actually exhibit the problem described.
¹ Modern compilers are smart. More often than not, they can determine reasonably well which approach to actually use. My question is for the niche cases in which they cannot reasonably figure that out, and in which the performance difference actually matters.