Yes, a bit-test is much faster than integer division: by about a factor of 10 to 20, or even 100 for a 128-bit / 64-bit = 64-bit `idiv` on Intel. Especially since x86, at least, has a `test` instruction that sets condition flags based on the result of a bitwise AND, so you don't have to divide and then compare; the bitwise AND *is* the compare.
I decided to actually check the compiler output on Godbolt, and got a surprise:
It turns out that using `n % 2` as a signed integer value (e.g. `return n % 2` from a function that returns signed `int`), instead of just testing it for non-zero (`if (n % 2)`), sometimes produces slower code than `return n & 1`. This is because `(-1 % 2) == -1`, while `(-1 & 1) == 1`, so the compiler can't use a bitwise AND. Compilers still avoid integer division, though, and use some clever shift / and / add / sub sequence instead, because that's still cheaper than an integer division. (gcc and clang use different sequences.)
So if you want to return a truth value based on `n % 2`, your best bet is to do it with an unsigned type. This lets the compiler always optimize it to a single AND instruction. (On Godbolt, you can flip to other architectures, like ARM and PowerPC, and see that the unsigned `even` (`%`) function and the int `even_bit` (bitwise `&`) function have the same asm code.)
Using a `bool` (which must be 0 or 1, not just any non-zero value) is another option, but the compiler will have to do extra work to return `(bool) (n % 4)` (or any test other than `n % 2`). The bitwise-AND version of that will be 0, 1, 2, or 3, so the compiler has to turn any non-zero value into a 1. (x86 has an efficient `setcc` instruction that sets a register to 0 or 1 depending on the flags, so it's still only 2 instructions instead of 1; clang and gcc use this, see `aligned4_bool` in the Godbolt asm output.)
At any optimization level higher than `-O0`, gcc and clang optimize `if (n % 2)` to what we expect. The other huge surprise is that icc 13 doesn't: I don't understand WTF icc thinks it's doing with all those branches.