Yes, a bit-test is much faster than integer division: by about a factor of 10 to 20, or even 100 for a 128-bit / 64-bit = 64-bit `idiv` on Intel. Especially since x86, at least, has a `test` instruction that sets condition flags based on the result of a bitwise AND, so you don't have to divide and then compare; the bitwise AND *is* the compare.
I decided to actually check the compiler output on Godbolt, and got a surprise:
It turns out that using `n % 2` as a signed integer value (e.g. `return n % 2` from a function that returns signed `int`), instead of just testing it for non-zero (`if (n % 2)`), sometimes produces slower code than `return n & 1`. This is because `(-1 % 2) == -1`, while `(-1 & 1) == 1`, so the compiler can't use a bitwise AND. Compilers still avoid integer division, though, and use some clever shift / and / add / sub sequence instead, because that's still cheaper than an integer division. (gcc and clang use different sequences.)
So if you want to return a truth value based on `n % 2`, your best bet is to do it with an unsigned type. This lets the compiler always optimize it to a single AND instruction. (On Godbolt, you can flip to other architectures, like ARM and PowerPC, and see that the unsigned `even` (`%`) function and the int `even_bit` (bitwise `&`) function have the same asm code.)
Using a `bool` (which must be 0 or 1, not just any non-zero value) is another option, but the compiler will have to do extra work to return `(bool) (n % 4)` (or any test other than `n % 2`). The bitwise-AND version of that will be 0, 1, 2, or 3, so the compiler has to turn any non-zero value into a 1. (x86 has an efficient `setcc` instruction that sets a register to 0 or 1 depending on the flags, so it's still only 2 instructions instead of 1; clang and gcc use this, see `aligned4_bool` in the Godbolt asm output.)
At any optimization level higher than `-O0`, gcc and clang optimize `if (n % 2)` to what we expect. The other huge surprise is that icc 13 doesn't: I don't understand WTF icc thinks it's doing with all those branches.