I wanted to confirm that the modulo operation is expensive, so I tested this piece of code, which checks whether a given number is even:

bool is_even(int n) {
    return (n & 1) == 0;
}

then this one:

bool is_even_bis(int n) {
    return (n % 2) == 0;
}

I used C# at first and, indeed, the code using bitwise & is faster than the other one, sometimes even three times faster. Using ILSpy, I saw that no optimization was done when compiling to MSIL: the IL mirrors the source as written.

However, as a friend of mine spotted, in C with gcc -O3 the code compiles to:

is_even:
    mov     eax, DWORD PTR [esp+4]  # tmp63, n
    and     eax, 1  # tmp63,
    xor     eax, 1  # tmp63,
    ret

and:

is_even_bis:
    mov     eax, DWORD PTR [esp+4]  # tmp63, n
    and     eax, 1  # tmp63,
    xor     eax, 1  # tmp63,
    ret

So the two are strictly the same. Even at -O0 the modulo operation does not appear:

is_even:
    push    ebp     #
    mov     ebp, esp        #,
    mov     eax, DWORD PTR [ebp+8]  # tmp63, n
    and     eax, 1  # D.1837,
    test    eax, eax        # D.1837
    sete    al      #, D.1838
    movzx   eax, al # D.1836, D.1838
    pop     ebp     #
    ret

Needless to say, the compiled code is the same for is_even and is_even_bis at -O0 as well.
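For anyone who wants to reproduce the comparison in C, a rough timing harness might look like the sketch below. The helpers count_matching and time_predicate are hypothetical names introduced here, not part of the original test, and the numbers clock() reports will vary with compiler, flags and machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* The two predicates from the question. */
bool is_even(int n)     { return (n & 1) == 0; }
bool is_even_bis(int n) { return (n % 2) == 0; }

/* Hypothetical helper: counts how many values in [0, limit) satisfy pred.
   Returning the count gives the loop an observable result, which makes it
   harder for the compiler to discard the calls. */
long count_matching(bool (*pred)(int), int limit)
{
    long count = 0;
    for (int n = 0; n < limit; ++n) {
        if (pred(n))
            ++count;
    }
    return count;
}

/* Hypothetical helper: CPU seconds spent in one count_matching run.
   The count itself is stored through count_out. */
double time_predicate(bool (*pred)(int), int limit, long *count_out)
{
    assert(count_out != NULL);
    clock_t start = clock();
    *count_out = count_matching(pred, limit);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_predicate(is_even, 100000000, &count) and then the same with is_even_bis gives two timings to compare. If the compiler emits identical instructions for both predicates, any difference should be within measurement noise.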

Even funnier, if I may say, another friend of mine tried the same thing in OCaml:

let is_even x = ((x land 1) == 0)

let _ = 
  let i = ref 100000000 in
  while !i > 0 do
    ignore (is_even !i);
    decr i
  done

and:

let is_even_bis x = ((x mod 2) == 0)

let _ = 
  let i = ref 100000000 in
  while !i > 0 do
    ignore (is_even_bis !i);
    decr i
  done

And it appears that the modulo version is faster when running the bytecode but slower in native code! Maybe someone can explain this mystery?

Then I started wondering why it does not behave like that in C# (where there is an obvious performance gap between the two functions) and why the JIT compiler does not apply the same optimization as gcc. I don't know if there's a way to inspect the output of the JIT compiler; maybe that would help to understand?

Bonus question: I guess the modulo is based on division and since the division is done in O(n²) time (n being the number of digits) can we say that the modulo has quadratic time complexity?

H H
Max

1 Answer

Firstly, there is no concept of speed for these operations in a portable sense. Your assertions might be true for your system, but they don't hold for all systems. For this reason, it's quite pointless to speculate about micro-optimisations. You can find far more significant optimisations by writing a program that solves a meaningful problem, profiling it to find the parts of the code that take up the most execution time, and introducing faster algorithms for those parts. By faster algorithms, I mean better data structures (or fewer operations), as opposed to different operators. Stop focusing on micro-optimisations!

Your C version of is_even isn't well-defined. It might produce negative zeros or trap representations, particularly for negative numbers. Using a trap representation is undefined behaviour.

It seems as though the difference you're seeing could be caused by the signed integer representation on your system. Consider if -1 were represented using ones' complement, 11111111...11111110. You'd expect -1 % 2 to result in -1, not 0, wouldn't you? (edit: ... but what would you expect -1 & 1 to result in, if -1 is represented as 11111111...11111110?) There needs to be some overhead to handle this for implementations that use ones' complement as their signed integer representation.
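On the far more common two's complement representation, the raw results of the two operators also differ for negative odd numbers, even though the comparison with zero comes out the same. A minimal C sketch of that, assuming two's complement int and C99 or later (where division truncates toward zero); raw_mod, raw_and and parity_checks_agree are hypothetical helpers named here for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* The two predicates from the question. */
bool is_even(int n)     { return (n & 1) == 0; }
bool is_even_bis(int n) { return (n % 2) == 0; }

/* Raw operator results, before the comparison with zero.
   Assumption: two's complement int, C99 or later. */
int raw_mod(int n) { return n % 2; }   /* -1, 0 or 1: takes the sign of n */
int raw_and(int n) { return n & 1; }   /* 0 or 1: just the lowest bit     */

/* Hypothetical helper: true when both predicates agree on every n in [lo, hi]. */
bool parity_checks_agree(int lo, int hi)
{
    assert(lo <= hi);
    for (int n = lo; ; ++n) {
        if (is_even(n) != is_even_bis(n))
            return false;
        if (n == hi)
            break;      /* stop here so ++n can never overflow at INT_MAX */
    }
    return true;
}
```

Under those assumptions raw_mod(-1) is -1 while raw_and(-1) is 1, yet parity_checks_agree(-1000, 1000) holds, because both raw results are zero exactly when n is even.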

Perhaps your C compiler has noticed that the % expression you used and the & expression you used are equivalent on your system, and as a result made that optimisation, but the optimisation hasn't been performed by the C# or OCaml compilers for whatever reason.
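Whether the two expressions are interchangeable depends on the comparison as well as on the representation. As a contrast, here is a hedged C sketch (again assuming two's complement int and C99 or later) of an "is odd" test where swapping % for & would change the result, so a compiler could not make that substitution blindly. The function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* n % 2 is -1 for negative odd n, so this is false for every negative n. */
bool is_odd_mod(int n) { return (n % 2) == 1; }

/* n & 1 is 1 for negative odd n on two's complement, so this is true there. */
bool is_odd_and(int n) { return (n & 1) == 1; }

/* Hypothetical helper: returns the first n in [lo, hi] on which the two
   tests disagree, or fallback if they agree on the whole range. */
int first_disagreement(int lo, int hi, int fallback)
{
    assert(lo <= hi);
    for (int n = lo; ; ++n) {
        if (is_odd_mod(n) != is_odd_and(n))
            return n;
        if (n == hi)
            break;      /* stop here so ++n can never overflow at INT_MAX */
    }
    return fallback;
}
```

The two tests agree for all non-negative n and disagree on negative odd n, which is why the comparison with 0 used in the question is the case where the substitution is safe.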

Bonus question: I guess the modulo is based on division and since the division is done in O(n²) time (n being the number of digits) can we say that the modulo has quadratic time complexity?

There is no point contemplating the time complexity of these two basic operations, because they'll differ from system to system. I covered that in my first paragraph.

autistic
  • "Your C version of `is_even` isn't well-defined. It might produce negative zeros or trap representations," No. It's wrong for negative values on ones' complement machines, but that is all. – Daniel Fischer Apr 25 '13 at 12:28
  • @DanielFischer "If the implementation does not support negative zeros, the behavior of the &, |, ^, ~, <<, and >> operators with operands that would produce such a value is undefined." – autistic Apr 25 '13 at 13:14
  • Yes, but `x & 1` would not produce a negative zero. `x & 1` can only produce a (signless/nonnegative) zero or a 1. – Daniel Fischer Apr 25 '13 at 13:19
  • @DanielFischer Does the standard state anywhere that the `&` operator operates on the sign bit? – autistic Apr 25 '13 at 13:21
  • "The result of the binary & operator is the bitwise AND of the operands (that is, each bit in the result is set if and only if each of the corresponding bits in the converted operands is set)." Doesn't say "value bits"; I guess how it handles padding bits is up to the implementation, and contrary to arithmetic operations, it's not explicitly guaranteed that a bitwise operation cannot generate a trap representation (or I haven't found the guarantee yet). But if everything except arithmetic operations were allowed to generate trap representations willy-nilly, that'd be terrible. – Daniel Fischer Apr 25 '13 at 13:27
  • @DanielFischer What do you think of [securecoding.cert.org](http://www.securecoding.cert.org/)? – autistic Apr 25 '13 at 13:49
  • Don't really know, haven't looked much at it. Lots of good [but mostly reiterating common sense; unfortunately common sense is very uncommon, so reiterating it isn't redundant] advice, intermingled with the occasional bit of cargo cult (Do not declare more than one variable per declaration). [Yes, I know where that comes from, but that simplistic rule is just taking it too far.] I think I remember that they had some downright stupid things there, but hopefully those have been removed. – Daniel Fischer Apr 25 '13 at 14:07
  • @DanielFischer I tend to agree, but when it comes to minuscule micro-optimisations that could be risky, sites like that can be fairly persuasive. I don't agree so much with that "Don't use goto" sort of dribble, but I can understand "Do not declare more than one variable per declaration" if `#define` is used in place of `typedef` (which would be even dribblier, of course... Thanks, Microsoft!). – autistic Apr 25 '13 at 14:34
  • Okay, if there is a possibility that you declare a `#define type other_type*`, yes (but be sure to kill the original coder). But for `int sum = 0, minimum = INT_MAX, maximum = INT_MIN;` (or a couple of loop counters), meh. – Daniel Fischer Apr 25 '13 at 14:39
  • "You'd expect `-1 % 2` to result in 1, not 0, wouldn't you?" No, I would expect -1. – newacct Apr 25 '13 at 23:10
  • @newacct Indeed! What was I thinking? – autistic Apr 26 '13 at 01:20