3

Consider the following code:

// Preprocessor
#include <iostream>
#include <vector>

// Internal branching
void f1(std::vector<int>& v, const int x = 0)
{
    for (unsigned int i = 1; i < v.size(); ++i) {
        v[i] = (x != 0) ? (v[i-1]*x) : (v[i-1]+v[i-1]);
    }
}

// External branching
void f2(std::vector<int>& v, const int x = 0)
{
    if (x != 0) {
        for (unsigned int i = 1; i < v.size(); ++i) {
            v[i] = v[i-1]*x;
        }
    } else {
        for (unsigned int i = 1; i < v.size(); ++i) {
            v[i] = v[i-1]+v[i-1];
        }
    }
}

// Main
int main()
{
    std::vector<int> v(10, 2);
    f1(v);
    f2(v);
    return 0;
}

It illustrates the behaviour of two functions that produce the same result:

  • f1: the condition is tested inside of the loop
  • f2: the condition is tested outside of the loop

The branching is based on x, which is declared as const.

My question is: are compilers sufficiently intelligent to transform f1 in f2 when all optimization levels are turned on?

Vincent
  • 57,703
  • 61
  • 205
  • 388
  • 1
    For this code, I'd expect the compiler to completely eliminate the check, and just build the code for `x == 0`. You can certainly come up with scenarios where the compiler CAN'T do that, but in this case, certainly. [And of course, if you really care in your particular case, then benchmark it, with your compiler and your real code!] – Mats Petersson Aug 13 '13 at 21:54

2 Answers2

3

The best way to check if your compiler is hoisting the conditional out the loop is to really inspect the assembly after compiling it with full optimizations.

After building your example with:

g++ -O3 -c example.cpp -o example.o
objdump -d -M intel example.o > example.S

Here's what I got for f1:

00000020 <f1(std::vector<int, std::allocator<int> >&, int)>:
  ; ...
  23:   8b 54 24 10             mov    edx,DWORD PTR [esp+0x10]
  27:   8b 7c 24 14             mov    edi,DWORD PTR [esp+0x14]
  2b:   8b 02                   mov    eax,DWORD PTR [edx]
  2d:   8b 4a 04                mov    ecx,DWORD PTR [edx+0x4]
  30:   29 c1                   sub    ecx,eax
  32:   c1 f9 02                sar    ecx,0x2
  35:   83 f9 01                cmp    ecx,0x1
  38:   76 d                    jbe    57 <f1(std::vector<int, std::allocator<int> >&, int)+0x37>
  3a:   31 db                   xor    ebx,ebx
  3c:   85 ff                   test   edi,edi
  3e:   ba 01 00 00 00          mov    edx,0x1
  43:   75 b                    jne    60 <f1(std::vector<int, std::allocator<int> >&, int)+0x40>
  45:   8b 34 18                mov    esi,DWORD PTR [eax+ebx*1]
  48:   83 c3 04                add    ebx,0x4
  4b:   01 f6                   add    esi,esi
  4d:   89 34 90                mov    DWORD PTR [eax+edx*4],esi
  50:   83 c2 01                add    edx,0x1
  53:   39 d1                   cmp    ecx,edx
  55:   75 ee                   jne    45 <f1(std::vector<int, std::allocator<int> >&, int)+0x25>
  57:   5b                      pop    ebx
  58:   5e                      pop    esi
  59:   5f                      pop    edi
  5a:   c3                      ret    
  5b:   90                      nop
  5c:   8d 74 26 00             lea    esi,[esi+eiz*1+0x0]
  60:   8b 34 18                mov    esi,DWORD PTR [eax+ebx*1]
  63:   83 c3 04                add    ebx,0x4
  66:   0f af f7                imul   esi,edi
  69:   89 34 90                mov    DWORD PTR [eax+edx*4],esi
  6c:   83 c2 01                add    edx,0x1
  6f:   39 ca                   cmp    edx,ecx
  71:   75 ed                   jne    60 <f1(std::vector<int, std::allocator<int> >&, int)+0x40>
  73:   eb e2                   jmp    57 <f1(std::vector<int, std::allocator<int> >&, int)+0x37>

On line 3c, you find the conditional being checked:

  ; if(x != 0)
  3c:   85 ff                   test   edi,edi
  43:   75 b                    jne    60 ; ...

From that point on after the check, x is never tested again and the loops are just executed for each part. The first loop starting from line 45 to 55 is done when x == 0. The second loop starting from line 60 to 71 is done when x != 0.

So yes, at least in this case, gcc is able to hoist the conditional out of the loop with full optimizations enable.

greatwolf
  • 20,287
  • 13
  • 71
  • 105
0

Godbolt page tells me following things:

#6,#7 Exit immediately if the loop won't ever be entered (size() <= 1).

#8,#9 Check if x is not 0, if it's true, then one loop will be chosen (.L3 + .L6), otherwise second loop will be taken (.L5).

As you can see two optimizations were performed including the one you were querying about.

Red XIII
  • 5,771
  • 4
  • 28
  • 29