while playing around with godbolt.org I noticed that gcc (6.2, 7.0 snapshot), clang (3.9) and icc (17) when compiling something close to
int a(int* a, int* b) {
if (b - a < 2) return *a = ~*a;
// register intensive code here e.g. sorting network
}
compiles (-O2/-O3) this into somthing like this:
push r15
mov rax, rcx
push r14
sub rax, rdx
push r13
push r12
push rbp
push rbx
sub rsp, 184
mov QWORD PTR [rsp], rdx
cmp rax, 7
jg .L95
not DWORD PTR [rdx]
.L162:
add rsp, 184
pop rbx
pop rbp
pop r12
pop r13
pop r14
pop r15
ret
which obviously has a huge overhead in case of b - a < 2. In case of -Os gcc compiles to:
mov rax, rcx
sub rax, rdx
cmp rax, 7
jg .L74
not DWORD PTR [rdx]
ret
.L74:
Which leads me to beleave that there is no code keeping the compiler from emitting this shorter code.
Is there a reason why compilers do this ? Is there a way to get them compiling to the shorter version without compiling for size?
Here's an example on Godbolt that reproduces this. It seems to have something to do with the complex part being recursive