Consider the loop below (https://godbolt.org/z/z4Wz1aanK), which has no loop-carried dependence. Will a modern CPU speculatively execute the next iteration while the previous one is still in flight? If so, is loop expansion (unrolling) still necessary here?
void bar(void)
{
    for (int i = 0; i < 1024; i++)
        out[i] = foo(src[i]);
}
The result of compilation:
bar():
        pushq   %rbx
        xorl    %ebx, %ebx
.L2:
        movl    src(%rbx), %edi
        addq    $4, %rbx
        call    foo(int)
        movl    %eax, out-4(%rbx)
        cmpq    $4096, %rbx
        jne     .L2
        popq    %rbx
        ret
src:
        .zero   400
out:
        .zero   400
Update 1: I am now sure that speculative execution can cross loop iterations. The question is how far ahead it can run, given the dependency chain introduced by the loop counter i.
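To make the "loop expansion" part of the question concrete, here is a minimal sketch of what a manual 4x unroll of the loop could look like. It only illustrates the transformation; it is not a claim that unrolling is faster on any particular CPU. The body of foo, the array size N, the separate out_unrolled buffer, and the same_result helper are all assumptions added for illustration. In the original, foo is opaque to the compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>   /* memcmp */

#define N 1024
static_assert(N % 4 == 0, "the unrolled loop assumes N is a multiple of 4");

static int src[N];
static int out[N];
static int out_unrolled[N];   /* hypothetical second buffer, for comparison only */

/* Hypothetical stand-in: the real foo() is opaque in the question. */
static int foo(int x)
{
    return 3 * x + 1;
}

/* Rolled form, as in the question: one counter update per call. */
static void bar(void)
{
    for (int i = 0; i < N; i++)
        out[i] = foo(src[i]);
}

/* Unrolled by 4 at source level: one counter update and one
   loop branch per four calls to foo(). */
static void bar_unrolled(void)
{
    for (int i = 0; i < N; i += 4) {
        out_unrolled[i]     = foo(src[i]);
        out_unrolled[i + 1] = foo(src[i + 1]);
        out_unrolled[i + 2] = foo(src[i + 2]);
        out_unrolled[i + 3] = foo(src[i + 3]);
    }
}

/* Fills src, runs both versions, and returns 1 if the outputs match. */
static int same_result(void)
{
    for (int i = 0; i < N; i++)
        src[i] = i;
    bar();
    bar_unrolled();
    return memcmp(out, out_unrolled, sizeof out) == 0;
}
```

Both versions compute the same values. The difference is that the unrolled form updates i once per four elements at source level, so the counter dependency chain is four times shorter per element processed. Whether that matters in practice is exactly what the question asks.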