Yes, that's likely. But subtle differences between languages can result in different asm from similar-looking source, and it's rare for the front-end to hand the back-end exactly the same input. Simple functions may well end up optimized the same, and the back-end will generally use the same kinds of strategies either way (e.g. on x86, how many LEA instructions it's worth using instead of a multiply).
e.g. in C, signed overflow is undefined behaviour, so

    void foo(int *p, int n) {
        for (int i = 0; i <= n; i++) {
            p[i] = i/4;
        }
    }
can be assumed to terminate eventually for all possible `n` (including `INT_MAX`), and `i` can be assumed to be non-negative.
With a front-end for a language where `i++` is defined to have 2's complement wrap-around (or gcc with `-fwrapv -fno-strict-overflow`), `i` would go from `== INT_MAX` to a large negative value, always `<= INT_MAX`. The compiler would be required to make asm that faithfully implements the source code's behaviour even for callers that pass `n == INT_MAX`, making this an infinite loop in which `i` can be negative.
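Here's a minimal sketch of that wrapping behaviour. Note that this program has UB under ISO C rules; compile with `gcc -fwrapv` to get the wraparound described above, and the printed value assumes 32-bit `int`:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int i = INT_MAX;
        i++;                          /* UB in ISO C; wraps to INT_MIN with -fwrapv */
        printf("%d\n", i);            /* -2147483648, assuming 32-bit int */
        printf("%d\n", i <= INT_MAX); /* 1: the loop condition i <= n stays true */
        return 0;
    }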
But since that's Undefined Behaviour in C and C++, the compiler can assume the program doesn't contain any UB, and thus that no caller passes `n == INT_MAX`. It can assume that `i` is never negative inside the loop, and that the loop trip-count fits in an `int`. See also What Every C Programmer Should Know About Undefined Behavior (clang blog).
The non-negative assumption lets it implement `i / 4` with a simple arithmetic right shift, rather than implementing C's signed-division semantics for negative numbers (which truncate toward zero, where a shift rounds toward negative infinity).
    # the p[i] = i/4; part of the inner loop from
    # gcc -O3 -fno-tree-vectorize
    mov     edx, eax                      # copy the loop counter
    sar     edx, 2                        # i / 4 == i>>2
    mov     DWORD PTR [rdi+rax*4], edx    # store into the array
Source + asm output on the Godbolt compiler explorer.
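To see why the non-negative assumption is load-bearing here, a sketch with worked values (right-shifting a negative signed value is technically implementation-defined in C, though mainstream compilers do an arithmetic shift):

    #include <assert.h>

    int main(void) {
        assert( 7 / 4  ==  7 >> 2);   /* both 1: shift == division for i >= 0 */
        assert(-5 / 4  == -1);        /* C division truncates toward zero */
        assert(-5 >> 2 == -2);        /* arithmetic shift rounds toward -infinity */
        return 0;
    }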
But if signed wrap-around is defined behaviour, signed division by a constant takes more instructions, and array indexing has to account for the possible wrapping:
    # Again *just* the body of the inner loop, without the loop overhead
    # gcc -fno-strict-overflow -fwrapv -O3 -fno-tree-vectorize
    test    eax, eax                      # set flags (including SF) according to i
    lea     edx, [rax+3]                  # edx = i+3
    movsx   rcx, eax                      # sign-extend i for use in the addressing mode
    cmovns  edx, eax                      # copy if !signbit_set(i)
    sar     edx, 2                        # i/4 = i>=0 ? i>>2 : (i+3)>>2;
    mov     DWORD PTR [rdi+rcx*4], edx
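In C terms, the loop body above computes something like this (a sketch; the function name is mine, not from the compiler output):

    int div4_wrapping(int i) {
        /* the +3 bias fixes up the flooring shift for negative i:
           (-5 + 3) >> 2  ==  -2 >> 2  ==  -1  ==  -5 / 4 */
        return i >= 0 ? i >> 2 : (i + 3) >> 2;
    }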
C array-indexing syntax is just sugar for pointer + integer, and doesn't require the index to be non-negative. So it's valid for a caller to pass a pointer to the middle of a 4GB array, all of which this function must eventually write. (Whether the infinite loop itself is legal is questionable too, but never mind that.)
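A quick sketch of that pointer-arithmetic equivalence (the names are mine):

    #include <assert.h>

    int buf[8];

    int main(void) {
        int *mid = &buf[4];
        mid[-2] = 42;            /* same as *(mid - 2), i.e. buf[2]: a valid store */
        assert(buf[2] == 42);
        return 0;
    }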
As you can see, a tiny difference in language rules forced the compiler to make less-optimized asm. And differences between real languages' rules are usually bigger than the difference between ISO C++ and the defined-signed-wraparound flavour of C++ that g++ can implement.
Also, if the "usual" types are different widths or signedness in another language, it's very likely that the back-end will get different input, and in some cases that will matter.
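For instance (a hypothetical sketch, using C's fixed-width types to stand in for another language's defaults): if the other language's natural integer is 64-bit, the back-end sees genuinely different input for the same-looking loop, and no separate sign extension (the `movsx` above) is needed in the addressing mode:

    #include <stdint.h>

    /* Same loop shape, but with a 64-bit induction variable: the index is
       already pointer-width, so the addressing mode can use it directly. */
    void foo64(int64_t *p, int64_t n) {
        for (int64_t i = 0; i <= n; i++)
            p[i] = i / 4;
    }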
If I had used `unsigned`, wraparound would be the defined overflow behaviour in C and C++. But `unsigned` types are by definition non-negative, so the possibility of wraparound wouldn't have such an obvious effect on optimizations without unrolling. If the loop had started from a value greater than zero, wraparound would introduce the possibility of coming back to `0`, in case that matters (e.g. if `x / i` would then be a division by zero).
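A sketch of that corner case (the function and its parameters are hypothetical):

    /* Unsigned wraparound is well-defined, so the compiler can't assume
       i stays non-zero: if start > stop, i increments past UINT_MAX,
       wraps to 0, and x / i divides by zero before reaching stop. */
    unsigned sum_quotients(unsigned x, unsigned start, unsigned stop) {
        unsigned total = 0;
        for (unsigned i = start; i != stop; i++)
            total += x / i;    /* possible division by zero after wraparound */
        return total;
    }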