Unlike x86, where the same compiler can produce x86-32 or x86-64 code with -m32 and -m64, you need a separate build of gcc for ARM vs. AArch64.
ARM gcc accepts -march=armv8-a, but it's still compiling in 32-bit ARM mode, not AArch64.
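One way to check which target a given gcc build is actually compiling for is to look at the macros it predefines. The sketch below assumes GCC or Clang; target_name is a hypothetical helper name, not something from your code. Built with an ARM gcc, it should take the __arm__ branch even when -march=armv8-a is passed, because __aarch64__ is only defined by an AArch64 compiler.

```c
/* target_name is a hypothetical helper for illustration.
   It reports the instruction set this translation unit is compiled for,
   based on macros that GCC and Clang predefine for the target. */
static const char *target_name(void)
{
#if defined(__aarch64__)
    return "AArch64";        /* 64-bit ARM: needs an AArch64 build of gcc */
#elif defined(__arm__)
    return "32-bit ARM";     /* ARM gcc, regardless of -march=armv8-a */
#elif defined(__x86_64__)
    return "x86-64";         /* x86 gcc with -m64 */
#elif defined(__i386__)
    return "x86-32";         /* x86 gcc with -m32 */
#else
    return "other";
#endif
}
```

Printing the result from main (for example with printf("%s\n", target_name())) shows at a glance whether the toolchain you invoked is the one you intended.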
I can reproduce your problem on the Godbolt compiler explorer with AArch64 gcc and ARM gcc. (And I included an example that uses __builtin_clz(uiShift) instead of inline asm, so it compiles to a clz instruction on either architecture.)
BTW, you could have left out the w size override on both operands, and simply used unsigned int for the input and output. Then the same inline asm would work with both ARM and AArch64. (But __builtin_clz is still better, because the compiler understands what it does. For example, it knows the result is in the range 0..31, which may enable some optimizations.)
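As a minimal sketch of the builtin approach, assuming GCC or Clang and a 32-bit unsigned int: count_leading_zeros is a hypothetical wrapper name, and the explicit zero check is there because __builtin_clz(0) has an undefined result.

```c
#include <limits.h>

/* count_leading_zeros is a hypothetical wrapper for illustration.
   __builtin_clz is undefined for an input of 0, so that case is handled
   explicitly and returns the full bit width of unsigned int. */
static inline unsigned count_leading_zeros(unsigned x)
{
    if (x == 0)
        return (unsigned)(sizeof x * CHAR_BIT);
    return (unsigned)__builtin_clz(x);   /* 0..31 for a 32-bit unsigned */
}
```

Because this is plain C, the same source builds with both the ARM and the AArch64 compiler, and the compiler may be able to fold the zero check away on targets whose clz instruction already gives a defined result for a zero input.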