I'm currently writing a lecture on ARM optimization, with vector machines such as NEON as the final target.
And since vector machines don't fare well with if-else slaloms, I'm trying to demonstrate how to get rid of them by bit-hacking.
I picked the "saturating absolute" function as an example for this. It's practically an ABS routine with the added functionality of capping the result at 0x7fffffff.
The most negative 32-bit number is 0x80000000, and it's a very dangerous value because val = -val; returns the same 0x80000000 as the initial value. This is caused by the asymmetry of the two's complement system: +2147483648 has no 32-bit signed representation. It matters especially for DSP operations, and thus the value has to be filtered out, mostly by "saturating".
int32_t satAbs1(int32_t val)
{
    if (val < 0) val = -val;
    if (val < 0) val = 0x7fffffff;
    return val;
}
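To make the asymmetry concrete, here is a minimal sketch that computes the bit pattern of -val in unsigned arithmetic, where wraparound is well defined in C. The helper name negate_bits is hypothetical and only serves this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: the bit pattern a two's-complement machine produces
   for -val, computed in uint32_t so the subtraction wraps instead of
   overflowing a signed type. */
uint32_t negate_bits(int32_t val)
{
    return 0u - (uint32_t)val;
}
```

For example, negate_bits(5) gives 0xfffffffb (that is, -5), while negate_bits(INT32_MIN) gives 0x80000000 again: the one input whose negation is itself.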
Below is what I would write in assembly:
    cmp     r0, #0
    rsblts  r0, r0, #0
    mvnvs   r0, #0x80000000    @ VS rather than LT: negating 0x80000000 sets both N and V, so LT (N != V) is false
    bx      lr
And below is what I actually get for the C code above:
satAbs1
0x00000000: CMP r0,#0
0x00000004: RSBLT r0,r0,#0
0x00000008: BX lr
WTH? The compiler simply discarded the saturating part altogether!
The compiler seems to be ruling out val being negative after the first if statement, which isn't true if it was 0x80000000.
Or maybe the function should return an unsigned value?
uint32_t satAbs2(int32_t val)
{
    uint32_t result;
    if (val < 0) result = (uint32_t) -val; else result = (uint32_t) val;
    if (result == 0x80000000) result = 0x7fffffff;
    return result;
}
satAbs2
0x0000000C: CMP r0,#0
0x00000010: RSBLT r0,r0,#0
0x00000014: BX lr
Unfortunately, it generates exactly the same machine code as the signed version: no saturation.
Again, the compiler seems to rule out the case of val being 0x80000000.
Ok, let's widen the range of the second if statement:
uint32_t satAbs3(int32_t val)
{
    uint32_t result;
    if (val < 0) result = (uint32_t) -val; else result = (uint32_t) val;
    if (result >= 0x80000000) result = 0x7fffffff;
    return result;
}
satAbs3
0x00000018: CMP r0,#0
0x0000001C: RSBLT r0,r0,#0
0x00000020: CMP r0,#0
0x00000024: MVNLT r0,#0x80000000
0x00000028: BX lr
Finally, the compiler seems to be doing its job, albeit sub-optimally (an unnecessary CMP compared to the assembly version).
I can live with compilers being sub-optimal, but what bothers me is that they are ruling out something that they shouldn't: 0x80000000.
I'd even file a bug report with the GCC devs on this, but I found out that Clang also rules out the case of the integer being 0x80000000, and thus I suppose I'm missing something regarding the C standard.
Can anyone tell me where I'm mistaken?
Btw, below is what the if-less bit-hacking version looks like:
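int32_t satAbs_bh(int32_t val)
{
    int32_t temp = val ^ (val>>31);     /* ~val if negative, else val */
    val = temp + ((uint32_t)val>>31);   /* add 1 if negative: needs a logical shift, not an arithmetic one */
    val ^= val>>31;                     /* 0x80000000 becomes 0x7fffffff */
    return val;
}
satAbs_bh
0x0000002C: EOR r3,r0,r0,ASR #31
0x00000030: ADD r0,r3,r0,LSR #31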
0x00000034: EOR r0,r0,r0,ASR #31
0x00000038: BX lr
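As a cross-check on the bit-hacking idea, here is a sketch of the same three steps (conditional invert, conditional add of 1, conditional flip of an overflowed result) carried out entirely in uint32_t, so that no step relies on signed overflow or on how a negative value is shifted. The name sat_abs_bits is hypothetical, and I have not checked what code a particular compiler generates for it.

```c
#include <stdint.h>

/* Hypothetical name; same three steps as the bit-hacking version above,
   done in unsigned arithmetic, where wraparound is well defined. */
int32_t sat_abs_bits(int32_t val)
{
    uint32_t u    = (uint32_t)val;
    uint32_t sign = u >> 31;       /* 1 if val is negative, else 0 */
    uint32_t mask = 0u - sign;     /* 0xffffffff if negative, else 0 */

    u = (u ^ mask) + sign;         /* two's-complement negate when negative */
    u ^= 0u - (u >> 31);           /* only 0x80000000 still has bit 31 set: flip it to 0x7fffffff */
    return (int32_t)u;             /* u <= 0x7fffffff here, so the conversion is exact */
}
```

With this version, sat_abs_bits(-5) is 5 and sat_abs_bits(INT32_MIN) is INT32_MAX, while non-negative inputs pass through unchanged.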
Edit: I agree that this question of mine is a duplicate to some degree.
However, it is considerably more comprehensive than the referred one, including some assembly-level material and bitmask techniques that might be helpful.
And below is a workaround for this problem that doesn't involve changing compiler options: rule out the possibility of integer overflow preemptively:
int32_t satAbs4(int32_t val)
{
    if (val == 0x80000000) return 0x7fffffff;
    if (val < 0) val = -val;
    return val;
}
satAbs4
0x0000002C: CMP r0,#0x80000000
0x00000030: BEQ {pc}+0x10 ; 0x40
0x00000034: CMP r0,#0
0x00000038: RSBLT r0,r0,#0
0x0000003C: BX lr
0x00000040: MVN r0,#0x80000000
0x00000044: BX lr
Again, the Linaro GCC 7.4.1 I'm using demonstrates its shortcomings: I don't understand the BEQ in line 2. A moveq r0, #0x80000001 in its place, as suggested in the source code, could have saved two instructions at the end.
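Another way to rule out the overflow, sketched here as an assumption rather than something I have verified against GCC or Clang output, is to convert to unsigned before negating instead of after (as satAbs2 does). The negation then wraps in uint32_t, which C defines, so the compiler has no grounds to assume 0x80000000 cannot appear. The name sat_abs_unsigned is hypothetical.

```c
#include <stdint.h>

/* Hypothetical name; negation happens in uint32_t, so INT32_MIN wraps to
   0x80000000 instead of overflowing a signed type. */
int32_t sat_abs_unsigned(int32_t val)
{
    uint32_t mag = (uint32_t)val;

    if (val < 0)
        mag = 0u - mag;                    /* well-defined wraparound */
    if (mag > (uint32_t)INT32_MAX)
        mag = (uint32_t)INT32_MAX;         /* saturate the single out-of-range magnitude */
    return (int32_t)mag;                   /* always fits in int32_t at this point */
}
```

The only difference from satAbs2 is the position of the cast: (uint32_t) -val negates as a signed value first, whereas 0u - (uint32_t)val never performs signed arithmetic at all.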