when ... given 0x800000, my function returns 0 ...., the correct answer ... is 0x00400000.
This divides the minimum normal float value by 2 and is detailed in #3 below.
There are many issues with the code.
1. For most finite numbers, decrementing the exponent rather than shifting it is correct when the exponent is > 1, as pointed out in @John Bollinger's good answer.
2. When the exponent == 0, the number is sub-normal (or denormal) and needs its mantissa field shifted right (/2). The exponent remains 0. If the bit shifted out is 1, the division by 2 is not exact. Depending on the rounding mode, the mantissa is then adjusted, perhaps by adding 1.
3. When the exponent == 1, the result will be sub-normal: the implied bit of normal numbers needs to be created in the mantissa field, which is then shifted right (/2). This shift may incur a rounding as discussed above. The exponent becomes 0. Note that "rounding" mant may push it past its max value of 0x7FFFFF, which then requires adjustments to the fields.
4. When the exponent == MAX (255), the number is not finite (it is infinity or Not-a-Number) and should be left alone.
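The four cases above can be told apart from the exponent field alone. A minimal sketch, assuming the usual 32-bit IEEE 754 layout (1 sign bit, 8 exponent bits, 23 mantissa bits); the enum and function names are hypothetical and exist only for this illustration:

```c
#include <stdint.h>

/* Hypothetical names, chosen for this sketch only. */
typedef enum {
    HALVE_DECREMENT_EXPONENT, /* exponent > 1: normal stays normal        */
    HALVE_SHIFT_SUBNORMAL,    /* exponent == 0: shift the mantissa right  */
    HALVE_BECOMES_SUBNORMAL,  /* exponent == 1: add implied bit, shift    */
    HALVE_LEAVE_ALONE         /* exponent == 255: infinity or NaN         */
} halve_case;

/* Decide which of the four cases applies to the float encoded in bits. */
static halve_case classify_halving(uint32_t bits) {
    uint32_t expo = (bits >> 23) & 0xFFu;   /* 8-bit biased exponent */
    if (expo == 0xFFu) return HALVE_LEAVE_ALONE;
    if (expo > 1u)     return HALVE_DECREMENT_EXPONENT;
    return expo == 1u ? HALVE_BECOMES_SUBNORMAL : HALVE_SHIFT_SUBNORMAL;
}
```

OP's input 0x00800000 has exponent field 1 and mantissa 0, so it lands in the "becomes sub-normal" case: the implied bit 0x800000 is created and shifted right, giving 0x00400000 with exponent 0.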
Code like 1 << 31 shifts a signed int into (or past) its sign bit, which is undefined behavior. It is better written as:
// unsigned signBit = (1 << 31) & uf;   // signed shift into the sign bit
// unsigned signBit = (1u << 31) & uf;  // Use an unsigned mask
// unsigned signBit = (1LU << 31) & uf; // unsigned may be 16 bit.
// or better yet
unsigned signBit = uf & 0x80000000;
There is a corner-case weakness in the mantissa derivation: it relies on the (overwhelmingly common) 2's complement representation. A portable alternative:
// unsigned mantissa = ~0; Incorrect mask in `mantissa` when `int` is not 2's comp.
// unsigned mantissa = -1; correct all bits set.
// mantissa >>= 9;
// mantissa &= uf;
// or simply use
unsigned mantissa = 0x7FFFFF & uf;
unsigned may be 16, 32, 64 bits, etc. Better to use minimum-width or exact-width types.
#include <stdint.h>

#define SIGN_MASK 0x80000000
#define EXPO_MASK 0x7F800000
#define MANT_MASK 0x007FFFFF
#define EXPO_SHIFT 23
#define EXPO_MAX (EXPO_MASK >> EXPO_SHIFT)
#define MANT_IMPLIED_BIT (MANT_MASK + 1u)

uint32_t divideFloatBy2(uint32_t uf){
    // exact-width fields: plain unsigned may be only 16 bits wide
    uint32_t sign = uf & SIGN_MASK;
    uint32_t expo = uf & EXPO_MASK;
    uint32_t mant = uf & MANT_MASK;
    expo >>= EXPO_SHIFT;
    // when the number is not an infinity nor NaN
    if (expo != EXPO_MAX) {
        if (expo > 1) {
            expo--; // this is the usual case
        } else {
            if (expo == 1) {
                mant |= MANT_IMPLIED_BIT;
            }
            expo = 0;
            unsigned round_bit = mant & 1;
            mant /= 2;
            if (round_bit) {
                // placeholder: rounding is worked out in the re-write of this branch
                TBD_CODE_Handle_Rounding(round_mode, sign, &expo, &mant);
            }
        }
        expo <<= EXPO_SHIFT;
        uf = sign | expo | mant;
    }
    return uf;
}
OP later commented that with exponent and sign 0 and mantissa == 0x3, the expected result is 0x2, but the function returns 1. So the rounding mode is likely FE_TONEAREST or possibly FE_UPWARD.
A re-write of the case when expo <= 1 follows. It is tested code, exercised over many of the 2^32 combinations and with 4 rounding modes.
Note that when some_float/2.0f is computed, it may affect the floating-point environment status flags. I initially did likewise but have since removed that code from this post; contact me if interested.
        } else {
            if (expo == 1) {
                expo = 0;
                mant |= MANT_IMPLIED_BIT;
            }
            // Divided by 2 result inexact?
            if (mant % 2) {
                mant /= 2;
                // Determine how to round
                switch (fegetround()) {
                    case FE_DOWNWARD:
                        if (sign) mant++;
                        break;
                    case FE_TOWARDZERO:
                        break;
                    case FE_UPWARD:
                        if (!sign) mant++;
                        break;
                    default: // When mode is not known, act like FE_TONEAREST
                        // fall through
                    case FE_TONEAREST:
                        if (mant & 1) mant++;
                        break;
                }
                if (mant >= MANT_IMPLIED_BIT) {
                    mant = 0;
                    expo++;
                }
            } else {
                mant /= 2;
            }
        }
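For reference, the pieces above can be assembled into one self-contained function. This is a sketch, not the exact code I tested: the name halve_float_bits is hypothetical, and the rounding mode is passed in as a parameter (one of the FE_* macros, for example the value returned by fegetround()) instead of being queried inside the function, which is my assumption to keep the example easy to exercise.

```c
#include <fenv.h>
#include <stdint.h>

#define SIGN_MASK UINT32_C(0x80000000)
#define EXPO_MASK UINT32_C(0x7F800000)
#define MANT_MASK UINT32_C(0x007FFFFF)
#define EXPO_SHIFT 23
#define EXPO_MAX (EXPO_MASK >> EXPO_SHIFT)
#define MANT_IMPLIED_BIT (MANT_MASK + 1u)

/* Hypothetical helper: halve the float encoded in uf.
   round_mode is an FE_* macro, e.g. the result of fegetround(). */
uint32_t halve_float_bits(uint32_t uf, int round_mode) {
    uint32_t sign = uf & SIGN_MASK;
    uint32_t expo = (uf & EXPO_MASK) >> EXPO_SHIFT;
    uint32_t mant = uf & MANT_MASK;

    if (expo == EXPO_MAX) {
        return uf;                      /* infinity or NaN: leave alone */
    }
    if (expo > 1) {
        expo--;                         /* the usual case */
    } else {
        if (expo == 1) {
            expo = 0;
            mant |= MANT_IMPLIED_BIT;   /* make the implied bit explicit */
        }
        uint32_t inexact = mant & 1u;   /* bit about to be shifted out */
        mant >>= 1;
        if (inexact) {
            switch (round_mode) {
            case FE_DOWNWARD:
                if (sign) mant++;       /* toward -inf grows the magnitude */
                break;
            case FE_TOWARDZERO:
                break;
            case FE_UPWARD:
                if (!sign) mant++;
                break;
            default:                    /* FE_TONEAREST or unknown mode */
                if (mant & 1u) mant++;  /* exact tie: round to even */
                break;
            }
        }
        if (mant >= MANT_IMPLIED_BIT) { /* rounded up into the normal range */
            mant = 0;
            expo++;
        }
    }
    return sign | (expo << EXPO_SHIFT) | mant;
}
```

With FE_TONEAREST, OP's two inputs behave as expected: 0x00800000 yields 0x00400000, and mantissa 0x3 with zero sign and exponent yields 0x2.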
For details on the rounding modes, search on the FE_... macros.
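To check a bit-level implementation against the hardware, a reference result can be obtained by letting the FPU do the division. A minimal sketch, assuming float is a 32-bit IEEE 754 type and that sub-normals are not flushed to zero on the platform; hw_halve_bits is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference helper: halve via real float arithmetic,
   using whatever rounding mode is currently in effect. */
uint32_t hw_halve_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);      /* reinterpret the bits as a float  */
    volatile float half = f / 2.0f;   /* volatile: force a float-width result */
    f = half;
    memcpy(&bits, &f, sizeof bits);   /* and back to the bit pattern      */
    return bits;
}
```

Comparing this against a candidate function over a range of inputs is a quick sanity test. NaN inputs are best skipped in such a comparison, since the hardware may change a NaN's payload bits.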