Short version: how can I detect overflow using the fixed-point multiplication described here but for a signed type?
Long version:
I still have some overflow issues with my Q31.32 fixed point type. To make it easier to work out examples on paper, I've made a much smaller type using the same algorithm, a Q3.4 based on sbyte. I figure that if I can work out all the kinks for a Q3.4 type, the same logic should apply for a Q31.32 one.
Note that I could very easily implement Q3.4 multiplication by performing it on a 16-bit integer, but I'm doing as if that didn't exist, because for the Q31.32 I'd need a 128-bit integer which doesn't exist (and BigInteger is too slow).
I want my multiplication to handle overflow by saturation, that is when overflow happens, the result is the highest or smallest value that can be represented depending on the sign of the operands.
This is basically how the type is represented:
struct Fix8 {
sbyte m_rawValue;
public static readonly Fix8 One = new Fix8(1 << 4);
public static readonly Fix8 MinValue = new Fix8(sbyte.MinValue);
public static readonly Fix8 MaxValue = new Fix8(sbyte.MaxValue);
Fix8(sbyte value) {
m_rawValue = value;
}
public static explicit operator decimal(Fix8 value) {
return (decimal)value.m_rawValue / One.m_rawValue;
}
public static explicit operator Fix8(decimal value) {
var nearestExact = Math.Round(value * 16m) * 0.0625m;
return new Fix8((sbyte)(nearestExact * One.m_rawValue));
}
}
And this is how I currently handle multiplication:
public static Fix8 operator *(Fix8 x, Fix8 y) {
sbyte xl = x.m_rawValue;
sbyte yl = y.m_rawValue;
// split x and y into their highest and lowest 4 bits
byte xlo = (byte)(xl & 0x0F);
sbyte xhi = (sbyte)(xl >> 4);
byte ylo = (byte)(yl & 0x0F);
sbyte yhi = (sbyte)(yl >> 4);
// perform cross-multiplications
byte lolo = (byte)(xlo * ylo);
sbyte lohi = (sbyte)((sbyte)xlo * yhi);
sbyte hilo = (sbyte)(xhi * (sbyte)ylo);
sbyte hihi = (sbyte)(xhi * yhi);
// shift results as appropriate
byte loResult = (byte)(lolo >> 4);
sbyte midResult1 = lohi;
sbyte midResult2 = hilo;
sbyte hiResult = (sbyte)(hihi << 4);
// add everything
sbyte sum = (sbyte)((sbyte)loResult + midResult1 + midResult2 + hiResult);
// if the top 4 bits of hihi (unused in the result) are neither all 0s or 1s,
// then this means the result overflowed.
sbyte topCarry = (sbyte)(hihi >> 4);
bool opSignsEqual = ((xl ^ yl) & sbyte.MinValue) == 0;
if (topCarry != 0 && topCarry != -1) {
return opSignsEqual ? MaxValue : MinValue;
}
// if signs of operands are equal and sign of result is negative,
// then multiplication overflowed upwards
// the reverse is also true
if (opSignsEqual) {
if (sum < 0) {
return MaxValue;
}
}
else {
if (sum > 0) {
return MinValue;
}
}
return new Fix8(sum);
}
This gives result accurate within the precision of the type and handles most overflow cases. It doesn't however handle these ones, for example:
Failed -8 * 2 : expected -8 but got 0
Failed 3.5 * 5 : expected 7,9375 but got 1,5
Let's work out how the multiplication happens for the first one.
-8 and 2 are represented as x = 0x80 and y = 0x20.
xlo = 0x80 & 0x0F = 0x00
xhi = 0x80 >> 4 = 0xf8
ylo = 0x20 & 0x0F = 0x00
yhi = 0x20 >> 4 = 0x02
lolo = xlo * ylo = 0x00
lohi = xlo * yhi = 0x00
hilo = xhi * ylo = 0x00
hihi = xhi * yhi = 0xf0
The sum is obviously 0 as all terms are 0 save for hihi, but only the lowest 4 bits of hihi are used in the final sum.
My usual overflow detection magic doesn't work here: the result is zero so the sign of the result is meaningless (e.g. 0.0625 * -0.0625 == 0 (by rounding down), 0 is positive yet signs of operands differ); also the high bits of hihi are 1111 which often happens even when there's no overflow.
Basically I don't know how to detect that overflow happened here. Is there a more general method?