I need to convert a float to Q31 fixed-point, where Q31 means 1 sign bit, 0 bits for the integer part, and 31 bits for the fractional part. This means that Q31 can only represent numbers in the range [-1, 1 - 2^-31], i.e. roughly [-1, 0.9999999995].
By definition, converting from float to fixed-point is done by multiplying by 2^N, where N is the number of fractional bits, in this case 31.
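To make the textbook rule concrete, here is a minimal sketch of a conversion that scales by exactly 2^31. The helper name `float_to_q31_pow2` is hypothetical (it is not from the code I am asking about), and the sketch assumes the input is already in [-1, 1) and that `float` is IEEE-754 single precision.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the code in question):
   convert a float in [-1, 1) to Q31 by scaling with 2^31. */
static int32_t float_to_q31_pow2(float x)
{
    /* Outside [-1, 1) the scaled value would not fit in 32 bits,
       and the float-to-int conversion would be undefined. */
    assert(x >= -1.0f && x < 1.0f);

    /* 2147483648.0f is 2^31, which a float represents exactly,
       so the multiplication only changes the exponent. */
    return (int32_t)(x * 2147483648.0f);
}
```

Called as `float_to_q31_pow2(0.5f)`, this should yield 0x40000000, and `float_to_q31_pow2(-1.0f)` should yield `INT32_MIN`. A production version would probably saturate instead of asserting.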
However, I got confused by this code. It doesn't look right, but it works:
#define q31_float_to_int(x) ( (int) ( (float)(x)*(float)0x7FFFFFFF ) )
For example:
int a = q31_float_to_int(0.5f);
gives 0x40000000 in hex, which is OK.
Why is the multiplication here done with 2^31 - 1, and not just 2^31?
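One way to investigate is to look at the value the `(float)0x7FFFFFFF` cast actually produces. The sketch below wraps the macro in two hypothetical helpers, `q31_macro_scale` and `q31_half`, which I added only for inspection. It assumes IEEE-754 single precision and round-to-nearest integer-to-float conversion, which is what common desktop compilers do but is implementation-defined in C.

```c
#include <stdint.h>

#define q31_float_to_int(x) ( (int) ( (float)(x)*(float)0x7FFFFFFF ) )

/* Hypothetical helper: the scale factor the macro really multiplies by.
   A float has a 24-bit significand, so 0x7FFFFFFF (31 significant bits)
   cannot be stored exactly and must be rounded to a neighbouring float.
   Under round-to-nearest that neighbour is expected to be 2147483648.0f. */
static float q31_macro_scale(void)
{
    return (float)0x7FFFFFFF;
}

/* Hypothetical helper: the example from the question, 0.5f through the macro. */
static int q31_half(void)
{
    return q31_float_to_int(0.5f);
}
```

Comparing `q31_macro_scale()` against `2147483648.0f` and `q31_half()` against `0x40000000` shows whether the constant in the macro survives the cast to float unchanged on a given platform.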