Fixed-point instead of floating point

Question

How many bits does a fixed-point number need to be at least as precise as a floating-point number? If I wanted to carry out calculations in fixed-point arithmetic instead of floating-point, how many bits would I need so that the calculations are no less precise?

A single-precision (32-bit) float can represent numbers as small as 2^-126 and as large as 2^127. Does that mean the fixed-point number has to be at least in 128.128 format (128 bits for the integer part, 128 bits for the fractional part)?

I understand that a single-precision float carries only about 7 significant decimal digits at a time; I'm asking about all possible values.

And what about double precision (64-bit floats)? Does it really take a 1024.1024 format to be equally precise?

Note that a 128.128 fixed-point number would actually be more precise than an IEEE-754 float, because the latter has gaps (because of the use of a limited mantissa combined with an exponent). Floating point is more or less an exponential format, with a fixed-size mantissa/significand; values larger or smaller than the range [1.0, 2.0) are made by multiplying by 2^exponent (and a sign). Note that I did not discuss denormals, NaNs, or infinities. The fixed-point format would not have any gaps. That makes sense anyway, because a float has 32 bits, while a 128.128 fixed-point number would have 256. — Rudy Velthuis, Jun 28 '17 at 10:27
But do you really need all these values? Look at the range of values you need for your particular application and decide how many bits you need. I think you could save a few bits. — Rudy Velthuis, Jun 28 '17 at 10:29
@RudyVelthuis "Note that an 128.128 float would actually be more precise" - for almost all values, true. "you really need all these values?" Probably not, but I am still curious how many bits are needed to achieve similar precision to floats, especially for the edge cases. — Ecir Hana, Jun 28 '17 at 22:36
It should not be too hard to find out: the smallest denormal is 1 (lowest) bit in the mantissa and the minimum exponent (-126, IIRC, but don't pin me down on that). — Rudy Velthuis, Jun 29 '17 at 12:38
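The estimate in the comment above can be checked by decoding the relevant single-precision bit patterns. This is only a sketch: the helper name `float32_from_bits` is made up for illustration, and it assumes the platform's `struct` "f" format is IEEE-754 binary32, which is the case for standard-size formats.

```python
import struct

def float32_from_bits(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Lowest mantissa bit set, exponent field zero: the smallest denormal.
smallest_subnormal = float32_from_bits(0x00000001)
# Exponent field 1, mantissa zero: the smallest normal number.
smallest_normal = float32_from_bits(0x00800000)
# Exponent field 254, mantissa all ones: the largest finite value.
largest_finite = float32_from_bits(0x7F7FFFFF)

print(smallest_subnormal == 2.0 ** -149)
print(smallest_normal == 2.0 ** -126)
print(largest_finite == (2 - 2.0 ** -23) * 2.0 ** 127)
```

So -126 is the minimum exponent for normal numbers, and the 23 explicit mantissa bits push the smallest denormal down to 2^-149.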

score 0 · Accepted Answer · answered Jul 01 '17 at 18:49

For single precision, you would need to store bits with place values in the range [2^-149, 2^128), which would require a signed 128.149 fixed-point type with a total width of 278 bits (1 sign bit + 128 integer bits + 149 fractional bits).

For double precision, you would need to store bits with place values in the range [2^-1074, 2^1024), which would require a signed 1024.1074 fixed-point type with a total width of 2099 bits (1 sign bit + 1024 integer bits + 1074 fractional bits).
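The arithmetic behind both widths can be sketched as a small function. The function name and parameter names are invented for this illustration; the inputs are the usual IEEE-754 parameters (significand precision including the implicit leading bit, and the largest and smallest normal exponents).

```python
def fixed_point_width(precision_bits, max_exponent, min_normal_exponent):
    """Return (integer_bits, fraction_bits, total_bits) of a signed fixed-point
    type able to hold every finite value of a binary floating-point format.

    precision_bits      -- significand bits including the implicit leading 1
    max_exponent        -- largest unbiased exponent of a finite value
    min_normal_exponent -- smallest unbiased exponent of a normal value
    """
    # The lowest bit of the smallest denormal has weight
    # 2^(min_normal_exponent - (precision_bits - 1)).
    fraction_bits = -(min_normal_exponent - (precision_bits - 1))
    # Integer bits carry weights 2^0 .. 2^max_exponent.
    integer_bits = max_exponent + 1
    total_bits = 1 + integer_bits + fraction_bits  # plus one sign bit
    return integer_bits, fraction_bits, total_bits

print(fixed_point_width(24, 127, -126))    # single precision
print(fixed_point_width(53, 1023, -1022))  # double precision
```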

(Disclaimer: This all assumes I've made an even number of off-by-one errors.)
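One way to probe for off-by-one errors in the double-precision case is to convert the extreme values exactly and see whether they fit. The following is a rough check, not a proof: `to_fixed` is a hypothetical helper, and it relies on Python's `float` being an IEEE-754 double, which holds on common platforms.

```python
import sys
from fractions import Fraction

INT_BITS = 1024   # integer bits from the answer
FRAC_BITS = 1074  # fractional bits from the answer

def to_fixed(x):
    """Return x as an integer count of 2^-FRAC_BITS units; raise if inexact."""
    scaled = Fraction(x) * 2 ** FRAC_BITS  # Fraction(float) is exact
    if scaled.denominator != 1:
        raise ValueError("value needs more fractional bits")
    return scaled.numerator

smallest = to_fixed(5e-324)              # smallest positive denormal double
largest = to_fixed(sys.float_info.max)   # largest finite double

print(smallest)              # the lowest fractional bit alone
print(largest.bit_length())  # magnitude bits used by the largest value
```

The smallest denormal maps to exactly one unit of the lowest fractional bit, and the largest finite double uses all 1024 + 1074 magnitude bits, which leaves one more bit for the sign.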
