How many bits does a fixed-point number need to be at least as precise as a floating-point number?

If I wanted to carry out calculations in fixed-point arithmetic instead of floating-point, how many bits would I need for the calculations to be no less precise?

A single-precision (32-bit) float can represent numbers as small as 2^-126 and as large as 2^127. Does that mean the fixed-point number has to be at least in 128.128 format (128 bits for the integer part, 128 bits for the fractional part)?

I understand that a single-precision float can only hold about 7 significant decimal digits at a time; I'm asking about covering all possible values.

And what about double precision (64-bit floats)? Does it really take a 1024.1024 format to be equally precise?

Ecir Hana
  • Note that an 128.128 float would actually be more precise than an IEEE-754 float, because the latter has gaps (because of the use of a limited mantissa combined with an exponent). Floating point is more or less an exponential format with a fixed-size mantissa/significand; values larger or smaller than the range [1.0, 2.0) are made by multiplying by 2^exponent (and a sign) -- note that I did not discuss denormals, NaNs or infinities. The fixed-point format would not have any gaps. That makes sense anyway, because a float has 32 bits, while a 128.128 fixed-point number would have 256. – Rudy Velthuis Jun 28 '17 at 10:27
  • But do you really need all these values? Look at the range of values you need for your particular application and decide how many bits you need. I think you could save a few bits. – Rudy Velthuis Jun 28 '17 at 10:29
  • @RudyVelthuis "Note that an 128.128 float would actually be more precise" - for almost all values, true. "you really need all these values?" Probably not, but I am still curious how many bits are needed to achieve similar precision to floats, especially for the edge cases. – Ecir Hana Jun 28 '17 at 22:36
  • It should not be too hard to find out: the smallest denormal is 1 (lowest) bit in the mantissa and the minimum exponent (-126, IIRC, but don't pin me down on that). – Rudy Velthuis Jun 29 '17 at 12:38
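
The edge cases mentioned in these comments can be checked directly. The following is only a sketch in Python: the helper `f32_from_bits` is a hypothetical name introduced here, and it assumes the platform uses IEEE-754 binary32 for `struct`'s `"f"` format (which is the usual case). It reinterprets selected bit patterns as single-precision values and compares them with exact powers of two.

```python
import struct


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision value.

    The result is returned as a Python float (a double), which holds every
    single-precision value exactly. Hypothetical helper, for illustration.
    """
    return struct.unpack("<f", struct.pack("<I", bits))[0]


smallest_subnormal = f32_from_bits(0x00000001)  # only the lowest mantissa bit set
smallest_normal = f32_from_bits(0x00800000)     # minimum normal exponent, zero mantissa
largest_finite = f32_from_bits(0x7F7FFFFF)      # maximum exponent, all mantissa bits set
second_largest = f32_from_bits(0x7F7FFFFE)      # one step below the largest finite value

# The lowest bit a single can ever set has weight 2^-149.
print(smallest_subnormal == 2.0 ** -149)
# The smallest normal value is 2^-126.
print(smallest_normal == 2.0 ** -126)
# The largest finite value is just below 2^128.
print(largest_finite == (2.0 - 2.0 ** -23) * 2.0 ** 127)
# Near the top of the range, neighbouring floats are 2^104 apart: the "gaps".
print(largest_finite - second_largest == 2.0 ** 104)
```

All four comparisons are exact because each quantity involved is representable in double precision without rounding.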

1 Answer

For single precision, you would need to store bits with values in the range [2^-149, 2^128), which would require a signed 128.149 fixed-point type, totaling a width of 278 bits.

For double precision, you would need to store bits with values in the range [2^-1074, 2^1024), which would require a signed 1024.1074 fixed-point type, totaling a width of 2099 bits.

(Disclaimer: This all assumes I've made an even number of off-by-one errors.)
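
As a rough cross-check of the counts above, here is a small Python sketch. The function `fixed_point_width` and its parameter names are made up for this illustration; it assumes the standard IEEE-754 parameters (23 or 52 stored mantissa bits, normal exponents from -126 to 127 or from -1022 to 1023) and counts one sign bit, the integer bits needed to reach just below 2^(max exponent + 1), and the fraction bits needed to reach the lowest subnormal bit.

```python
import sys
from fractions import Fraction


def fixed_point_width(mantissa_bits: int, max_exponent: int, min_exponent: int):
    """Return (integer_bits, fraction_bits, total_bits) of a signed fixed-point
    type that can hold every finite value of the described binary float format.

    mantissa_bits: stored fraction bits of the float (23 single, 52 double)
    max_exponent:  largest unbiased exponent of a finite value (127, 1023)
    min_exponent:  smallest unbiased exponent of a normal value (-126, -1022)
    """
    # The largest finite value is below 2^(max_exponent + 1), so the top
    # integer bit has weight 2^max_exponent.
    integer_bits = max_exponent + 1
    # The smallest subnormal is 2^(min_exponent - mantissa_bits).
    fraction_bits = mantissa_bits - min_exponent
    total_bits = 1 + integer_bits + fraction_bits  # plus one sign bit
    return integer_bits, fraction_bits, total_bits


print(fixed_point_width(23, 127, -126))    # single precision
print(fixed_point_width(52, 1023, -1022))  # double precision

# Independent check for double precision using Python's own float type,
# assuming it is an IEEE-754 binary64 (true on mainstream platforms).
largest = Fraction(sys.float_info.max)               # exact integer value
smallest = Fraction(float.fromhex("0x1p-1074"))      # smallest subnormal
print(largest.numerator.bit_length(), smallest.denominator.bit_length() - 1)
```

Under those assumptions the function yields 128.149 (278 bits) for single precision and 1024.1074 (2099 bits) for double precision, and the last line reports 1024 integer bits and 1074 fraction bits for the platform's double.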

John McFarlane