I think I understand both the fixed-point and floating-point representations of fractional numbers in binary. However, I often see fixed point described as more accurate with less range, and floating point described as less accurate with more range. Now, if I understand correctly, floating point's inaccuracies stem from the fact that it cannot represent 0.1, and thus many other real numbers, exactly. I thought that fixed point had the same issue, so how can it be described as more "accurate"? If I'm not mistaken, von Neumann also seemed to champion this idea, and said we should use fixed point instead of floating point, but why?

KiwiNoob
  • You can't represent 0.1 exactly in fixed-point, either, and this is unrelated to floating vs. fixed. – Elliot Alderson May 30 '19 at 12:50
  • @ElliotAlderson: Neither fixed-point nor floating-point is necessarily binary-based. 0.1 can be represented in both fixed-point and floating-point formats that are decimal-based. – Eric Postpischil May 03 '21 at 13:28
  • @EricPostpischil Agreed, but the OP specifically asks about fixed and floating point representations "in binary". – Elliot Alderson May 03 '21 at 13:43

1 Answer

When using 32 bits for a floating-point representation, they are commonly partitioned into 1 bit to encode the sign, 8 bits to encode the exponent, and 23 bits for the primary encoding of the significand. Due to some special treatment with the exponent, the full significand has 24 bits. In this format, the resolution of floating-point numbers is 1 part in 2^23, in the sense that changing the low bit changes the represented value by 2^−23 times the value of the high bit.
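The 1/8/23 split and the 2^−23 step can be checked with a short sketch. It assumes the 32-bit format in question is IEEE 754 binary32 (which Python's `struct` format `f` produces); the helper names `float32_fields` and `next_float32_up` are made up for this illustration.

```python
import struct

def float32_fields(x):
    """Round x to binary32 and split its 32 bits into (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 explicitly stored significand bits
    return sign, exponent, fraction

def next_float32_up(x):
    """Next binary32 value above a positive, finite x, found by adding 1 to the bit pattern."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    (y,) = struct.unpack(">f", struct.pack(">I", bits + 1))
    return y

print(float32_fields(1.0))                           # (0, 127, 0)
print(next_float32_up(1.0) - 1.0 == 2.0 ** -23)      # True
# The step scales with the high bit: for 1024 = 2**10 it is 2**-23 * 2**10.
print(next_float32_up(1024.0) - 1024.0 == 2.0 ** -13)  # True
```

The step size is relative to the leading bit, which is why the ratio stays at 1 part in 2^23 wherever the value sits in the exponent range.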

When using 32 bits for a fixed-point representation, the high bit is typically used to indicate sign, so the highest value bit is the next highest, and it represents 2^30 times the value of the low bit. So, with this format, the resolution of fixed-point numbers is 1 part in 2^30.

Therefore, using the same number of bits, fixed-point has more resolution than floating-point. Even with other choices of how many bits to use for which parts, floating-point needs to use some bits for the exponent, and fixed-point uses zero, so fixed-point always has finer resolution than floating-point.
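As a rough illustration of that comparison, the sketch below quantizes a value near full scale both ways. It assumes one particular fixed-point layout, a signed 32-bit integer scaled by 2^−31 (often written Q0.31), and IEEE 754 binary32 for the floating-point side; `FRAC_BITS`, `to_fixed`, `from_fixed`, and `to_float32` are names invented for the example.

```python
import struct

FRAC_BITS = 31  # assumed layout: signed 32-bit integer code, value = code * 2**-31

def to_fixed(x):
    """Quantize x in [-1, 1) to the nearest fixed-point integer code."""
    code = round(x * (1 << FRAC_BITS))
    if not -(1 << 31) <= code < (1 << 31):
        raise OverflowError("value outside the range chosen at design time")
    return code

def from_fixed(code):
    """Convert an integer code back to the real value it represents."""
    return code / (1 << FRAC_BITS)

def to_float32(x):
    """Round x to the nearest binary32 value, returned as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

x = 0.7  # near full scale for this fixed-point format
fixed_err = abs(from_fixed(to_fixed(x)) - x)
float_err = abs(to_float32(x) - x)

print(fixed_err <= 2.0 ** -32)   # True: half of the 2**-31 step
print(float_err <= 2.0 ** -25)   # True: half of the 2**-24 step in [0.5, 1)
print(fixed_err < float_err)     # True for this value
```

Near the top of the fixed-point range, every bit spent on the exponent in the floating-point format shows up as lost resolution.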

Floating-point offers dynamic range, meaning it can handle large or small numbers by varying the exponent as part of calculations. Fixed-point has only static range: it can represent large numbers or small numbers, but you have to choose the range when designing the code. (To some extent, this can be finessed, such as by choosing a small range for the input values but then increasing the range as values are added together to form greater and greater sums or other results. However, the scale for each particular calculation must be chosen at design time.) If you need higher resolution and do not need dynamic range, then use fixed-point. If you need dynamic range and do not need higher resolution, then use floating-point. If you need both, use more bits.
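One way to picture the parenthetical about widening the range as sums grow is to work out, at design time, how many bits a worst-case sum needs. This is only a sketch of that bookkeeping under the assumption of signed two's-complement integer codes; `accumulator_bits` is a hypothetical helper, not something from a library.

```python
def accumulator_bits(input_bits, num_terms):
    """Width of a signed accumulator that can hold the sum of num_terms
    signed input_bits-wide values without overflow, in the worst case."""
    # ceil(log2(num_terms)) extra bits of headroom, computed with exact integers
    headroom = (num_terms - 1).bit_length()
    return input_bits + headroom

bits = accumulator_bits(16, 256)
print(bits)  # 24

# Check the worst cases against the accumulator's signed range.
worst_negative = 256 * -(1 << 15)
worst_positive = 256 * ((1 << 15) - 1)
print(-(1 << (bits - 1)) <= worst_negative)      # True
print(worst_positive <= (1 << (bits - 1)) - 1)   # True
```

The width is fixed once the number of terms is known, which is the sense in which the scale of each calculation is still a design-time decision.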

Note that the fixed scale means fixed-point will sometimes be less accurate than floating-point. Floating-point adjusts its scale to move the leading digit to the high position, up to the limits of its exponent range. So it keeps the resolution ratio around 1 in 2^23 in the format described above. But fixed point is stuck with the scale chosen at design time. If it needs to represent a number whose magnitude falls at, say, bit 2 of the fixed-point format, then its resolution for that number is only around 1 in 2^2.
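The following sketch shows that effect for a value whose leading bit lands at bit 2. As before, it assumes a signed 32-bit fixed-point code scaled by 2^−31 and IEEE 754 binary32 for floating-point; the helper names and the sample value `x` are chosen only for illustration.

```python
import struct

FRAC_BITS = 31  # assumed layout: value = code * 2**-31, fixed at design time

def to_fixed(x):
    """Quantize x to the nearest fixed-point integer code."""
    return round(x * (1 << FRAC_BITS))

def from_fixed(code):
    """Convert an integer code back to the real value it represents."""
    return code / (1 << FRAC_BITS)

def to_float32(x):
    """Round x to the nearest binary32 value, returned as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

x = 2.6e-9  # about 5.6 steps of 2**-31, so its leading bit is bit 2

code = to_fixed(x)
fixed_rel_err = abs(from_fixed(code) - x) / x
float_rel_err = abs(to_float32(x) - x) / x

print(code)                        # 6
print(fixed_rel_err > 0.01)        # True: several percent off
print(float_rel_err < 2.0 ** -23)  # True: still about 1 part in 2**23
```

Here the floating-point format rescales to keep its full significand in play, while the fixed-point format has only three bits left to describe the value.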

Eric Postpischil