For a floating-point format with an n-bit fraction, give a formula for the smallest positive integer that cannot be represented exactly (because it would require an (n + 1)-bit fraction to be exact). Assume the exponent field size k is large enough that the range of representable exponents does not provide a limitation for this problem.

The solution given by the book is 2^(n + 1) + 1, but it doesn't provide any explanation. Could someone explain how this formula is derived? Thank you.

Charles Z.
    Hint: Consider a number written in binary. If the distance from its first 1 bit to its last 1 bit is n−1 bits or less, then those two bits and all the bits between them fit in n bits. They may be followed by zeros, but the floating-point format can add zeros after a number by increasing the exponent. What is the smallest integer for which the non-zero bits do not fit in n bits? If you have trouble answering that, try it with n=3. – Eric Postpischil Jul 04 '21 at 23:37
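Following the hint, the formula can be checked by brute force for small n. This is only a sketch: the helper names `is_representable` and `smallest_unrepresentable` are made up for illustration, and it assumes a normalized binary format with an implicit leading 1 (so an n-bit fraction gives n + 1 significant bits) and an unbounded exponent, as the question states.

```python
def is_representable(x: int, n: int) -> bool:
    """True if positive integer x is exact in a format with an n-bit fraction."""
    # Trailing zeros are absorbed by the exponent, so strip them first.
    while x % 2 == 0:
        x //= 2
    # What remains spans from the first 1 bit to the last 1 bit.
    # The implicit leading 1 plus n fraction bits hold n + 1 bits.
    return x.bit_length() <= n + 1


def smallest_unrepresentable(n: int) -> int:
    """Search upward from 1 for the first integer that does not fit."""
    x = 1
    while is_representable(x, n):
        x += 1
    return x


# Compare the brute-force result with the book's formula for small n.
for n in range(1, 9):
    print(n, smallest_unrepresentable(n), 2 ** (n + 1) + 1)
```

For n = 3 the search stops at 17, which is 10001 in binary: five bits from first 1 to last 1, one more than the four the format can hold.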

1 Answer

Consider the IEEE 754 32-bit (single-precision) floating-point format, which has a 23-bit fraction.

The 24-bit integer 111111111111111111111111₂ = 2^24 - 1 can be represented exactly, because there are enough bits (even if the most significant one is implicit).

Adding 1, we have 1000000000000000000000000₂ = 2^24. No problem with that, even though it's a 25-bit number, because it's a power of two: the significand is just 1, and the trailing zeros are carried by the exponent.

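The next one, 1000000000000000000000001₂ = 2^24 + 1, though, can't be represented exactly: its first and last 1 bits span 25 bits, and the significand holds only 24 (23 stored fraction bits plus the implicit leading 1). With n = 23 this is exactly 2^(n + 1) + 1, the formula from the book.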
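To see this on a real machine, one option is to round-trip the integers through a 32-bit float. The sketch below uses Python's `struct` module for the conversion; the helper names `to_float32` and `is_exact_float32` are made up for illustration.

```python
import struct


def to_float32(x: int) -> float:
    """Convert x to single precision and back to a Python float."""
    # Packing with the "f" format stores the value as an IEEE 754 binary32,
    # so anything that does not fit in the 24-bit significand gets rounded.
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def is_exact_float32(x: int) -> bool:
    """True if x survives the round trip through single precision unchanged."""
    return to_float32(x) == x


for value in (2**24 - 1, 2**24, 2**24 + 1, 2**24 + 2):
    print(value, is_exact_float32(value))
```

Of the four values printed, only 2^24 + 1 should fail the check. 2^24 + 2 is exact again because it ends in a zero bit, which the exponent absorbs.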

Bob__