CSAPP 3ed Practice Problem 2.49 IEEE Floating-Point Precision

Question

For a floating-point format with an n-bit fraction, give a formula for the smallest positive integer that cannot be represented exactly (because it would require an (n + 1)-bit fraction to be exact). Assume the exponent field size k is large enough that the range of representable exponents does not provide a limitation for this problem.

The solution given by the book is 2^(n + 1) + 1, but it doesn't provide any explanation. Could someone explain how this formula is derived? Thank you.

Hint: Consider a number written in binary. If the distance from its first 1 bit to its last 1 bit is n−1 bits or less, then those two bits and all the bits between them fit in n bits. They may be followed by zeros, but the floating-point format can add zeros after a number by increasing the exponent. What is the smallest integer for which the non-zero bits do not fit in n bits? If you have trouble answering that, try it with n=3. — Eric Postpischil, Jul 04 '21 at 23:37
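The test described in the hint can be sketched in Python. This is only an illustration, assuming (as in the question) an n-bit fraction plus an implicit leading 1, so n + 1 significand bits in total, and an unlimited exponent range. The names `fits` and `smallest_inexact` are made up for this sketch.

```python
def fits(x: int, n: int) -> bool:
    """True if the positive integer x is exact with an n-bit fraction
    (plus an implicit leading 1) and an unlimited exponent range."""
    # Trailing zero bits cost nothing: the exponent supplies them.
    while x % 2 == 0:
        x //= 2
    # What is left runs from the first 1 bit to the last 1 bit;
    # it has to fit in the n + 1 significand bits.
    return x.bit_length() <= n + 1


def smallest_inexact(n: int) -> int:
    """Brute-force search for the smallest positive integer that does not fit."""
    x = 1
    while fits(x, n):
        x += 1
    return x


# The brute-force result agrees with the book's formula for small n.
for n in range(1, 11):
    assert smallest_inexact(n) == 2 ** (n + 1) + 1

print(smallest_inexact(3))  # 17, i.e. 10001 in binary
```

With n = 3, every integer up to 16 fits (15 is 1111, and 16 is a single 1 bit followed by zeros), while 17 = 10001 needs five significant bits.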

Bob__ · Accepted Answer · 2021-07-04T23:35:20.880

Consider the 32-bit IEEE floating-point format, which has a 23-bit fraction (n = 23).

The 24-bit integer 111111111111111111111111₂ = 2²⁴ - 1 can be represented exactly, because there are enough bits (even if the most significant one is implicit).

Adding 1, we have 1000000000000000000000000₂ = 2²⁴. No problem with that, even though it's a 25-bit number, because it's a power of two.

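The next one, 1000000000000000000000001₂ = 2²⁴ + 1, though, can't be exactly represented because there aren't enough bits in the fraction part of the representation: its first and last 1 bits span 25 bits, one more than the 24 available. The same argument with an n-bit fraction in place of 23 gives 2^(n + 1) + 1.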
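The three cases above can be checked by round-tripping each integer through a 32-bit float. The sketch below uses Python's `struct` module for that, since Python's own `float` is a 64-bit double; `to_float32` is a helper name made up for this example.

```python
import struct


def to_float32(x: int) -> float:
    """Round x to the nearest 32-bit IEEE float and return it as a Python float."""
    # float(x) is exact here: every value used below fits in a double.
    return struct.unpack("f", struct.pack("f", float(x)))[0]


for x in (2**24 - 1, 2**24, 2**24 + 1):
    # Python compares int and float by exact value, so this is an exact test.
    print(x, to_float32(x) == x)

# 2**24 - 1 and 2**24 survive the round trip; 2**24 + 1 does not.
assert to_float32(2**24 - 1) == 2**24 - 1
assert to_float32(2**24) == 2**24
assert to_float32(2**24 + 1) != 2**24 + 1
```

The same check works for doubles (n = 52) without `struct`: `float(2**53 + 1) == 2**53 + 1` evaluates to `False`.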
