
I'm trying to convert a large integer to a 32-bit single-precision float, but I can't get past one problem: what if the binary representation of the integer is longer than the 23-bit mantissa?

For example, take the integer 1,671,277,287

Its binary representation is 01100011100111011010101011100111

My understanding is that you move the binary point left until only a single 1 bit remains to the left of it, like so:

01.100011100111011010101011100111
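The normalization step above can be checked with a few lines of Python. This is only a sketch for this one value; the variable names `exponent` and `fraction_bits` are illustrative and not taken from any library.

```python
n = 1_671_277_287

# Index of the leading 1 bit; this is the unbiased exponent after normalizing.
exponent = n.bit_length() - 1

# Everything after the leading 1: the fraction before any rounding.
fraction_bits = format(n, "b")[1:]

print(exponent)            # 30
print(fraction_bits)       # 100011100111011010101011100111
print(len(fraction_bits))  # 30
```

The leading 1 itself is not stored in the float, so only the bits after it compete for the 23 fraction positions.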

My problem is that this mantissa is 30 bits, and a single-precision float only has room for 23 bits of mantissa. I have tried searching for this specific problem but haven't found anything. How would I tackle this?

EDIT: I found some information, in case someone else has this problem. The default rounding for single-precision floats is "round to nearest, ties to even". Another Stack Overflow post explains how to do this easily.

StackOverflow post regarding rounding

Community
Alex Anderson
  • You're not missing anything: a 32-bit single-precision float has fewer significant digits of precision (6 - 7 decimal digits) than a 32-bit integer, because it only has a 23-bit mantissa. So those least significant bits in your example will just "drop off" the end. – Paul R Sep 02 '15 at 21:39
  • Note that, assuming IEEE 754, you have 24 bits thanks to the implied leading 1. But that still does not make 30 bits, so the way to do it is to round appropriately, that is, according to the IEEE 754 rounding mode in effect, the default being round to nearest, ties to even. – aka.nice Sep 02 '15 at 21:49
  • @PaulR So the 30-bit mantissa "100011100111011010101011100111" is just "10001110011101101010101" with the rightmost 7 bits left off? – Alex Anderson Sep 02 '15 at 21:51
  • @AlexAnderson: yes, pretty much, although typically you would apply rounding rather than just truncating the mantissa, as noted in the comment above. – Paul R Sep 03 '15 at 05:50

1 Answer


This question amounts to how to round the 30 bits 1000_1110_0111_0110_1010_1011_1001_11 to 23 bits. I am going to assume the usual default rounding mode, which is round to nearest with round to even as the tie breaker.

The most significant 23 bits are 1000_1110_0111_0110_1010_101. The seven dropped bits are 1100111: the most significant dropped bit is 1, and there are non-zero bits of lower significance.

The general rule is:

  • If the first dropped bit is 0, round down.
  • If the first dropped bit is 1 and the lower significance bits are all zero, round to even, that is, to whichever neighbor has 0 as its last kept bit.
  • If the first dropped bit is 1 and there are lower significance non-zero bits, round up.
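The three rules can be sketched as a small Python helper. The function name `round_nearest_even` and its parameters are made up for illustration. It compares the dropped bits against the halfway value, which is equivalent to inspecting the first dropped bit and the bits below it.

```python
def round_nearest_even(value: int, drop: int) -> int:
    """Drop the low `drop` bits of a non-negative int,
    rounding to nearest with ties going to the even result."""
    if drop <= 0:
        return value
    kept = value >> drop                 # bits that survive
    dropped = value & ((1 << drop) - 1)  # bits that are discarded
    half = 1 << (drop - 1)               # first dropped bit set, rest zero
    if dropped > half:
        kept += 1                        # above halfway: round up
    elif dropped == half and kept & 1:
        kept += 1                        # exact tie: round up only if kept is odd
    return kept                          # below halfway: round down

fraction = 0b100011100111011010101011100111  # the 30 fraction bits
print(format(round_nearest_even(fraction, 7), "023b"))
# 10001110011101101010110
```

One caveat for a full conversion: if the kept bits are all ones and the result rounds up, the carry spills into a new leading bit, so the exponent has to be incremented. Rounding the 24-bit significand (implicit 1 included) and checking for that carry is one way to handle it.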

The third of these rules applies here, so you should round up to 1000_1110_0111_0110_1010_110.
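As a cross-check, Python's standard `struct` module can pack the value as a 32-bit float and expose the stored fields. This assumes the usual IEEE 754 binary32 format and the default round-to-nearest mode, which is what `struct` uses on common platforms.

```python
import struct

n = 1_671_277_287

# Pack as a big-endian binary32, then reinterpret the 4 bytes as an unsigned int.
bits = struct.unpack(">I", struct.pack(">f", float(n)))[0]

sign = bits >> 31
biased_exponent = (bits >> 23) & 0xFF  # 8 exponent bits, bias 127
fraction = bits & 0x7FFFFF             # 23 stored fraction bits

print(sign, biased_exponent - 127, format(fraction, "023b"))
# 0 30 10001110011101101010110
```

The stored fraction matches the hand-rounded result, and the unbiased exponent of 30 matches the number of places the binary point was moved.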

Patricia Shanahan