Say I have to store 2147483648 as a float (not as a fixed-point number like an integer) on a 32-bit system. What will the mantissa (significand) and exponent be, and how is this number represented in memory?
-
=> http://en.wikipedia.org/wiki/Floating_point + http://en.wikipedia.org/wiki/IEEE_floating_point – benjarobin Nov 18 '13 at 14:38
-
This question is about `IEEE 754` floating-point representation, not a programming question. – haccks Nov 18 '13 at 14:38
-
Actually I wanted to know how floating-point representation supports a wider range of numbers than its fixed-point counterpart (int). I have already read these wiki pages, but couldn't figure it out. The mantissa or significand is only 24 bits long on a 32-bit machine, and the number (2147483647) itself is 31 bits if represented as fixed-point instead. I am confused as to how larger numbers are supported in float. @benjarobin – Parveez Ahmed Nov 18 '13 at 14:53
-
Actually I wanted to know how floating-point representation supports a wider range of numbers than its fixed-point counterpart (int). I have already read these wiki pages, but couldn't figure it out. The mantissa or significand is only 24 bits long on a 32-bit machine, and the number (2147483647) itself is 31 bits if represented as fixed-point instead. I am confused as to how larger numbers are supported in float. @LuiggiMendoza – Parveez Ahmed Nov 18 '13 at 14:55
-
Actually I wanted to know how floating-point representation supports a wider range of numbers than its fixed-point counterpart (int). I have already read these wiki pages, but couldn't figure it out. The mantissa or significand is only 24 bits long on a 32-bit machine, and the number (2147483647) itself is 31 bits if represented as fixed-point instead. I am confused as to how larger numbers are supported in float. @CyrilleKa – Parveez Ahmed Nov 18 '13 at 14:56
-
Again, what have you tried, what's the result of your search, if you did any? If not, then this question is more like *I'm curious about this, what do you think?* – Luiggi Mendoza Nov 18 '13 at 14:56
-
hmmm, maybe @LuiggiMendoza – Parveez Ahmed Nov 18 '13 at 14:58
-
You don't need to repeat your comment thrice ... – Cyrille Ka Nov 18 '13 at 14:59
-
Your tags are like dots on the entire alphabet, save the _i_ and _j_... C, java, javascript whereas you're actually asking about _floating point_ and _IEEE754_... – Elias Van Ootegem Nov 18 '13 at 15:03
2 Answers
Floating-point numbers are typically represented by a packed combination of a "significand", which is either 0 or a binary fraction in the range [1, 2); an exponent; and a sign bit. (See the comments above about "IEEE 754"; that's the standard that spells out the most common floating-point representations. It's quite google'able.)
2147483648 will sort-of fit in a typical (single-precision) `float`, because the most common form uses a binary exponent, meaning the number is represented as (significand)*2^exponent. Since your number is a power of two, it can be represented exactly in single precision as 1.0*2^31.
However, since the significand (mantissa) is not actually 32 bits in size (it's 24, IIRC), it can't store all the significant bits of an arbitrary integer that large. That means that neither 2147483647 nor 2147483649 will fit. They will have their low bits rounded off, and will have the same representation (and thus, the same value) as 2147483648.
Use a double instead, if you care about non-power-of-two values that big. The significand is large enough to safely represent integers up to 53 bits in size.
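To see the "sort-of" in practice, here is a minimal C sketch (assuming IEEE-754 `float`/`double` with round-to-nearest, which is what virtually every current platform provides): the neighbouring odd value 2147483647 gets rounded when squeezed into a `float`, but survives intact in a `double`.

```c
#include <stdio.h>

int main(void)
{
    /* Assumes IEEE-754 binary32 float and binary64 double. */
    float  f = 2147483647;   /* 2^31 - 1 needs 31 significant bits; float has only 24 */
    double d = 2147483647;   /* double has 53 significant bits, so this is exact      */

    printf("float : %.1f\n", f);   /* expected: 2147483648.0 (rounded) */
    printf("double: %.1f\n", d);   /* expected: 2147483647.0 (exact)   */
    return 0;
}
```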

-
2147483648 fits exactly in an IEEE-754 32-bit binary floating-point object, not “sort-of”. The significant bits are the ones that carry meaning contributing to the value of the number. The fact that 2147483648 is represented exactly by an IEEE-754 32-bit binary floating-point object demonstrates that it contains all the bits needed to represent the value. Trailing zeros are not significant bits in floating-point representations. – Eric Postpischil Nov 18 '13 at 15:04
-
2147483648 fits entirely due to its being a power of two. The value itself is representable, but is barely useful -- you couldn't add or subtract 1 and get a correct result, for example. Hence, "sort-of" fits. – cHao Nov 18 '13 at 15:07
-
@cHao if it is 2147483647 as you said, then what will the representation be, as it is an odd number? – Parveez Ahmed Nov 18 '13 at 15:19
-
@ParveezAhmed: 2147483647 and 2147483649 couldn't exist in a float; the low bits would be lost, so you'd end up with 2147483648. In a double, on the other hand, 2147483649 would be 1.0000000000000000000000000000001 (base 2) * 2^31, and 2147483647 would be 1.111111111111111111111111111111 (base 2) * 2^30. – cHao Nov 18 '13 at 15:23
-
@cHao As we know, floating point is used to support a wider range of numbers than fixed point. 2147483647 can be stored comfortably as a fixed-point number on a 32-bit processor, whereas it can't be as a float on the same machine! How do we justify that floating point supports a wider range? Confusion! – Parveez Ahmed Nov 19 '13 at 02:52
-
@cHao I read somewhere on Stack Overflow that JavaScript uses floating point, as it can successfully print a date beyond January 19, 2038, which would be something greater than 2147483647, so I tested it: it alerts the number exactly. Then if it is true that JS uses floating point, why double? I am confused! – Parveez Ahmed Nov 19 '13 at 02:57
-
@ParveezAhmed: JS uses *double-precision* (64-bit) floating point for all numbers, which is the same as C/Java/etc's `double` type. So it can safely hold a 32-bit integer (and in fact, integers quite a bit larger as well...up to 53 bits, or about 16 decimal digits). – cHao Nov 19 '13 at 16:01
-
@rosemary: Any sequence of 32 bits only has 2^32 possible arrangements, and thus can only represent up to 2^32 distinct values. 32-bit floats are bound by that; altogether, they can only represent 2^32 distinct numbers. (Actually, even less than that; some values represent NaNs, infinities, etc.) They work around that limitation by trading 8 significant bits for an exponent, which gives them the ability to scale. The 23+1 remaining bits can only represent ~16 million distinct values for any given exponent, though, and as the exponent increases, so does the gap between representable values. – cHao Nov 19 '13 at 18:40
-
For example, the representable float closest to (but less than) 1.0*2^31 would be represented as 1.FFFFFE (base 16) * 2^30, which is equal to 2147483520. That's a gap of 128. Any number between 1.FFFFFE*2^30 and 1*2^31 would lose significant bits anyway if it were shoehorned into a float, so it would end up rounded to one of those two values. In a double, on the other hand, since it has 52+1 significant bits, the nearest representable value is 1.FFFFFFFFFFFFF (base 16) * 2^30, or 2147483647.9999997615814208984375 ...leaving a gap of less than 250 billionths. (See the sketch after these comments.) – cHao Nov 19 '13 at 19:24
-
@cHao I find it difficult to accept, friend. For example, a floating-point number can be as large as 934584883609.6, that is, 3.4*2^38 – Parveez Ahmed Nov 20 '13 at 06:35
-
@rosemary: You mean 3.4*10^38. But maybe the tradeoff will make more sense in base 10. Say i have 3 decimal digits, which alone only form integers between 000 and 999. If i use scientific notation, and declare one digit as the exponent, the range increases dramatically; i can represent 9.9E+9 (9.9 billion) with three digits `999`. Problem is, *now i can't precisely represent 999*. (I specified a rule: one digit becomes the exponent. If i arbitrarily go back on that, `999` becomes uselessly ambiguous; does it mean 999, or 9.9 billion?) So i have to round, either up to 1000 or down to 990. – cHao Nov 20 '13 at 19:14
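As a concrete illustration of the gap cHao describes above, here is a minimal C sketch (assuming IEEE-754 binary32; `nextafterf` is standard C99, so link with -lm) that asks for the representable float just below 2^31:

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    float top  = 2147483648.0f;          /* exactly 2^31                    */
    float prev = nextafterf(top, 0.0f);  /* largest representable float
                                            below 2^31                      */

    printf("top  = %.1f\n", top);        /* expected: 2147483648.0          */
    printf("prev = %.1f\n", prev);       /* expected: 2147483520.0          */
    printf("gap  = %.1f\n", top - prev); /* expected: 128.0 -- no float
                                            exists between the two          */
    return 0;
}
```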
To represent this number in IEEE-754 single precision, first you need to convert it to its binary equivalent, and then into the form
( (-1)^sign ) * (1 + fraction)*2^(exponent-bias)
Single precision bias = 127.
+----+-------------+------------------------------------+
|  1 |      8      |                 23                 |
|bit |     bit     |                bit                 |
+----+-------------+------------------------------------+
  |         |                        |
  v         v                        v
 sign    Exponent                 Fraction
 bit
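For the number asked about: 2147483648 = 2^31, so the sign bit is 0, the stored (biased) exponent is 31 + 127 = 158, and the fraction is all zeros, i.e. the bit pattern 0 10011110 00000000000000000000000, or 0x4F000000 in memory. A minimal C sketch that dumps these fields (assuming IEEE-754 binary32 and that `float` is 32 bits wide):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = 2147483648.0f;            /* 2^31 */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);     /* reinterpret the bytes safely */

    uint32_t sign     =  bits >> 31;            /*  1 bit               */
    uint32_t exponent = (bits >> 23) & 0xFFu;   /*  8 bits, bias 127    */
    uint32_t fraction =  bits & 0x7FFFFFu;      /* 23 bits              */

    printf("raw bits : 0x%08X\n", (unsigned)bits);   /* expected: 0x4F000000 */
    printf("sign     : %u\n", (unsigned)sign);       /* expected: 0          */
    printf("exponent : %u (2^%d)\n",
           (unsigned)exponent, (int)exponent - 127); /* expected: 158 (2^31) */
    printf("fraction : 0x%06X\n", (unsigned)fraction); /* expected: 0, i.e.
                                                          significand = 1.0  */
    return 0;
}
```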
