How many bits out of 64 are assigned to the integer part and the fractional part in a double? Or is there any rule to specify it?
-
[What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://ece.uwaterloo.ca/~dwharder/NumericalAnalysis/02Numerics/Double/paper.pdf). See also [this answer](http://programmers.stackexchange.com/questions/215065/can-anyone-explain-representation-of-float-in-memory/215126#215126). – John Bode May 14 '15 at 13:10
-
Floating point does not have an integer and fractional part as such. It is like scientific notation. The normal numbers in the commonest double format have an 11 bit binary exponent, modifying a significand of the form 1.x, where x is 52 bits. – Patricia Shanahan May 14 '15 at 14:18
2 Answers
Note: I know I already replied with a comment. This is for my own benefit as much as the OP's; I always learn something new when I try to explain it.
Floating-point values (regardless of precision) are represented as follows:
$$\text{sign} \times \text{significand} \times \beta^{\text{exp}}$$
where `sign` is 1 or -1, `β` is the base, `exp` is an integer exponent, and `significand` is a fraction. In this case, `β` is `2`. For example, the real value `3.0` can be represented as $1.1_2 \times 2^1$, or $0.11_2 \times 2^2$, or even $0.011_2 \times 2^3$.
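As a quick check of the three equivalent forms of `3.0`, here is a minimal Python sketch. It assumes Python's `float` is an IEEE-754 double, which holds on common platforms; the `forms` list is just an illustrative name.

```python
import math

# math.frexp splits x into (m, e) with x == m * 2**e and 0.5 <= |m| < 1.
m, e = math.frexp(3.0)
print(m, e)  # 0.75 2, i.e. 0.11 (binary) * 2**2

# The same value under the three scalings: 1.1, 0.11 and 0.011 in binary.
forms = [(1.5, 1), (0.75, 2), (0.375, 3)]
for significand, exp in forms:
    # math.ldexp(s, k) computes s * 2**k exactly.
    assert math.ldexp(significand, exp) == 3.0
```

Note that `frexp` happens to use the `0.1xxx` scaling rather than the `1.xxx` scaling that the stored format uses.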
Remember that a binary number is a sum of powers of 2, with powers decreasing from the left. For example, $101_2$ is equivalent to $1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0$, which gives us the value `5`. You can extend that past the radix point by using negative powers of 2, so $101.11_2$ is equivalent to
$$1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-2}$$
which gives us the decimal value `5.75`. A floating-point number is normalized such that there's a single non-zero digit prior to the radix point, so instead of writing `5.75` as $101.11_2$, we'd write it as $1.0111_2 \times 2^2$.
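The expansion and normalization can be sketched in a few lines of Python. The helper names `binary_to_float` and `normalize` are made up for this illustration, and `normalize` only handles positive values.

```python
def binary_to_float(text):
    """Sum the powers of 2 named by each digit of a string like '101.11'."""
    whole, _, frac = text.partition(".")
    value = 0.0
    # Digits left of the radix point: powers 0, 1, 2, ... from the right.
    for power, digit in enumerate(reversed(whole)):
        value += int(digit) * 2.0 ** power
    # Digits right of the radix point: powers -1, -2, ... from the left.
    for power, digit in enumerate(frac, start=1):
        value += int(digit) * 2.0 ** -power
    return value


def normalize(value):
    """Return (significand, exponent) with 1 <= significand < 2, for value > 0."""
    exponent = 0
    while value >= 2.0:
        value /= 2.0
        exponent += 1
    while value < 1.0:
        value *= 2.0
        exponent -= 1
    return value, exponent


print(binary_to_float("101"))     # 5.0
print(binary_to_float("101.11"))  # 5.75
print(normalize(5.75))            # (1.4375, 2); 1.4375 is 1.0111 in binary
```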
How is this encoded in a 32-bit or 64-bit binary format? The exact format depends on the platform. Most modern platforms use the IEEE-754 specification (which also specifies the algorithms for floating-point arithmetic, as well as special values such as infinity and Not A Number (NaN)); however, some older platforms may use their own proprietary format (such as VAX G and H extended-precision floats). I think x86 also has an 80-bit extended format for intermediate calculations.
The general layout looks something like the following:
seeeeeeee...ffffffff....
where `s` represents the sign bit, `e` represents bits devoted to the exponent, and `f` represents bits devoted to the significand or fraction. The IEEE-754 32-bit single-precision layout is
seeeeeeeefffffffffffffffffffffff
This gives us an 8-bit exponent (which can represent the values `-126` through `127`) and a 23-bit significand (giving us roughly 6 to 7 significant decimal digits). A `0` in the sign bit represents a positive value, `1` represents negative. The exponent is stored with a bias of 127, so $00000001_2$ represents `-126`, $01111111_2$ represents `0`, and $11111110_2$ represents `127` ($00000000_2$ is reserved for representing `0` and "denormalized" numbers, while $11111111_2$ is reserved for representing infinity and NaN). This format also assumes a hidden leading fraction bit that's always set to `1`. Thus, our value `5.75`, which we represent as $1.0111_2 \times 2^2$, has the exponent field $2 + 127 = 129 = 10000001_2$ and would be encoded in a 32-bit single-precision float as

    01000000101110000000000000000000
    ||      ||                     |
    ||      |+----------+----------+
    ||      |           |
    |+--+---+           +------------ significand (1.0111, hidden leading bit)
    |   |
    |   +---------------------------- exponent (2)
    +-------------------------------- sign (0, positive)

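The bit pattern for `5.75` can be checked with Python's `struct` module, which packs `f` values as IEEE-754 binary32 when a byte-order prefix such as `>` is given. This is only a sketch; `float32_fields` is a name invented for the example.

```python
import struct


def float32_fields(x):
    """Return the (sign, exponent, fraction) bit strings of x as binary32."""
    # Pack as a big-endian 32-bit float, then reread the bytes as an integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return bits[0], bits[1:9], bits[9:]


sign, exponent, fraction = float32_fields(5.75)
print(sign, exponent, fraction)  # 0 10000001 01110000000000000000000
print(int(exponent, 2) - 127)    # 2, the exponent after removing the bias
```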
The IEEE-754 double-precision float uses 11 bits for the exponent (`-1022` through `1023`) and 52 bits for the significand. I'm not going to bother writing that out (this post is turning into a novel as it is).
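For the 64-bit case, a sketch along the same lines pulls the three fields out with shifts and masks. It assumes the IEEE-754 binary64 layout (1 sign bit, 11 exponent bits with a bias of 1023, 52 fraction bits); `float64_fields` is a hypothetical helper name.

```python
import struct


def float64_fields(x):
    """Return (sign, biased_exponent, fraction) of x as IEEE-754 binary64."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = raw >> 63                       # top bit
    biased_exponent = (raw >> 52) & 0x7FF  # next 11 bits
    fraction = raw & ((1 << 52) - 1)       # low 52 bits
    return sign, biased_exponent, fraction


sign, biased, fraction = float64_fields(5.75)
print(sign, biased - 1023, f"{fraction:052b}")

# For normal numbers the value is (-1)**sign * 1.fraction * 2**(biased - 1023).
rebuilt = (-1) ** sign * (1 + fraction / 2 ** 52) * 2.0 ** (biased - 1023)
print(rebuilt)  # 5.75
```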
Floating-point numbers have a greater range than integers because of the exponent; the exponent `127` only takes 8 bits to encode, but $2^{127}$ is a 39-digit decimal number. The more bits in the exponent, the greater the range of values that can be represented. The precision (the number of significant digits) is determined by the number of bits in the significand. The more bits in the significand, the more significant digits you can represent.
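The split between range (exponent) and precision (significand) can be seen directly for a double. The sketch below assumes Python's `float` is an IEEE-754 double and uses only `sys.float_info` from the standard library.

```python
import sys

print(sys.float_info.max)       # largest finite double, roughly 1.8e308
print(sys.float_info.mant_dig)  # 53 significant bits (52 stored + 1 hidden)

# Range comes from the exponent: the largest double is
# 1.111...1 (binary, 52 ones after the point) * 2**1023.
assert sys.float_info.max == (2.0 - 2.0 ** -52) * 2.0 ** 1023

# Precision comes from the significand: every integer up to 2**53 is exact,
# but 2**53 + 1 has no representation and rounds back to 2**53.
assert float(2 ** 53 - 1) + 1.0 == float(2 ** 53)
assert float(2 ** 53) + 1.0 == float(2 ** 53)
```

So a double can reach about 1.8e308 in magnitude while only guaranteeing exact integers up to $2^{53}$; the two limits come from different fields.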
Most real values cannot be represented exactly as a floating-point number; you cannot squeeze an infinite number of values into a finite number of bits. Thus, there are gaps between representable floating point values, and most values will be approximations. To illustrate the problem, let's look at an 8-bit "quarter-precision" format:
seeeefff
This gives us an exponent between `-7` and `8` (we're not going to worry about special values like infinity and NaN) and a 3-bit significand with a hidden leading bit. The larger our exponent gets, the wider the gap between representable values gets. Here's a table showing the issue. The left column is the significand; each additional column shows the values we can represent for the given exponent:

    sig  -1      0      1     2    3   4   5
    ---  ------  -----  ----  ---  --  --  --
    000  0.5     1      2     4    8   16  32
    001  0.5625  1.125  2.25  4.5  9   18  36
    010  0.625   1.25   2.5   5    10  20  40
    011  0.6875  1.375  2.75  5.5  11  22  44
    100  0.75    1.5    3     6    12  24  48
    101  0.8125  1.625  3.25  6.5  13  26  52
    110  0.875   1.75   3.5   7    14  28  56
    111  0.9375  1.875  3.75  7.5  15  30  60

Note that as we move towards larger values, the gap between representable values gets larger. We can represent 8 values between `0.5` and `1.0`, with a gap of `0.0625` between each. We can represent 8 values between `1.0` and `2.0`, with a gap of `0.125` between each. We can represent 8 values between `2.0` and `4.0`, with a gap of `0.25` between each. And so on. Note that we can represent all the positive integers up to `16`, but we cannot represent the value `17` in this format; we simply don't have enough bits in the significand to do so. If we add the values `8` and `9` in this format, we'll get `16` as a result, which is a rounding error (`17` lies exactly halfway between `16` and `18`, and under IEEE-754's default round-to-nearest, ties-to-even rule the tie goes to `16`). If that result is used in any other computation, that rounding error will be compounded.
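The toy format is small enough to simulate. The following Python sketch regenerates a column of the table and rounds a value to the nearest representable one. It assumes round-to-nearest with ties to even, ignores the exponent limits and special values, and uses the invented names `representable` and `round_to_toy`.

```python
from fractions import Fraction

FRACTION_BITS = 3  # toy format from the text: 3 stored bits plus a hidden 1


def representable(exponent):
    """The 8 values 1.fff (binary) * 2**exponent in the toy format."""
    return [(1 + Fraction(f, 8)) * Fraction(2) ** exponent for f in range(8)]


def round_to_toy(x):
    """Round positive x to the nearest toy value, ties to even."""
    exponent = 0
    while Fraction(2) ** (exponent + 1) <= x:
        exponent += 1
    while Fraction(2) ** exponent > x:
        exponent -= 1
    step = Fraction(2) ** (exponent - FRACTION_BITS)  # gap at this exponent
    return round(x / step) * step  # round() on a Fraction uses ties-to-even


print([float(v) for v in representable(-1)])  # 0.5, 0.5625, ..., 0.9375
print(round_to_toy(8 + 9))                    # 16
print(round_to_toy(19))                       # 20
```

Exact `Fraction` arithmetic is used so that the simulation itself does not introduce rounding of its own.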
Note that some values cannot be represented exactly no matter how many bits you have in the significand. Just like `1/3` gives us the non-terminating decimal fraction `0.333333...`, `1/10` gives us the non-terminating binary fraction $0.000110011001100..._2$, which normalizes to $1.10011001100..._2 \times 2^{-4}$. We would need an infinite number of bits in the significand to represent that value.
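The repeating pattern for `1/10` is visible in the stored double. A short Python sketch, assuming `float` is an IEEE-754 double:

```python
from decimal import Decimal
from fractions import Fraction

# The hex form shows the significand 1.1001 1001 ... (hex 9 is binary 1001),
# cut off and rounded after 52 bits, with exponent -4.
print((0.1).hex())  # 0x1.999999999999ap-4

# Decimal(0.1) prints the exact value that is stored, slightly above 0.1.
print(Decimal(0.1))

# The stored value is a nearby binary fraction, not one tenth itself.
assert Fraction(0.1) != Fraction(1, 10)
assert 0.1 + 0.2 != 0.3
```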

A double (in the usual IEEE-754 64-bit format) has one sign bit, 11 exponent bits and 52 fraction bits, regardless of whether the machine is 64-bit.
Think of the value as (1 sign bit) * 1.(52 fraction bits) * 2 ^ (11 exponent bits).

-
I have been going through this [link](http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double), but I'm still not able to understand why the max value of a double is 1.7E308 when taking 53 bits for the integer part only amounts to 2^53. How are these two numbers related? – Austin Philip D Silva May 14 '15 at 12:35