16

Let's say I've got two integer values stored in double variables, e.g.:

double x = 100.0;
double y = 7.0;

May I safely assume that any arithmetic operation on these two double variables that would yield an integer result will return an exact integer value (as a double)? That is, will for example all of:

x + y = 107.0
x - y = 93.0
x * y = 700.0

return the exact integer values, or will there be some accuracy problems? Like x*y being 699.99995 or so?

The general question: Is it true that any arithmetic operation on two double variables holding integer values that would yield an integer result will return the exact integer value (as a double)?

I'm asking this in a Java context, but I assume it's similar in other languages, too.

MicSim

7 Answers

13

As long as the integer result of your operation can be represented exactly as a double, you will get an exact result. As soon as the result needs more bits than the mantissa provides (52 stored bits plus the implicit leading bit, i.e. 53 bits), it will be rounded.
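As a rough illustration of that 53-bit limit, here is a minimal sketch. The class name `ExactIntegerLimit` and the helper `inExactRange` are made up for this example. The helper uses a strict comparison against 2^53 on purpose: a true result of 2^53 + 1 rounds to exactly 2^53, so a computed value equal to 2^53 cannot be trusted.

```java
class ExactIntegerLimit {
    // 2^53: below this magnitude, every integer has its own double representation.
    static final double LIMIT = 9007199254740992.0;

    // True if v holds an integer value strictly below 2^53 in magnitude.
    // If the mathematically exact result of an operation is an integer and the
    // computed double passes this check, no rounding can have taken place.
    static boolean inExactRange(double v) {
        return v == Math.rint(v) && Math.abs(v) < LIMIT;
    }

    public static void main(String[] args) {
        double x = 100.0;
        double y = 7.0;
        System.out.println(x * y == 700.0);        // small integers multiply exactly
        System.out.println(inExactRange(x * y));   // well inside the 53-bit range
        System.out.println(LIMIT + 1.0 == LIMIT);  // 2^53 + 1 is rounded back to 2^53
        System.out.println(inExactRange(LIMIT + 1.0));
    }
}
```

The check only says something about integer-valued results; it is not a general test for floating-point accuracy.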

MicSim
Sven Marnach
4

In general, the answer is No. However, I strongly recommend reading David Goldberg’s "What Every Computer Scientist Should Know About Floating-Point Arithmetic" - it never hurts to know the things from the inside.

kervich
3

Not if the resulting number has too many digits to fit in a double. For example, 1234567890.0 * 1234567890.0 yields 1,52415787501905E+18 rather than 1524157875019052100. I don't know whether it will always be precise if the result fits, but @Sven Marnach answered that. I assume that the truncated number will be off by an exact integer, as @Douglas Leeder says, because the mantissa shifted by the exponent (which is greater than the number of digits in the mantissa) will become an integer.
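To see how far off such a product is, one option is to compare it against exact `BigInteger` arithmetic. This is only a sketch; the class `ProductError` and the method `roundingError` are names invented here. It relies on `new BigDecimal(double)` preserving the exact binary value of the double.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class ProductError {
    // Returns (double product) - (exact product) for two integer operands.
    // Assumes both operands are small enough to convert to double exactly.
    static BigInteger roundingError(long a, long b) {
        BigInteger exact = BigInteger.valueOf(a).multiply(BigInteger.valueOf(b));
        double approx = (double) a * (double) b;
        // A rounded product of two integers is still an integer-valued double,
        // so the exact conversion to BigInteger does not throw here.
        BigInteger stored = new BigDecimal(approx).toBigIntegerExact();
        return stored.subtract(exact);
    }

    public static void main(String[] args) {
        System.out.println(roundingError(100L, 7L));                 // 0
        System.out.println(roundingError(1234567890L, 1234567890L)); // non-zero integer
    }
}
```

The second line prints a non-zero whole number, which matches the assumption that the rounded result is off by an exact integer.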

Aasmund Eldhuset
3

Excellent discussion, all.

Your question is

Is it true that any arithmetic operation on two double variables holding integer values that would yield an integer result will return the exact integer value (as a double)?

I chose a borderline case in which both numbers are exactly 53 bits long. Their 54-bit sum exceeds the precision of a double, so it is not returned as an exact integer. As expected, the low-order bit is lost to rounding, and you get a strange-looking but expected result.

An odd number plus an even number does not yield an odd sum (as mathematics would tell you); Java reports an even number (as the IEEE standard would tell you).

Try this sample:

private static void doubleCalc() {
  double x = 4503599627370497.0d; // binary 10000000000000000000000000000000000000000000000000001
  double y = 4503599627370496.0d; // binary 10000000000000000000000000000000000000000000000000000

  double sum = x + y;
  System.out.println("sum=" + sum + "; should be 9007199254740993.0d");
}

It will print out:

sum=9.007199254740992E15; should be 9007199254740993.0d

So this carefully chosen counterexample would answer "no" to your carefully worded question.
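The reason the odd sum cannot be stored is that the spacing between adjacent doubles grows to 2 at 2^53. A small sketch using the standard `Math.ulp` and `Math.nextUp` methods shows this; the class name `UlpAtBoundary` and the helper `gapAbove` are made up for illustration.

```java
class UlpAtBoundary {
    // Distance from v to the next larger representable double.
    static double gapAbove(double v) {
        return Math.nextUp(v) - v;
    }

    public static void main(String[] args) {
        double twoTo52 = 4503599627370496.0; // 2^52
        double twoTo53 = 9007199254740992.0; // 2^53
        System.out.println(Math.ulp(twoTo52)); // prints 1.0: every integer is still representable
        System.out.println(Math.ulp(twoTo53)); // prints 2.0: only even integers from here on
        System.out.println(gapAbove(twoTo53)); // same spacing, derived from nextUp
    }
}
```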

rajah9
2

All int values can be represented exactly by double values, and the +, * and - operations give the same results here (as long as the result stays within the int range). The / and % operators work differently, though.

Since a double has only 52 stored mantissa bits (53 counting the implicit leading bit), it cannot represent all long values exactly.
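A short sketch of both points, with invented names (`IntVersusDouble`, `survivesRoundTrip`). The round-trip check goes through `BigDecimal` rather than a cast back to long, since a cast saturates at `Long.MAX_VALUE` and would hide the rounding there.

```java
import java.math.BigDecimal;

class IntVersusDouble {
    // True if converting n to double keeps its value unchanged.
    static boolean survivesRoundTrip(long n) {
        return new BigDecimal((double) n).compareTo(BigDecimal.valueOf(n)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(7 / 2);     // 3: integer division truncates
        System.out.println(7.0 / 2.0); // 3.5: double division does not
        System.out.println(survivesRoundTrip(Integer.MAX_VALUE)); // every int fits in 53 bits
        System.out.println(survivesRoundTrip((1L << 53) + 1));    // this long does not
    }
}
```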

Paŭlo Ebermann
1

As long as the numbers aren't too far apart in magnitude (like 2^1023 and 0.005), the results should be exact. Double-precision floating-point numbers work like this: 1 bit for the sign, 11 for the exponent and 52 for the mantissa. The final number is (-1)^sign * 1.mantissa * 2^(exponent - 1023), so when two numbers are added, this is what happens:

x = number with greatest exponent
y = number with smallest exponent

(in case of same sign)
z.mantissa = x.mantissa + (y.mantissa >> (x.exponent - y.exponent) )
sign = either_one.sign

(in case of opposite sign)
z.mantissa = x.mantissa - (y.mantissa >> (x.exponent - y.exponent) )
sign = x.sign

for multiplication/division it's a bit simpler:

z.sign = x.sign != y.sign
z.exponent = x.exponent + y.exponent   (multiplication)
z.exponent = x.exponent - y.exponent   (division)
z.mantissa = 1.(x.mantissa) (operand) 1.(y.mantissa)
while (z.mantissa is not in format 1.x)
   if z.mantissa >= 2 (possible after multiplication)
      z.mantissa >> 1
      z.exponent++
   if z.mantissa < 1 (possible after division)
      z.mantissa << 1
      z.exponent--

So if the exponents are too far apart, bits are lost when the shift occurs, which means that double (and floating point in general) is not 100% precise, especially since some numbers turn into repeating binary fractions. For whole numbers (and whole-number results), however, it should be all right as long as the number is at most 53 bits long (the 52 stored mantissa bits plus the implicit leading 1), since the CPU can shift it into an integer (like 1.111 << 3 is 1111).
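The sign, exponent and mantissa fields described above can be inspected with the standard `Double.doubleToLongBits` method. The following is a sketch only; the class `DoubleFields` and its helper methods are names invented for this example.

```java
class DoubleFields {
    // Sign bit: 0 for positive, 1 for negative.
    static int sign(double d) {
        return (int) (Double.doubleToLongBits(d) >>> 63);
    }

    // 11-bit exponent field with the bias of 1023 removed.
    static int unbiasedExponent(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FF) - 1023;
    }

    // The 52 stored mantissa bits (without the implicit leading 1).
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & ((1L << 52) - 1);
    }

    public static void main(String[] args) {
        double x = 100.0; // binary 1100100 = 1.1001 * 2^6
        System.out.println(sign(x));
        System.out.println(unbiasedExponent(x));
        System.out.println(Long.toHexString(fraction(x)));
    }
}
```

For 100.0 the exponent comes out as 6 and the stored bits start with 1001, which is the 1.1001 * 2^6 form of binary 1100100.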

Jean-Luc Nacif Coelho
0

In a related question it was pointed out to me that a double has about 15 decimal digits of precision, while it can hold numbers as large as 10^(300+). So I guess as long as you are using smaller integers it shouldn't be a big problem.

That being said, here's a bit from the Oracle tutorials:

double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in section 4.2.3 of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.

For further reference, here's a link to the section 4.2.3 mentioned above.

Community
posdef