
I am doing data analysis in Python (Numpy) and R. My data is an array of shape 795067 × 3, and computing the mean, median, standard deviation, and IQR on it yields different results depending on whether I use Numpy or R. I cross-checked the values, and it looks like R gives the "correct" value.

Median: 
Numpy:14.948499999999999
R: 14.9632

Mean: 
Numpy: 13.097945407088607
R: 13.10936

Standard Deviation: 
Numpy: 7.3927612774052083
R: 7.390328

IQR: 
Numpy:12.358700000000002
R: 12.3468

Max and min of the data are the same on both platforms. I ran a quick test to better understand what is going on here.

  • Multiplying 1.2*1.2 in Numpy gives 1.4 (same with R).
  • Multiplying 1.22*1.22 gives 1.4884 in Numpy and the same with R.
  • However, multiplying 1.222*1.222 in Numpy gives 1.4932839999999998 which is clearly wrong! Doing the multiplication in R gives the correct answer of 1.493284.
  • Multiplying 1.2222*1.2222 in Numpy gives 1.4937728399999999 and 1.493773 in R. Once more, R is correct.

In Numpy, the numbers are float64 datatype and they are double in R. What is going on here? Why are Numpy and R giving different results? I know R uses IEEE754 double-precision but I don't know what precision Numpy uses. How can I change Numpy to give me the "correct" answer?
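One thing worth ruling out for the IQR specifically: NumPy and R agree on the quantile definition by default (NumPy's `linear` interpolation in `np.percentile` corresponds to R's type 7, which is R's default), so identical data should give identical IQRs. A small check, using a made-up example vector:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# NumPy's default 'linear' interpolation matches R's default quantile type 7
q1, q3 = np.percentile(x, [25, 75])
print(q3 - q1)  # 2.0 -- same as R's IQR(c(1, 2, 3, 4, 10))
```

If the IQRs still differ on your real data, the two platforms are probably not seeing the same values (e.g. different NA/missing-value handling on import).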

  • It would help to show your code so we could address your actual problem. It is also important to distinguish between how floats are being *printed* versus the actual floating point *value*. For instance, in R, `sprintf("%.20f", 1.222*1.222)` prints `"1.49328399999999983372"` which identically matches what you get in Python with `'{:.20f}'.format(1.222*1.222)`. The floating point value is the same, but when you enter `1.222*1.222` at the R prompt, R prints `1.493284` while Python prints `1.4932839999999998` – unutbu Apr 15 '16 at 01:34
  • You might also try changing the `dtype` of your NumPy data to `float128`: `data = data.astype(np.float128)`. This might help, though it's just a shot in the dark without seeing both your Python and R code. – unutbu Apr 15 '16 at 01:38
  • @unutbu: R uses 64-bit floats, so sticking with 64-bit floats in Python is reasonable here. – John Zwinck Apr 15 '16 at 01:40
  • Try reducing your data set to a smaller set that still shows a discrepancy. Post your code and if possible, the reduced data set (you can't paste it here if it's large, so share it elsewhere). – John Zwinck Apr 15 '16 at 01:42
  • I believe unutbu is correct here. Some programming languages will make their output numbers nice, while underneath the true number is a bit different. Take for example `0.1+0.1` The answer should be `0.2` and that's what most languages will tell you, but if you twist their arm and force them to print the number in its full glory, you'll usually get something like `0.2000000000000000111022302`. This is not because the language is wrong, but rather the inherent limits of 64 bit calculations. – zephyr Apr 15 '16 at 02:20
  • _"Multiplying 1.2*1.2 in Numpy gives 1.4"_ - That's not how multiplication works! – Eric Apr 15 '16 at 03:58
  • 1.2222*1.2222 = 1.49377284, so numpy is within 10^-16, which is pretty good, given that there are no natural constants or measurable physical quantities known to that relative accuracy out there. The value given for R is simply rounded. Both R and numpy are fine, you are just using a rounded representation for R. As @unutbu noted. The statistical quantities are more serious, is there a possible 1/n vs 1/(n-1) definition difference (at least for standard deviation vs sample stdev)? – roadrunner66 Apr 15 '16 at 04:27
  • Actually the example suggests that R ostensibly uses _single-precision_ floats, so I'm a bit confused why you are complaining about python. R is not printing double-precision values, even though it uses double-precision in calculations. – Jan Christoph Terasa Apr 15 '16 at 05:20
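The 1/n vs 1/(n-1) point raised in the comments is easy to check directly: NumPy's `np.std` defaults to the population formula (`ddof=0`), while R's `sd()` divides by n - 1. A small sketch with a toy vector:

```python
import numpy as np

x = np.array([13.0, 14.0, 15.0, 16.0, 17.0])

print(np.std(x))          # population sd (divide by n):     1.4142135623730951
print(np.std(x, ddof=1))  # sample sd (divide by n - 1),
                          # matching R's sd():               1.5811388300841898
```

With 795067 rows the two differ only in the 6th decimal or so, which does not explain a difference of 0.002 in the standard deviation by itself, but it is one of the definition mismatches to eliminate.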

1 Answer


Python

Python's print statement (this is Python 2 syntax) rounds floats when converting them to strings, so fewer digits are displayed than are actually stored; the calculations themselves are done in the precision specified. Python/numpy uses double-precision floats by default (at least on my 64-bit machine):

import numpy

single = numpy.float32(1.222) * numpy.float32(1.222)
double = numpy.float64(1.222) * numpy.float64(1.222)
pyfloat = 1.222 * 1.222

print single, double, pyfloat
# 1.49328 1.493284 1.493284

print "%.16f, %.16f, %.16f"%(single, double, pyfloat)
# 1.4932839870452881, 1.4932839999999998, 1.4932839999999998

In an interactive Python/iPython shell, the shell displays the repr of a result, which prints enough digits to uniquely round-trip the double-precision value:

>>> 1.222 * 1.222
1.4932839999999998

In [1]: 1.222 * 1.222
Out[1]: 1.4932839999999998

R

It looks like R is doing the same as Python when using print and sprintf:

print(1.222 * 1.222)
# 1.493284

sprintf("%.16f", 1.222 * 1.222)
# "1.4932839999999998"

In contrast to interactive Python shells, the interactive R shell rounds the results of statements to 7 significant digits by default (options(digits = 7)), even though the underlying values are double-precision:

> 1.222 * 1.222
[1] 1.493284
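You can reproduce R's default display from Python: the `%.7g` format rounds to 7 significant digits, which is what R's `options(digits = 7)` does at the prompt:

```python
x = 1.222 * 1.222

print('%.7g' % x)  # 1.493284 -- what the R prompt shows
print(repr(x))     # 1.4932839999999998 -- the full double-precision repr
```

Same bits, two display conventions.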

Differences between Python and R

The differences in your results could come from single-precision values somewhere in your numpy pipeline. Calculations involving a lot of additions/subtractions accumulate the rounding error and ultimately make the problem surface:

In [1]: import numpy

In [2]: a = numpy.float32(1.222)

In [3]: a*6
Out[3]: 7.3320000171661377

In [4]: a+a+a+a+a+a
Out[4]: 7.3320003

As suggested in the comments on your question, make sure to use double-precision floats in your numpy calculations, and check that you are comparing the same definitions: numpy's np.std defaults to the population standard deviation (ddof=0), while R's sd() computes the sample standard deviation (dividing by n - 1).
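As a minimal sketch (with simulated data standing in for your array, since your code isn't shown): cast to `float64` before computing the statistics, and pass `ddof=1` if you want to match R's `sd()`:

```python
import numpy as np

rng = np.random.default_rng(0)
# simulate data that was accidentally read in single precision
data = rng.normal(13.1, 7.4, size=(1000, 3)).astype(np.float32)

data64 = data.astype(np.float64)   # cast back to double precision
print(data64.dtype)                # float64
print(np.mean(data64), np.median(data64))
print(np.std(data64, ddof=1))      # sample sd, as computed by R's sd()
```

If the numbers still disagree after this, compare how the two platforms imported the file (missing values, headers, locale decimal separators).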

Jan Christoph Terasa