I am doing data analysis in Python (NumPy) and R. My data is an array of shape 795067 × 3, and computing the mean, median, standard deviation, and IQR on it yields different results depending on whether I use NumPy or R. I cross-checked the values, and it looks like R gives the "correct" values:
- Median: NumPy 14.948499999999999, R 14.9632
- Mean: NumPy 13.097945407088607, R 13.10936
- Standard deviation: NumPy 7.3927612774052083, R 7.390328
- IQR: NumPy 12.358700000000002, R 12.3468
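In case the exact calls matter, this is roughly how I compute the statistics on the NumPy side (the data below is a random stand-in for my real file, which I can't share here). I also noticed that `np.std` defaults to `ddof=0` while R's `sd()` divides by n - 1, though I am not sure that alone accounts for gaps this large:

```python
import numpy as np

# Hypothetical stand-in for my real 795067 x 3 data (actual file not shown)
rng = np.random.default_rng(0)
data = rng.normal(loc=13.1, scale=7.4, size=(795067, 3))
x = data.ravel()  # flattened into one long vector, the way I treat it

print("mean:  ", np.mean(x))
print("median:", np.median(x))
# np.std defaults to the population formula (ddof=0);
# R's sd() divides by n - 1, so ddof=1 should match R
print("std (ddof=0):", np.std(x))
print("std (ddof=1):", np.std(x, ddof=1))
# IQR as Q3 - Q1; np.percentile's default linear interpolation
# matches R's quantile(type = 7) default, as far as I know
q1, q3 = np.percentile(x, [25, 75])
print("IQR:", q3 - q1)
```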
The max and min of the data are the same on both platforms. I ran a few quick multiplication tests to better understand what is going on:
- Multiplying 1.2 * 1.2 gives 1.44 in NumPy, and the same in R.
- Multiplying 1.22 * 1.22 gives 1.4884 in NumPy, and the same in R.
- However, multiplying 1.222 * 1.222 in NumPy gives 1.4932839999999998, which is clearly wrong! Doing the multiplication in R gives the correct answer of 1.493284.
- Multiplying 1.2222 * 1.2222 gives 1.4937728399999999 in NumPy and 1.493773 in R. Once more, R is correct.
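To compare the values directly, I printed Python's exact repr next to the same number rounded to 7 significant digits, which I understand is R's default print precision:

```python
x = 1.222 * 1.222
print(repr(x))     # 1.4932839999999998 -- Python's full round-trip repr
print(f"{x:.7g}")  # 1.493284 -- rounded to 7 significant digits, like R's default display
```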
In NumPy the numbers are float64, and in R they are doubles. What is going on here? Why are NumPy and R giving different results? I know R uses IEEE 754 double precision, but I don't know what precision NumPy's float64 has. How can I get NumPy to give me the "correct" answers?
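In case it's relevant, here is a minimal check I ran on the NumPy side; as far as I understand, `np.finfo` inspects the dtype's parameters and `np.set_printoptions` only affects how arrays are displayed, not the computed values:

```python
import numpy as np

# np.finfo reports the parameters of NumPy's float64 type
# (machine epsilon, number of bits, etc.)
print(np.finfo(np.float64))

# This only changes how arrays are *printed*, not the stored values
np.set_printoptions(precision=7)
print(np.array([1.222 * 1.222]))  # prints [1.493284]
```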