Wrong correlation result for big numbers

Question

The cor() function fails to compute the correlation when a vector contains extremely large numbers; it simply returns zero:

foo <- c(1e154, 1, 0)
bar <- c(0, 1, 2)
cor(foo, bar)
# -0.8660254
foo <- c(1e155, 1, 0)
cor(foo, bar)
# 0

Although 1e155 is very big, it's much smaller than the largest number R can represent. I'm surprised that R returns a wrong value instead of a more suitable result such as NA or Inf.

Is there a reason for that? How can we make sure we won't run into this situation in our programs?

What version of R are you using? – Andrie, Jan 15 '13 at 14:40

score 7 · Accepted Answer · edited Jan 15 '13 at 14:43

Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations (from http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient).
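A minimal sketch of this definition in R, written only to illustrate the mechanism (manual_cor is a hypothetical helper, not a base R function, and base R's cor() runs compiled code that may differ in detail). It suggests where the 0 comes from: cov() stays finite while sd() overflows to Inf, and a finite number divided by Inf is zero.

```r
# Hypothetical helper (not part of base R): Pearson's r straight from the
# definition, covariance divided by the product of the standard deviations.
manual_cor <- function(x, y) cov(x, y) / (sd(x) * sd(y))

bar <- c(0, 1, 2)

manual_cor(c(1e154, 1, 0), bar)
## [1] -0.8660254

# sd(c(1e155, 1, 0)) overflows to Inf while cov() is still finite,
# so the ratio collapses to (negative) zero instead of NA or Inf.
manual_cor(c(1e155, 1, 0), bar)
## [1] 0
```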

foo <- c(1e154, 1, 0)
sd(foo)
## [1] 5.773503e+153
foo <- c(1e155, 1, 0)
sd(foo)
## [1] Inf

Even more fundamentally, to calculate sd() you need to square x:

1e154^2
## [1] 1e+308
1e155^2
## [1] Inf

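So your number is indeed at the boundary of what can be computed with 64-bit double-precision numbers: 1e155^2 exceeds the largest finite double, .Machine$double.xmax (about 1.8e308), and overflows to Inf.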
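A quick check of that boundary, assuming the usual IEEE 754 double-precision arithmetic that R uses for numeric vectors: any value larger in magnitude than the square root of the largest finite double overflows to Inf when squared, and that threshold lies between 1e154 and 1e155.

```r
# Largest finite double R can represent on this platform
.Machine$double.xmax
## [1] 1.797693e+308

# Squaring any number above this threshold overflows to Inf
limit <- sqrt(.Machine$double.xmax)
limit
## [1] 1.340781e+154

1e154 < limit   # 1e154^2 still fits
## [1] TRUE
1e155 < limit   # 1e155^2 overflows
## [1] FALSE
```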

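Using R-2.15.2 on Windows I get the following (note that the literal 1e555 is already beyond the double range, so R reads it as Inf):

cor(c(1e555, 1, 0), 1:3)
## [1] NaN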
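The question also asks how to avoid this in practice. One possible guard, not part of the original answer and offered only as a sketch: Pearson's r does not change when a variable is multiplied by a positive constant, so each vector can be divided by its largest absolute value before calling cor(), and non-finite input can be rejected outright. The name scaled_cor is hypothetical.

```r
# Hypothetical helper (not part of base R). Pearson's r is invariant under
# positive rescaling, so shrinking each vector to at most 1 in magnitude keeps
# the squared deviations far below .Machine$double.xmax.
scaled_cor <- function(x, y) {
  # Refuse non-finite input (Inf, -Inf, NaN, NA) instead of returning 0 or NaN
  stopifnot(all(is.finite(x)), all(is.finite(y)))
  sx <- max(abs(x))
  sy <- max(abs(y))
  # Skip scaling for all-zero vectors to avoid dividing by zero
  if (sx > 0) x <- x / sx
  if (sy > 0) y <- y / sy
  cor(x, y)
}

scaled_cor(c(1e155, 1, 0), c(0, 1, 2))
## [1] -0.8660254
```

The cost is an extra pass over the data, and very small entries may lose precision after the division.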

edited Jan 15 '13 at 14:43 by Andrie

answered Jan 15 '13 at 14:39 by Matthew Lundberg

To be picky, you don't need to compute the square of x; you need to compute the square of x - mean(x). (Not that it helps here.) – hadley, Jan 15 '13 at 23:02
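To illustrate this point with plain base R arithmetic (a sketch, not how sd() is implemented internally): the sample variance squares the deviations from the mean rather than x itself, but for c(1e155, 1, 0) the deviations are still of order 1e154, so every one of them overflows when squared.

```r
foo <- c(1e155, 1, 0)
dev <- foo - mean(foo)   # deviations with magnitudes of roughly 6.7e154 and 3.3e154
dev^2                    # each squared deviation exceeds .Machine$double.xmax
## [1] Inf Inf Inf

# For 1e154 the squared deviations still fit, and the textbook formula
# reproduces the sd() value shown in the answer.
foo_ok <- c(1e154, 1, 0)
dev_ok <- foo_ok - mean(foo_ok)
sqrt(sum(dev_ok^2) / (length(foo_ok) - 1))
## [1] 5.773503e+153
```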
