
The cor() function fails to compute the correlation when a vector contains extremely large numbers, and simply returns zero:

foo <- c(1e154, 1, 0)
bar <- c(0, 1, 2)
cor(foo, bar)
# -0.8660254
foo <- c(1e155, 1, 0)
cor(foo, bar)
# 0

Although 1e155 is very large, it is much smaller than the largest number R can handle. I find it surprising that R returns a wrong value rather than a more suitable result such as NA or Inf.

Is there a reason for that? How can we be sure we will not run into this situation in our programs?

Ali

1 Answer


Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations. (from http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient)

foo <- c(1e154, 1, 0)
sd(foo)
## [1] 5.773503e+153
foo <- c(1e155, 1, 0)
sd(foo)
## [1] Inf
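
The two sd() results above already account for the zero. As a rough sketch that applies only the textbook definition quoted earlier (cor() itself may well organise the arithmetic differently internally), the covariance stays finite while one standard deviation overflows to Inf, and a finite number divided by Inf is zero:

```r
foo <- c(1e155, 1, 0)
bar <- c(0, 1, 2)

# The covariance only multiplies deviations of foo by deviations of bar,
# so it stays within the range of a double.
cov_fb <- cov(foo, bar)

# The standard deviation squares the deviations of foo, which overflows.
sd_foo <- sd(foo)

# Textbook definition: covariance over the product of standard deviations.
manual_cor <- cov_fb / (sd_foo * sd(bar))

is.finite(cov_fb)    # TRUE
is.infinite(sd_foo)  # TRUE
manual_cor == 0      # TRUE
```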

And, even more fundamentally, to calculate sd() you need to take the square of x:

1e154^2
## [1] 1e+308

1e155^2
## [1] Inf

So your number is indeed at the boundary of what can be calculated with 64-bit doubles: the largest representable value, .Machine$double.xmax, is about 1.797693e+308.
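
As for avoiding the problem in your own programs, one possible workaround (a sketch, not something cor() does for you) relies on the fact that Pearson's correlation does not change when a vector is multiplied by a positive constant. Dividing each vector by its largest absolute value keeps the squares well inside the double range. The helper name scaled_cor is hypothetical, and the sketch assumes finite, non-missing input:

```r
# Hypothetical helper: rescale each vector by its largest absolute value
# before calling cor(), so that squared deviations cannot overflow.
# Assumes x and y contain only finite, non-missing numbers.
scaled_cor <- function(x, y) {
  sx <- max(abs(x))
  sy <- max(abs(y))
  # Leave an all-zero vector alone to avoid dividing by zero.
  if (sx > 0) x <- x / sx
  if (sy > 0) y <- y / sy
  cor(x, y)
}

foo <- c(1e155, 1, 0)
bar <- c(0, 1, 2)
res <- scaled_cor(foo, bar)
res
```

For this input the rescaled result should agree with the value obtained for 1e154 in the question, about -0.866, since both vectors are dominated by the single large entry.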

Using R-2.15.2 on Windows I get:

cor(c(1e555, 1, 0), 1:3)
[1] NaN
Andrie
Matthew Lundberg
  • To be picky, you don't need to compute the square of x, you need to compute the square of x - mean(x). (Not that it helps here.) – hadley Jan 15 '13 at 23:02