To calculate the Pearson correlation coefficient between two arrays I use the following:

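    // Assumes Apache Commons Math 3.x
    import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;

    // First array: every element is 1
    double[] arr1 = new double[4];
    arr1[0] = 1;
    arr1[1] = 1;
    arr1[2] = 1;
    arr1[3] = 1;

    // Second array: every element is 1 as well
    double[] arr2 = new double[4];
    arr2[0] = 1;
    arr2[1] = 1;
    arr2[2] = 1;
    arr2[3] = 1;

    PearsonsCorrelation pc = new PearsonsCorrelation();
    System.out.println("Correlation is " + pc.correlation(arr1, arr2));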

The output I receive is: Correlation is NaN

The PearsonsCorrelation class is part of Apache Commons Math: http://commons.apache.org/proper/commons-math/userguide/stat.html

The values in each array are based on whether or not a user's dataset contains a given word. Shouldn't the above arrays be perfectly correlated?

This question is related to "How to set a value's for calculating Eucludeian distance and correlation".

blue-sky

2 Answers

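Someone had a similar issue here [link]. Apparently, the issue is that each of your arrays has a standard deviation of 0. The Pearson coefficient divides the covariance by the product of the two standard deviations, so with constant arrays the calculation becomes 0/0, which is NaN.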
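To make the division by zero visible, here is a minimal sketch that computes the coefficient directly from its textbook definition in plain Java. The class name `PearsonByHand` and its `pearson` method are hypothetical helpers written for illustration; this is not how Commons Math implements `PearsonsCorrelation`, only the same formula written out.

```java
// Hypothetical helper for illustration; not part of Apache Commons Math.
class PearsonByHand {

    // r = cov(x, y) / (sd(x) * sd(y)). The 1/(n-1) factors cancel,
    // so sums of (squared) deviations from the mean are sufficient.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // For constant arrays this is 0.0 / 0.0, which is NaN in IEEE 754 arithmetic.
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] constant = {1, 1, 1, 1};
        double[] varying = {1, 2, 3, 4};
        System.out.println(pearson(constant, constant)); // NaN
        System.out.println(pearson(varying, varying));   // 1.0
    }
}
```

With the arrays from the question every deviation from the mean is zero, so both the numerator and the denominator vanish.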

Julian Ortega

You attempt to compute the correlation between two vectors of length four. As all values in each vector are the same (1 in both vectors, in the example shown), this is equivalent to attempting to compute the correlation coefficient from a single pair of numbers (1 and 1 in this case).

It is perhaps obvious that there is no such thing; you need at least two distinct pairs, just as you cannot draw a meaningful regression line if you only have one pair of values.

If only one of the vectors had some variation, the result would still be NaN, but in that case it would be reasonable to set it to zero.
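One way to encode that convention is a small wrapper that checks for constant input before dividing. The sketch below is plain Java and does not use Commons Math; the class `GuardedCorrelation` and its methods are hypothetical names, and returning 0 for a single constant vector is the judgment call described above, not a property of the Pearson coefficient itself.

```java
// Hypothetical helper for illustration; the zero fallback is a convention, not a standard.
class GuardedCorrelation {

    // True when every element equals the first one (zero variance).
    static boolean isConstant(double[] v) {
        for (double value : v) {
            if (value != v[0]) {
                return false;
            }
        }
        return true;
    }

    // NaN if both vectors are constant, 0.0 if exactly one is, Pearson's r otherwise.
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        boolean xConstant = isConstant(x);
        boolean yConstant = isConstant(y);
        if (xConstant && yConstant) {
            return Double.NaN; // undefined: effectively a single pair of values
        }
        if (xConstant || yConstant) {
            return 0.0;        // assumed convention: no variation, no association
        }

        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

Checking for constant input by comparing elements, instead of testing whether the computed variance equals zero, avoids being misled by rounding error in the mean.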

Robert Hijmans