I have a set of data points (let's say X, Y, Z, etc.) and showed that they have a Pearson correlation coefficient of 0.7. Is it possible to see how each data point contributes to the correlation coefficient? That is, I'd like to be able to say that point X contributes negatively to the Pearson correlation, and by how much. Thanks for all your insight and help!
Viewed 257 times
3
-
You could always try leaving that data point out and seeing how it affects the coefficient (i.e., report the change in coefficient). – Travis Addair Jul 02 '13 at 02:00
-
Cool question, but you might have more luck over here: http://math.stackexchange.com/ – Fiarr Jul 02 '13 at 02:01
-
Thanks, David. That is a great idea. Is there a way to do this type of testing programmatically for each data point in a long series (at least ~100 points)? Thanks! – Ben Wong Jul 02 '13 at 02:31
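The leave-one-out idea from the first comment can be sketched in a few lines of plain Python. This is only an illustration, not a vetted implementation: the helper names `pearson` and `leave_one_out_deltas` and the sample data are made up here and do not come from the thread.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def leave_one_out_deltas(xs, ys):
    """For each index i, return r(all points) - r(all points except i).

    A positive delta means point i pulls the coefficient up;
    a negative delta means it pulls the coefficient down.
    """
    r_full = pearson(xs, ys)
    deltas = []
    for i in range(len(xs)):
        xs_wo = xs[:i] + xs[i + 1:]
        ys_wo = ys[:i] + ys[i + 1:]
        deltas.append(r_full - pearson(xs_wo, ys_wo))
    return deltas

# Made-up data: the last point breaks an otherwise rising trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 0.5]
for i, d in enumerate(leave_one_out_deltas(xs, ys)):
    print(f"point {i}: delta r = {d:+.3f}")
```

This recomputes the coefficient once per point, so it is O(n²) overall, which should be fine for a series of around 100 points.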
-
Wolfram Mathematica provides a great toolkit for statistical data analysis. You might want to take a look at the [documentation](http://reference.wolfram.com/mathematica/ref/Correlation.html) – Ashton H. Jul 02 '13 at 04:30
-
The only way to do that (leaving a data point out) efficiently, I think, would be to cache a bunch of intermediate results that you would normally compute while calculating the coefficient (algebraic expansion might be necessary). If you are just looking for outliers, though, you wouldn't really need to calculate the correlation coefficient for every set of points minus the current point. You'd be most concerned about the points whose individual terms in the Pearson coefficient summation deviate the most from the correlation, AFAIK. – JayC Jul 02 '13 at 04:40
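One possible reading of "cache a bunch of intermediate results" is to keep the raw sums (Σx, Σy, Σx², Σy², Σxy) and subtract a single point's terms from each before recomputing r. The sketch below assumes that reading; the function names and the sample data are hypothetical. Note that the raw-sum formula can lose precision when values are very large relative to their spread.

```python
import math

def r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Pearson r from raw sums: count, sum x, sum y, sum x^2, sum y^2, sum x*y."""
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / math.sqrt(var_x * var_y)

def leave_one_out_r(xs, ys):
    """Return (r_full, [r without point i, for each i]) in O(n) total time."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    r_full = r_from_sums(n, sx, sy, sxx, syy, sxy)
    # Dropping point (x, y) just removes its term from every cached sum.
    r_without = [
        r_from_sums(n - 1, sx - x, sy - y, sxx - x * x, syy - y * y, sxy - x * y)
        for x, y in zip(xs, ys)
    ]
    return r_full, r_without

# Made-up data: the last point breaks an otherwise rising trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 0.5]
r_full, r_without = leave_one_out_r(xs, ys)
for i, r_i in enumerate(r_without):
    print(f"without point {i}: r = {r_i:.3f} (change {r_full - r_i:+.3f})")
```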
-
Of course if your data is skewed by outliers, what I just said might not be quite right, now that I think of it... – JayC Jul 02 '13 at 04:45
-
Yeah... Ignore what I said about "individual terms in the Pearson coefficient summation (that) deviate the most", I'm sure that's not right. I'm sure there's a way to find candidate outliers but I'm not sure what that'd be without calculating a covariance matrix or something... – JayC Jul 02 '13 at 05:18
1 Answer
0
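Pearson's r can be used to compute a linear regression line of the form X = mY + b, where the slope is m = r·(s_X/s_Y), which reduces to m = r when both variables are standardized. Use the distance from this line as a metric for how well a given data point correlates with the rest.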
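A rough sketch of this residual idea follows. It assumes the more common orientation of predicting y from x (swap the arguments to get the X-on-Y form), uses the vertical distance to the line as the "distance", and computes the slope as S_xy/S_xx, which equals r times the ratio of the standard deviations. The function name and the data are made up for illustration.

```python
def regression_residuals(xs, ys):
    """Fit y = slope * x + intercept by least squares; return the residuals.

    The residual for point i is its vertical distance from the fitted line
    (observed y minus predicted y).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Made-up data: the last point breaks an otherwise rising trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 0.5]
residuals = regression_residuals(xs, ys)
# List points from worst fit to best fit.
for i in sorted(range(len(xs)), key=lambda i: -abs(residuals[i])):
    print(f"point {i}: residual = {residuals[i]:+.3f}")
```

Because the fitted line is itself pulled toward any extreme point, well-behaved neighbours can end up with large residuals too, so the ranking is best treated as a hint rather than a verdict.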

antiduh
-
I have to wonder: if you have a bunch of points scattered with (x,y) generally in (0,1), and then one point at (4000000,4000000), how much might that one point steer the results towards a correlation when there might not be any? I don't think this advice works in that case (but I have yet to do the calculations to prove it). – JayC Jul 02 '13 at 04:54
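That scenario is easy to try out with synthetic data. The sketch below is an assumption-laden toy and not a proof: it draws 50 independent uniform points in (0,1) with a fixed seed, then appends the single point (4000000, 4000000) and compares the two coefficients. The `pearson` helper is written here for the example.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.random() for _ in range(50)]
ys = [rng.random() for _ in range(50)]  # generated independently of xs

r_cloud = pearson(xs, ys)
r_with_outlier = pearson(xs + [4_000_000.0], ys + [4_000_000.0])

print(f"r for the cloud alone:      {r_cloud:+.4f}")
print(f"r with the one far point:   {r_with_outlier:+.4f}")
```

With the far point included, the coefficient is driven almost entirely by that one point, which is the situation where residuals from the fitted line say little about the rest of the data.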
-
You've got a point, but then your standard deviation would be huge, which could factor into this metric. – antiduh Jul 02 '13 at 05:10