
I have the following matrices:

EurodistCL.scl

And:

EurodistM.scl

I want to calculate their row-wise Pearson correlations, and I've tried the following pieces of code:

RowCor <- sapply(1:21, function(i) cor(EurodistCL.scl[i, ], EurodistM.scl[i, ], method = "pearson"))

And:

cA <- EurodistCL.scl - rowMeans(EurodistCL.scl)
cB <- EurodistM.scl - rowMeans(EurodistM.scl)
sA <- sqrt(rowMeans(cA^2))
sB <- sqrt(rowMeans(cB^2))
rowMeans(cA * cB) / (sA * sB)

Both give the same output, a correlation vector of 21 ones.

Although the matrices are clearly highly correlated, they are not perfectly correlated, so I would expect some of the correlation coefficients to be 0.99 or 0.98.

Why am I getting only ones? Is something wrong in the code or in the theory?

q0mlm

1 Answer


This happens because each row contains only two values. Even random values would give a correlation of +1 or -1. Try this:

a <- runif(2)
b <- runif(2)
cor(a, b)

So the code is fine; it is the theory that is incorrect. A correlation coefficient can be computed from two paired observations, but it is always exactly +1 or -1 and therefore of little use.
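A short worked derivation may make this concrete. It assumes two paired observations $(x_1, y_1)$ and $(x_2, y_2)$ with $x_1 \neq x_2$ and $y_1 \neq y_2$ (the symbols are introduced here for illustration and are not from the question):

```latex
\begin{aligned}
\bar{x} &= \tfrac{1}{2}(x_1 + x_2), \qquad
x_1 - \bar{x} = \tfrac{1}{2}(x_1 - x_2), \qquad
x_2 - \bar{x} = -\tfrac{1}{2}(x_1 - x_2), \\[4pt]
r &= \frac{\sum_{i=1}^{2} (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_{i=1}^{2} (x_i - \bar{x})^2 \, \sum_{i=1}^{2} (y_i - \bar{y})^2}}
   = \frac{\tfrac{1}{2}(x_1 - x_2)(y_1 - y_2)}
          {\tfrac{1}{2}\,\lvert x_1 - x_2 \rvert \, \lvert y_1 - y_2 \rvert}
   = \operatorname{sign}\bigl((x_1 - x_2)(y_1 - y_2)\bigr) = \pm 1 .
\end{aligned}
```

The magnitudes cancel completely, so only the direction of change survives. If either pair is constant, the denominator is zero and the correlation is undefined.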

To estimate a correlation coefficient, you need more than two paired observations.
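As a rough illustration of the point, the sketch below compares row-wise correlations for matrices with two columns against matrices with five columns. The data are simulated with `runif`, and `row_cor` is a hypothetical helper written for this example; neither comes from the question.

```r
# Illustrative sketch with simulated data (not the asker's matrices).
set.seed(1)

# Hypothetical helper: Pearson correlation of corresponding rows of A and B.
row_cor <- function(A, B) {
  sapply(seq_len(nrow(A)), function(i) cor(A[i, ], B[i, ]))
}

# Two columns per row: each correlation is computed from two points,
# so every value is +1 or -1 regardless of the data.
A2 <- matrix(runif(21 * 2), nrow = 21)
B2 <- matrix(runif(21 * 2), nrow = 21)
r2 <- row_cor(A2, B2)
print(table(sign(r2)))

# Five columns per row: the correlations now vary between -1 and 1.
A5 <- matrix(runif(21 * 5), nrow = 21)
B5 <- matrix(runif(21 * 5), nrow = 21)
r5 <- row_cor(A5, B5)
print(round(range(r5), 3))
```

With two columns the only thing that varies between rows is the sign; with five columns the coefficients take intermediate values, which is what a meaningful estimate needs.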

kangaroo_cliff
  • I believe I'm not getting it. Where should I put the runifs in the code? I would like to calculate how the coordinates of the cities in the first matrix correlate with the coordinates of the cities in the second matrix. Isn't a row-wise correlation a good measure? – q0mlm Nov 25 '20 at 01:14
  • Consider `a` and `b` to be corresponding rows in the two matrices. I am trying to demonstrate why you are always getting 1. You should get 1, but it is meaningless. The problem is that you have only two values per row (here, the columns). Whenever you have only two values, you always get a correlation of +1 or -1. – kangaroo_cliff Nov 25 '20 at 01:18
  • OK, thank you. I was totally misunderstanding how a row-wise correlation works. So if I want to see how those matrices differ coordinate-wise and how much they are correlated, what measures should I look into? – q0mlm Nov 25 '20 at 01:22
  • This isn't because the correlation is row-wise. It is because there weren't enough columns. – kangaroo_cliff Nov 25 '20 at 01:26
  • As to what you should do, it is not clear what you are trying to do. Maybe post that as a separate question with more explanation. – kangaroo_cliff Nov 25 '20 at 01:27
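
Since the issue is the number of values per correlation, one possible reading of the goal is to correlate matching columns instead, so that each coefficient is computed from all 21 rows. The sketch below is only an illustration of that idea under stated assumptions: `A` and `B` are simulated stand-ins for the two 21 x 2 matrices, and `B` is built as a noisy copy of `A`. Whether this measure answers the asker's actual question is not established in the thread.

```r
# Simulated stand-ins for the two 21 x 2 coordinate matrices (assumption).
set.seed(42)
A <- matrix(rnorm(21 * 2), nrow = 21)
B <- A + matrix(rnorm(21 * 2, sd = 0.1), nrow = 21)  # B is a noisy copy of A

# Correlate matching columns: each coefficient uses 21 paired values,
# so it can take values strictly between -1 and 1.
col_cor <- sapply(seq_len(ncol(A)), function(j) cor(A[, j], B[, j]))
print(round(col_cor, 3))
```

With the real data, `A` and `B` would be replaced by `EurodistCL.scl` and `EurodistM.scl`.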