Why does Spearman correlation produce a different result on z-scored data?

Question

Hi, it seems that Spearman correlation should produce the same result regardless of whether the data are z-scored or raw. Here are two examples.

https://stats.stackexchange.com/questions/77562/why-does-correlation-come-out-the-same-on-raw-data-and-z-scored-standardized-d

https://stats.stackexchange.com/questions/13952/can-spearmans-correlation-be-run-on-z-scores

However, for the example below the two correlations are different, and I'm wondering what is going on.

df = read.csv("https://www.dropbox.com/s/jdktw9jugzm97v3/test.csv?dl=1", header = FALSE)

# Spearman on the raw columns vs. on the z-scored columns
cor(df[, 1], df[, 2], method = "spearman")
cor(scale(df[, 1]), scale(df[, 2]), method = "spearman")

# 0.8462699 vs 0.8905341

Interestingly, Pearson gives the same result either way. What am I doing or thinking incorrectly here?

Edit: I thought this might be due to ties, so I also tried Kendall, which should handle ties. However, it also gives different results.

cor(as.matrix(df[, 1]), as.matrix(df[, 2]), method = "kendall")
cor(scale(as.matrix(df[, 1])), scale(as.matrix(df[, 2])), method = "kendall")

thanks.

I'm not sure what's going on here but I noticed that if you add a small constant to both columns (I tried adding 1, 100, -1, and adding 1 to one column and subtracting 1 from the other), the correlation is .9157, regardless of whether you scale it or not. So I wonder if this has something to do with numerical instability; both columns have entries which are extremely close to 0 and those might be throwing things off. Spearman's correlation certainly ought to be scale invariant, since rescaling won't change the ranks. — Joseph Clark McIntyre, Jan 21 '19 at 01:58
@JosephClarkMcIntyre Yes, so weird. I also used `cor.test` with ties: `cor.test(df[, 1], df[, 2], method = "spearman", exact = FALSE)` and `cor.test(scale(df[, 1]), scale(df[, 2]), method = "spearman", exact = FALSE)`. Still different. — Ahdee, Jan 21 '19 at 02:12
This is almost certainly a rounding error. You have data points orders of magnitude smaller than `.Machine$double.eps` and a range of over 20 orders of magnitude in the data. You can reproduce it with fake data like this: `df = data.frame( x = (rnorm(20,10,2) + (1:20)/2)*10^(-18:1), y = rnorm(20,20,3) + (1:20)/3 )` — dww, Jan 21 '19 at 03:09
@dww Thanks, you are right. When I rounded to 15 digits, the results were the same. — Ahdee, Jan 21 '19 at 16:25

score 1 · Answer 1 · answered Jan 21 '19 at 16:25

Hi, as mentioned in the comments above, this was due to a rounding error. No one posted an answer, but I wanted to add this in case someone else stumbles on a similar issue. When I round to 15-16 digits, the results are the same.

df = read.csv("https://www.dropbox.com/s/jdktw9jugzm97v3/test.csv?dl=1", header = FALSE)

# Round away values that are too small to survive centering
df = round(df, digits = 15)

cor(as.matrix(df[, 1]), as.matrix(df[, 2]), method = "spearman")
cor(scale(df[, 1]), scale(df[, 2]), method = "spearman")

Thanks, everyone, for helping with this.
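
For anyone who wants to see the mechanism, here is a minimal sketch with made-up data. The vectors `x` and `y` and the helper `z` are my own illustrations, not the values from the file above. As far as I can tell, `scale()` subtracts the column mean, and values that are tiny relative to that mean (well below `.Machine$double.eps` times the mean) all collapse to the same centered value. That creates ties that were not in the raw data, so the ranks change and the Spearman correlation changes with them.

```r
# Made-up data: x spans about 20 orders of magnitude, like the question's data.
x <- c(1e-20, 2e-20, 3e-20, 1, 2, 3)
y <- c(3, 1, 2, 4, 5, 6)

# Hypothetical helper: z-score a vector and drop the matrix attributes.
z <- function(v) as.numeric(scale(v))

# Centering: the three tiny values all become exactly -1 in double precision.
centered <- x - mean(x)
n_unique_raw      <- length(unique(x))         # 6 distinct raw values
n_unique_centered <- length(unique(centered))  # 4 distinct centered values

# Ranks differ because of the new ties, so Spearman differs.
raw_rho    <- cor(x, y, method = "spearman")
scaled_rho <- cor(z(x), z(y), method = "spearman")

# Rounding first introduces the same ties in the raw data,
# so raw and scaled agree again.
x_rounded          <- round(x, digits = 15)
rounded_raw_rho    <- cor(x_rounded, y, method = "spearman")
rounded_scaled_rho <- cor(z(x_rounded), z(y), method = "spearman")

print(c(raw = raw_rho, scaled = scaled_rho,
        rounded_raw = rounded_raw_rho, rounded_scaled = rounded_scaled_rho))
```

Note that rounding makes the two results agree by turning the tiny values into ties in the raw data as well, so the rounded result matches the scaled one rather than the original raw one. If the tiny values are meaningful, computing Spearman on the raw data (without `scale()`) keeps their ordering.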
