I have two series of 45 values in the interval [0,1]. The first series is a human-generated standard; the second is computer-generated (full series here: http://www.copypastecode.com/74844/). The first series is sorted in decreasing order.

0.909090909 0.216196598
0.909090909 0.111282099
0.9 0.021432587
0.9 0.033901106
...
0.1 0.003099256
0   0.001084533
0   0.008882249
0   0.006501463

What I want to assess is the degree to which this order is preserved in the second series, given that the first series is monotonic. The Pearson correlation is 0.454763067, but I suspect the relationship is not linear, so this value is difficult to interpret.

A natural approach would be to use the Spearman rank correlation, which in this case is 0.670556181. I noticed that with random values, while Pearson is very close to 0, the Spearman rank correlation goes up to 0.5, so a value of 0.67 seems very low.

What would you use to assess the order similarity between these 2 series?

Archie
Mulone

1 Answer

"[...] what I want to assess is the degree to which the order is preserved [...]"

Since it's the order (rank) that you care about, Spearman rank correlation is the more meaningful metric here.
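To make the difference concrete, here is a minimal sketch using made-up data (not the series from the question): y is a strictly increasing but strongly nonlinear function of x. The helper names average_ranks, pearson and spearman are hypothetical and written out by hand only to show that Spearman is simply Pearson applied to the ranks; in practice scipy.stats.spearmanr does the same job.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman rank correlation: Pearson computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Made-up illustration data: 45 points, y = exp(10 * x) is monotonic in x.
x = [i / 44 for i in range(45)]
y = [math.exp(10 * v) for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print("Pearson: ", round(r_pearson, 3))   # noticeably below 1
print("Spearman:", round(r_spearman, 3))  # 1.0, the order is fully preserved
```

Because the order is perfectly preserved, Spearman reports 1.0 even though the nonlinearity pulls Pearson well below 1. Ties, such as the repeated 0.909090909 and 0 values in the first series, are handled by giving tied values their average rank.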

"I noticed that with random values [...] the Spearman rank correlation goes up to 0.5"

How do you generate those random values? I've just conducted a simple experiment with some random numbers generated using numpy, and I am not seeing that:

In [1]: import numpy as np

In [2]: import scipy.stats

In [3]: x = np.random.randn(1000)

In [4]: y = np.random.randn(1000)

In [5]: print(scipy.stats.spearmanr(x, y))
(-0.013847401847401847, 0.66184551507218536)

The first number (-0.01) is the rank correlation coefficient; the second number (0.66) is the associated p-value.
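The experiment above uses 1000 points, while the question has only 45, so it may be worth repeating it at that size. The following is a rough sketch under stated assumptions: the random values are drawn uniformly from [0, 1], the first series is sorted in decreasing order to mimic the question's layout, and the seed and number of trials are arbitrary choices. For independent series without ties, the Spearman coefficient has mean 0 and standard deviation 1/sqrt(n - 1), which is about 0.15 for n = 45.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n, trials = 45, 2000            # n matches the question; trials is arbitrary

rhos = np.empty(trials)
for t in range(trials):
    # First series: random values sorted in decreasing order (assumed layout).
    x = np.sort(rng.random(n))[::-1]
    # Second series: unrelated random values in [0, 1].
    y = rng.random(n)
    # Index 0 of the result is the rank correlation coefficient.
    rhos[t] = stats.spearmanr(x, y)[0]

print("mean rho:            ", rhos.mean())
print("std of rho:          ", rhos.std())
print("theoretical std:     ", 1 / np.sqrt(n - 1))
print("share of |rho| >= 0.5:", np.mean(np.abs(rhos) >= 0.5))
```

If the random series really are independent, coefficients as large as 0.5 should be rare at this sample size, so consistently seeing values near 0.5 would suggest that the random values were not generated independently of the first series.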

Glorfindel
NPE