5

I would like to compare two 2D distributions statistically. Thus I would like to use the Peacock test (a 2D analogue of the Kolmogorov-Smirnov test). There is an R package called Peacock.test which claims to implement it.

But the documentation is quite sparse for this package, i.e.:

The two functions: peacock2 and peacock3, provided in this package are self-explanatory and their usage is straightforward.

In particular I could not find what the output of the peacock() function represents (I guess this is something like a p-value) ? Has anyone has tested this function and could they tell me what it is (and if this function is reliable?)?

Usage example:

x <- matrix(rnorm(12, 0, 1), ncol=2)
y <-  matrix(rnorm(16, 0, 1), ncol=2)
ks <- peacock2(x, y)
ks
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
MassCorr
  • 349
  • 1
  • 8

1 Answers1

5

I don't know if it's reliable or not, but the code looks fairly straightforward.

Now the bad news: based on ?peacock2, what the function gives you is

Value:

the value of the test statistic

This means that it is not giving you the p-value. The original paper ("Two-dimensional goodness-of-fit testing in astronomy", JA Peacock, Monthly Notices of the Royal Astronomical Society, 1983), gives a table of critical values derived from Monte Carlo simulation, and an analytical approximation. To get a p-value from your test statistic, you'll have to (1) convert your test-statistic D into a Z statistic (section 3.5 says Z=sqrt(n1*n2/(n1+n2))*D for the two-sample test provided both n values are >10), and then suggests you can approximate this by P(>Z)=2*exp(-2*(Z-0.5)^2).

I would definitely recommend reading the original paper and double-checking/deriving this for yourself if you intend to use it.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • 1
    Thank you ! I am on my way to read this paper (without too much enthusiasm I must confess ) :) – MassCorr Oct 30 '17 at 14:16
  • The paper 'Kolmogorov-Smirnov Test For Two-Dimensional Data' by William H. Press and Saul A. Teukolsky or section 14.7 in their 'Numerical Recipes in C' give an a simple formula for the computation of significance level approximation which apparently works well for N >20. – mjs Oct 28 '19 at 10:02