
We can run a Kolmogorov–Smirnov (K-S) test to assess whether there is a difference in the distributions of two datasets, as outlined here.

So let's take the following data:

set.seed(123)
N <- 1000
var1 <- runif(N, min=0, max=0.5)
var2 <- runif(N, min=0.3, max=0.7)
var3 <- rbinom(n=N, size=1, prob = 0.45)

df <- data.frame(var1, var2, var3)

We can then separate the data based on the var3 outcome:

df.1 <- subset(df, var3 == 1)
df.2 <- subset(df, var3 == 0)

Now we can run a Kolmogorov–Smirnov test to check for differences in the distribution of each individual variable:

ks.test(jitter(df.1$var1), jitter(df.2$var1))
ks.test(jitter(df.1$var2), jitter(df.2$var2))

And, not surprisingly, we find no difference, so we can assume the two subsets have been drawn from the same distribution. This can be visualised through:

plot(ecdf(df.1$var1), col=2)
lines(ecdf(df.2$var1))

plot(ecdf(df.1$var2), col=3)
lines(ecdf(df.2$var2), col=4)

But now we want to consider whether the distributions for var3 == 0 and var3 == 1 differ when we consider var1 and var2 together. Is there an R package to run such a test when we have multiple predictors?

A similar question was posed here, but it has not received any answers.

There appears to be some literature: Example 1 Example 2

But none of it appears to be implemented in R.

lukeg
    This question appears to be off-topic because it is about statistics and not really a specific programming question. Perhaps it's better to ask this on [Cross Validated](http://stats.stackexchange.com) – Jaap Jul 22 '15 at 15:31

1 Answer


A two-dimensional K-S test has been discussed in Peacock, J. A. (1983). Two-dimensional goodness-of-fit testing in astronomy. Monthly Notices of the Royal Astronomical Society, 202(3), 615–627. https://doi.org/10.1093/mnras/202.3.615

There is an R implementation on CRAN: https://cran.r-project.org/web/packages/Peacock.test/
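For example, a sketch applied to the question's data, assuming the `Peacock.test` package exports a `peacock2()` function that takes two two-column matrices and returns the two-dimensional K-S statistic (note: the statistic only, not a p-value):

```r
# install.packages("Peacock.test")  # CRAN package from the link above
library(Peacock.test)

# Regenerate the question's simulated data so this is self-contained
set.seed(123)
N <- 1000
df <- data.frame(var1 = runif(N, min = 0, max = 0.5),
                 var2 = runif(N, min = 0.3, max = 0.7),
                 var3 = rbinom(n = N, size = 1, prob = 0.45))
df.1 <- subset(df, var3 == 1)
df.2 <- subset(df, var3 == 0)

# Compare the joint (var1, var2) distributions between the two groups
# (peacock2() is an assumed API based on the package name)
D <- peacock2(as.matrix(df.1[, c("var1", "var2")]),
              as.matrix(df.2[, c("var1", "var2")]))
D  # the 2-D K-S statistic
```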

ya wei