SciPy Kolmogorov-Smirnov test yielding small p-values even with random data generated from the given distribution

Question

import numpy as np
from scipy.stats import multivariate_normal, ks_1samp

data = np.random.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=1000)
cdfx = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]]).cdf
ks_1samp(x=data, cdf=cdfx)

KstestResult(statistic=0.9930935227267083, pvalue=0.0)

Shouldn't the p-value be high, given that the data were generated from the very distribution being tested?

Warren Weckesser · Answer 1 · 2022-09-15T22:14:46.850


The Kolmogorov-Smirnov test is for univariate distributions. See the section "The Kolmogorov–Smirnov statistic in more than one dimension" of the Wikipedia article on the Kolmogorov–Smirnov test for a discussion of a multivariate generalization.
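ks_1samp does not implement those multivariate generalizations. As a sketch of a different, simpler check (not the multivariate KS statistic), assuming the mean and covariance are known in advance rather than estimated from the data: if a d-dimensional sample really comes from N(mean, cov), each point's squared Mahalanobis distance follows a chi-squared distribution with d degrees of freedom, so the univariate ks_1samp can be applied to those distances. Passing this check is necessary but not sufficient for multivariate normality. The helper name mahalanobis_ks is hypothetical.

```python
import numpy as np
from scipy.stats import chi2, ks_1samp


def mahalanobis_ks(data, mean, cov):
    """Hypothetical helper: KS test of squared Mahalanobis distances vs. chi2(d).

    Assumes `mean` and `cov` are known, not estimated from `data`.
    """
    data = np.asarray(data, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = data.shape[1]
    centered = data - mean                       # shape (n, d)
    # Solve cov @ z = centered.T instead of forming the inverse explicitly.
    z = np.linalg.solve(cov, centered.T)         # shape (d, n)
    d2 = np.einsum("ij,ji->i", centered, z)      # squared distances, shape (n,)
    # Under the null hypothesis, d2 ~ chi2(df=d), which is univariate.
    return ks_1samp(d2, chi2(df=d).cdf)


rng = np.random.default_rng(0)
mean = [0, 0]
cov = [[1, 0], [0, 1]]
data = rng.multivariate_normal(mean=mean, cov=cov, size=1000)
result = mahalanobis_ks(data, mean, cov)
print(result.statistic, result.pvalue)
```

Because the distances collapse each point to a single number, the test can miss departures that leave the distance distribution unchanged, so it complements rather than replaces a genuinely multivariate test.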

ks_1samp expects the input x to be one-dimensional, and it expects the cdf function to be the CDF of a univariate distribution. It does not validate these properties, so the behavior is undefined (and, clearly, nonsense) if the expectations are not met.

With the univariate normal distribution, it works as you expect:

In [20]: from scipy.stats import ks_1samp, norm

In [21]: x = norm.rvs(size=1000)

In [22]: ks_1samp(x, norm.cdf)
Out[22]: KstestResult(statistic=0.025983100250768443, pvalue=0.5011047711453744)
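Applying the same idea to the bivariate sample in the question, one check that does satisfy ks_1samp's expectations is to test each coordinate separately against its marginal distribution. With mean [0, 0] and identity covariance, each marginal is a standard normal, so norm.cdf is the appropriate univariate CDF. This is a partial sketch: a sample can have normal marginals without being jointly bivariate normal, so it does not replace a multivariate goodness-of-fit test.

```python
import numpy as np
from scipy.stats import ks_1samp, norm

rng = np.random.default_rng(0)
data = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=1000)

# ks_1samp needs a 1-D sample and a univariate CDF, so test one column at a time.
# With identity covariance, each column is marginally N(0, 1).
results = [ks_1samp(data[:, j], norm.cdf) for j in range(data.shape[1])]
for j, res in enumerate(results):
    print(f"coordinate {j}: statistic={res.statistic:.4f}, pvalue={res.pvalue:.4f}")
```

Running two tests on the same sample raises the chance of a spurious small p-value, which is worth keeping in mind when reading the results together.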

edited Sep 15 '22 at 22:14

answered Sep 15 '22 at 22:09


Thanks! Do you have a recommendation for a goodness-of-fit test for multivariate samples? I'm trying to determine whether a sample is Gaussian, non-Gaussian, or a mixture of Gaussians. I expect every sample to fall into one of those categories, so I'm trying to classify them accordingly. – Imp Sep 15 '22 at 22:11
That sounds like a good question for the [Cross Validated](https://stats.stackexchange.com/) site. – Warren Weckesser Sep 15 '22 at 22:13
