scipy.stats.kstest(rvs, cdf, N) can perform a KS test on a dataset rvs. It tests whether the dataset follows a probability distribution whose CDF is specified in the parameters of this method.
Consider now a dataset of N=4800 samples. I have performed a KDE on this data and therefore have an estimated PDF. This PDF looks an awful lot like a bimodal distribution. When I plot the estimated PDF and use curve_fit to fit a bimodal distribution to it, the two plots are pretty much identical. The parameters of the fitted bimodal distribution are (scale1, mean1, stdv1, scale2, mean2, stdv2):
[0.6 0.036 0.52, 0.23 1.25 0.4]
How can I apply scipy.stats.kstest to test whether my estimated PDF follows a bimodal distribution?
As my null hypothesis, I state that the estimated PDF equals the following PDF:
hypoDist = 0.6*norm(loc=0, scale=0.2).pdf(x_grid) + 0.3*norm(loc=1, scale=0.2).pdf(x_grid)
hypoCdf = np.cumsum(hypoDist)/len(x_grid)
x_grid is just a vector that contains the x-values at which I evaluate my estimated PDF, so each entry of pdf has a corresponding value of x_grid. My computation of hypoCdf might be incorrect. Instead of dividing by len(x_grid), should I divide by np.sum(hypoDist)?
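To make the normalization question concrete, here is a minimal sketch that compares both ways of turning hypoDist into a CDF against the analytic CDF of the same two-Gaussian mixture. The grid below is a hypothetical stand-in (the real x_grid is not shown), and the analytic version assumes the weights 0.6 and 0.3 are rescaled so that they sum to 1.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical grid; the actual x_grid is not shown in the question.
x_grid = np.linspace(-2.0, 3.0, 2001)

weights = np.array([0.6, 0.3])  # as in hypoDist; note they sum to 0.9, not 1
comp1 = norm(loc=0, scale=0.2)
comp2 = norm(loc=1, scale=0.2)
hypoDist = weights[0] * comp1.pdf(x_grid) + weights[1] * comp2.pdf(x_grid)

# Dividing by the sum forces the last CDF value to be exactly 1.
cdf_by_sum = np.cumsum(hypoDist) / np.sum(hypoDist)

# Dividing by len(x_grid), as in the question; on this grid it does not reach 1.
cdf_by_len = np.cumsum(hypoDist) / len(x_grid)

# Analytic mixture CDF with the weights rescaled to sum to 1 (an assumption).
cdf_exact = (weights[0] * comp1.cdf(x_grid)
             + weights[1] * comp2.cdf(x_grid)) / weights.sum()

max_abs_err = np.max(np.abs(cdf_by_sum - cdf_exact))
print(cdf_by_sum[-1], cdf_by_len[-1], max_abs_err)
```

On a uniform grid that covers essentially all of the probability mass, dividing by np.sum(hypoDist) gives a valid discrete CDF, while dividing by len(x_grid) only does so if the grid spacing and the PDF's total mass happen to cancel out.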
Challenge: the cdf parameter of kstest cannot be specified as bimodal. Neither can I specify it to be hypoDist.
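For reference, the cdf argument of kstest also accepts a callable, so one possible approach is to pass a function that evaluates the mixture CDF at the sample values. The sketch below assumes this is acceptable for the use case; mixture_cdf is a hypothetical helper, the weights are rescaled to sum to 1, and the samples are synthetic stand-ins for measurementError.

```python
import numpy as np
from scipy.stats import kstest, norm

def mixture_cdf(x, w1=0.6, mu1=0.0, s1=0.2, w2=0.3, mu2=1.0, s2=0.2):
    """CDF of a two-Gaussian mixture; weights are rescaled to sum to 1."""
    total = w1 + w2
    return (w1 * norm.cdf(x, loc=mu1, scale=s1)
            + w2 * norm.cdf(x, loc=mu2, scale=s2)) / total

# Synthetic stand-in for measurementError, drawn from the same mixture.
rng = np.random.default_rng(0)
n = 4800
from_first = rng.random(n) < 0.6 / 0.9
samples = np.where(from_first,
                   rng.normal(0.0, 0.2, n),
                   rng.normal(1.0, 0.2, n))

# kstest compares the raw samples (not the KDE) against the callable CDF.
result = kstest(samples, mixture_cdf)
print(result.statistic, result.pvalue)
```

Note that the test operates on the raw samples rather than on the estimated PDF, and that the reported p-value is only exact when the mixture parameters were not fitted to the same data.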
If I wanted to test whether my dataset was Gaussian distributed, I would write:
KS_result = kstest(measurementError, norm(loc=mean(pdf), scale=np.std(pdf)).cdf)
print(KS_result)
measurementError is the dataset that I have performed the KDE on. This returns:
statistic=0.459, pvalue=0.0
To me, it is a little irritating that the pvalue is exactly 0.0.
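A rough check suggests why the value can print as exactly 0.0. Using the leading term of the asymptotic Kolmogorov tail, p ≈ 2·exp(−2·n·D²), with n=4800 and D=0.459 gives an exponent far below what double precision can represent. This sketch uses that approximation only; it is not necessarily the exact method kstest uses internally.

```python
import math

n = 4800   # number of samples
d = 0.459  # reported KS statistic

# Leading term of the asymptotic two-sided tail: p ~ 2 * exp(-2 * n * d**2)
exponent = -2.0 * n * d**2
p_asymptotic = 2.0 * math.exp(exponent)

# exp(x) underflows to 0.0 in double precision for x below roughly -745.
print(exponent, p_asymptotic)
```

So a p-value of 0.0 here means "smaller than the smallest representable float", which indicates a very strong rejection of the tested distribution rather than a numerical error.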