I am fitting a lognormal distribution to my data and running a Kolmogorov-Smirnov (KS) test in both Python and R, but the two give very different results.

The data are:

series
341 291 283 155 271 270 250 272 209 236 295 214 443 632 310 334 376 305 216 339

In R the code is:

> library(MASS)  # provides fitdistr
> fit = fitdistr(series, "lognormal")$estimate
> fit
meanlog
5.66611754205579
sdlog
0.290617205700481
> ks.test(series, "plnorm", meanlog=fit[1], sdlog=fit[2], exact=TRUE)

    One-sample Kolmogorov-Smirnov test

data:  series
D = 0.13421, p-value = 0.8181
alternative hypothesis: two-sided

In Python the code is:

>>> from scipy import stats
>>> distribution = stats.lognorm
>>> args = distribution.fit(series)
>>> args
(4.2221814852591635, 154.99999999212395, 0.45374242945626875)
>>> stats.kstest(series, distribution.cdf, args, alternative='two-sided')
KstestResult(statistic=0.8211678552361514, pvalue=2.6645352591003757e-15)
halfer
user8270077
  • I don't know the answer because I don't know Python and you are not describing where one can read up on the Python functions, but the R packages have been around a lot longer and been tested much more thoroughly than the Python statistics packages. The `args` results look quite different than parameter estimates from `fitdistrplus::fitdistr` – IRTFM Nov 06 '18 at 18:35
  • @IRTFM v1.0 of R was released in Feb 2000. SciPy was initially released in 2001, while Python itself has been around since ~1991. So, not exactly "a lot longer". I would caution against disparaging Python/SciPy just because you are personally unfamiliar with them—documentation for Scipy's `stats` module, as well as the Python language as a whole, is readily available online. – L0tad Jan 05 '23 at 21:40

1 Answer

SciPy does not parameterize the log-normal distribution the same way R does. Search for [scipy] lognorm on Stack Overflow for many similar questions, and see the note about the parameterization in the lognorm docstring: the shape parameter s plays the role of R's sdlog, and scale equals exp(meanlog). To match the R result, you must also fix the location parameter loc at 0 by passing floc=0 to fit, because the R implementation has no location parameter.
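As a quick sanity check of that correspondence, here is a minimal sketch that plugs R's estimates from the question into SciPy's lognorm and compares the CDF with the textbook formula Phi((log x - meanlog) / sdlog). The names r_style, q and reference are only illustrative. Note also that the unconstrained fit in the question returned a loc of about 155, which is the smallest value in the sample, so it describes a different, shifted distribution.

```python
import numpy as np
from scipy.stats import lognorm, norm

# R's estimates, copied from the question
meanlog, sdlog = 5.66611754205579, 0.290617205700481

# SciPy's lognorm(s, loc, scale) corresponds to R's plnorm(meanlog, sdlog)
# when s = sdlog, loc = 0 and scale = exp(meanlog).
r_style = lognorm(s=sdlog, loc=0, scale=np.exp(meanlog))

# A few values taken from the data
q = np.array([155.0, 271.0, 632.0])

# Log-normal CDF written out from its definition, for comparison
reference = norm.cdf((np.log(q) - meanlog) / sdlog)

print(np.allclose(r_style.cdf(q), reference))
```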

Here's a script that shows how to get the same values that are reported by R:

import numpy as np
from scipy.stats import lognorm, kstest


x = [341, 291, 283, 155, 271, 270, 250, 272, 209, 236,
     295, 214, 443, 632, 310, 334, 376, 305, 216, 339]


sigma, loc, scale = lognorm.fit(x, floc=0)

mu = np.log(scale)

print("mu    = %9.5f" % mu)
print("sigma = %9.5f" % sigma)

stat, p = kstest(x, 'lognorm', args=(sigma, 0, scale), alternative='two-sided')
print("KS Test:")
print("stat    = %9.5f" % stat)
print("p-value = %9.5f" % p)

Output:

mu    =   5.66612
sigma =   0.29062
KS Test:
stat    =   0.13421
p-value =   0.86403

The kstest function in SciPy does not have an option to compute the exact p-value. To compare its value with R's, use exact=FALSE in ks.test:

> ks.test(series, "plnorm", meanlog=fit[1], sdlog=fit[2], exact=FALSE)

    One-sample Kolmogorov-Smirnov test

data:  series
D = 0.1342, p-value = 0.864
alternative hypothesis: two-sided
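
If you want to see both p-values side by side in Python, the following is a rough sketch rather than a definitive recipe. It computes the KS statistic by hand and evaluates it against two distributions that scipy.stats exposes: kstwobign (the asymptotic Kolmogorov distribution) and kstwo (the finite-sample two-sided distribution). It assumes a SciPy version recent enough to include kstwo, and it uses the closed-form maximum-likelihood estimates for a log-normal with loc fixed at 0 instead of calling fit.

```python
import numpy as np
from scipy.stats import lognorm, kstwo, kstwobign

x = np.array([341, 291, 283, 155, 271, 270, 250, 272, 209, 236,
              295, 214, 443, 632, 310, 334, 376, 305, 216, 339], dtype=float)
n = len(x)

# Closed-form MLE for a log-normal with loc = 0:
# mean and (ddof=0) standard deviation of log(x)
logx = np.log(x)
mu, sigma = logx.mean(), logx.std()

# Two-sided KS statistic: largest gap between the empirical CDF
# and the fitted CDF, checked on both sides of each jump
cdf = lognorm.cdf(np.sort(x), sigma, loc=0, scale=np.exp(mu))
i = np.arange(1, n + 1)
d = max(np.max(i / n - cdf), np.max(cdf - (i - 1) / n))

p_asymptotic = kstwobign.sf(np.sqrt(n) * d)  # large-sample approximation
p_exact = kstwo.sf(d, n)                     # finite-sample distribution

print("D            = %9.5f" % d)
print("p asymptotic = %9.5f" % p_asymptotic)
print("p exact      = %9.5f" % p_exact)
```

The asymptotic value should line up with the exact=FALSE result from R, and the finite-sample value with exact=TRUE.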
Warren Weckesser