Here is what I am trying to do:
- I want to use the discrete Kolmogorov-Smirnov goodness-of-fit test, which is currently only available in R. R also has the standard KS test, which I do not want to use.
- I am a Python user, so I need to port the discrete KS test to Python. To do this I am trying to use rpy2.
The problem I am facing, as described in more statistical detail here, is that rpy2 seems to replace the imported discrete test with the standard version. I know this because it does not produce the right answer when tested.
Attempts so far
import rpy2.robjects.packages as r
utils = r.importr("utils")
package_name = "dgof"
utils.install_packages(package_name)
results in
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning:
warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning:
warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: The downloaded source packages are in
‘/tmp/RtmpTBas6a/downloaded_packages’
warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Updating HTML index of packages in '.Library'
warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Making 'packages.html' ...
warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: done
warnings.warn(x, RRuntimeWarning)
rpy2.rinterface.NULL
OK, so far so good; that should have installed it. So let's import it:
# Import Discrete goodness-of-fit package which includes KS and CVM tests.
dgof = r.importr('dgof')  # `r` is the alias for rpy2.robjects.packages imported above
Has it really imported it? Let's see:
env = r.wherefrom('dgof')
returns
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error: object 'dgof' not found
warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: In addition:
warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Warning message:
warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: In (function (x, y, ..., alternative = c("two.sided", "less", "greater"), :
warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning:
warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: cannot compute correct p-values with ties
warnings.warn(x, RRuntimeWarning)
warnings.warn(x, RRuntimeWarning)
OK, that's weird, but maybe it works anyway. Let's see (this is exactly the same example as used on the R side, and it should return D = 0.66667, p-value = 0.07407):
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
import numpy as np
a = np.array([1,1,1])
b = np.arange(1,3)
dgof.ks_test(a,b)
returns
D = 0.5, p-value = 0.925086
If this doesn't mean anything to you, that's fine; what you need to know is that it is wrong. It seems to be wrong because, somehow, the standard ks_test is being loaded in place of the discrete one (the one discussed in the second item of the list above). Let's verify by loading the standard library and its KS test:
from rpy2.robjects.packages import importr
base = importr('base')
stats = importr('stats')
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
import numpy as np
a = np.array([1,1,1])
b = np.arange(1,3)
stats.ks_test(a,b)
returns
D = 0.5, p-value = 0.925086
So both calls return exactly the same result. Does anyone know why this may be happening?
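As a sanity check on the D values quoted above, here is a minimal sketch in plain Python that computes the KS statistic by hand, assuming D is simply the largest gap between two empirical step-function CDFs. The helper names `ecdf` and `ks_statistic` are my own and are not part of rpy2, dgof, or stats, and the sketch only reproduces D, not the p-values. One thing it makes visible is that `np.arange(1,3)` yields the values 1 and 2 only (the end point is excluded), so the second list below is an assumption about what a three-value reference sample would give, not a claim about what the R side actually used.

```python
def ecdf(sample):
    """Return the empirical CDF of `sample` as a function of x."""
    data = list(sample)
    n = len(data)

    def cdf(x):
        # Fraction of observations less than or equal to x.
        return sum(1 for v in data if v <= x) / n

    return cdf


def ks_statistic(x, y):
    """Largest absolute gap between the empirical CDFs of x and y.

    Both CDFs are right-continuous step functions, so the supremum is
    attained at one of the observed values.
    """
    fx, fy = ecdf(x), ecdf(y)
    points = sorted(set(x) | set(y))
    return max(abs(fx(p) - fy(p)) for p in points)


a = [1, 1, 1]
b_two_values = [1, 2]       # what np.arange(1, 3) contains
b_three_values = [1, 2, 3]  # hypothetical: a reference sample with three values

print(ks_statistic(a, b_two_values))               # 0.5
print(round(ks_statistic(a, b_three_values), 5))   # 0.66667
```

The first value matches the D = 0.5 returned by both Python calls, and the second matches the D = 0.66667 expected from the R side, so it may be worth checking that the inputs on the two sides are really identical before blaming the import.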
NOTE: this question is related to my other question, but with much more detail on the Python side of things.