
Here is what I am trying to do:

  1. I want to use the discrete Kolmogorov-Smirnov goodness-of-fit test, which is currently only available in R. R also has the standard KS test, but I do not want to use that one.
  2. I am a Python user, so I need to make the discrete KS test available in Python. To do this I am trying to use rpy2.

The problem I am facing, described in more statistical detail here, is that rpy2 seems to replace the imported discrete test with the standard version. I know this because it does not produce the expected answer when tested.

Attempts so far

import rpy2.robjects.packages as r
utils = r.importr("utils")
package_name = "dgof"
utils.install_packages(package_name)

results in

/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: 

  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: 
  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: The downloaded source packages are in
    ‘/tmp/RtmpTBas6a/downloaded_packages’
  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Updating HTML index of packages in '.Library'

  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Making 'packages.html' ...
  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning:  done

  warnings.warn(x, RRuntimeWarning)
rpy2.rinterface.NULL

OK, so far so good; that should have installed it. So let's import it:

# Import the discrete goodness-of-fit package, which includes the KS and CvM tests.
# (rpy2.robjects.packages was imported above under the alias r.)
dgof = r.importr('dgof')

Has it really imported it? Let's see:

env = r.wherefrom('dgof')

returns

/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error: object 'dgof' not found

  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: In addition: 
  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Warning message:

  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: In (function (x, y, ..., alternative = c("two.sided", "less", "greater"),  :
  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: 

  warnings.warn(x, RRuntimeWarning)
/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning:  cannot compute correct p-values with ties

  warnings.warn(x, RRuntimeWarning)

  warnings.warn(x, RRuntimeWarning)

OK, that's weird, but maybe it works anyway. Let's see (this is exactly the same example as used on the R side, and it should return D = 0.66667, p-value = 0.07407):

import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
import numpy as np
a = np.array([1,1,1])
b = np.arange(1,3)
dgof.ks_test(a,b)

returns

D = 0.5, p-value = 0.925086

If this doesn't mean anything to you, that's fine; what you need to know is that it is wrong. It seems to be wrong because, somehow, the standard ks_test is being loaded in place of the discrete one (the one from item 2 in the list above). Let's verify by loading the standard library and its KS test:

from rpy2.robjects.packages import importr
base     = importr('base')
stats    = importr('stats')
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
import numpy as np

a = np.array([1,1,1])
b = np.arange(1,3)
stats.ks_test(a,b)

returns

D = 0.5, p-value = 0.925086
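
As a sanity check, the D statistic reported by both calls can be reproduced by hand. The following is a minimal sketch in plain Python (the helper names are made up for illustration) that computes the two-sample KS statistic as the largest gap between the two empirical CDFs. It uses range(1, 3), which is half-open in the same way as np.arange(1, 3), so the second sample holds only [1, 2]. Whether the R-side example used a three-element vector such as 1:3 is an assumption, but the last call shows what that would give:

```python
from fractions import Fraction


def ecdf_at(sample, t):
    """Fraction of observations in `sample` that are <= t."""
    return Fraction(sum(1 for v in sample if v <= t), len(sample))


def two_sample_ks_statistic(x, y):
    """Largest gap between the two empirical CDFs over all observed values."""
    points = sorted(set(x) | set(y))
    return max(abs(ecdf_at(x, t) - ecdf_at(y, t)) for t in points)


a = [1, 1, 1]
b = list(range(1, 3))  # half-open, like np.arange(1, 3): holds [1, 2]

d_two = two_sample_ks_statistic(a, b)             # second sample [1, 2]
d_three = two_sample_ks_statistic(a, [1, 2, 3])   # assumed R-side vector 1:3

print(b, float(d_two), float(d_three))
```

So D = 0.5 is what a two-sample comparison gives for these exact inputs, while a second sample of [1, 2, 3] gives D = 2/3.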

So that's cool -- does anyone know why this may be happening?

NOTE: this question is related to my other question, but with a lot more detail on the Python side of things.

Astrid

1 Answer


Has it really imported it? Let's see:

env = r.wherefrom('dgof')

returns

/home/usr/anaconda3/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error: object 'dgof' not found

The RRuntimeWarning comes from R itself, and is what one would expect. There is no object dgof because R package namespaces are not objects.

What you want is likely wherefrom('ks.test') (see https://rpy2.github.io/doc/v2.9.x/html/robjects_rpackages.html#finding-where-an-r-symbol-is-coming-from).

Many things could be happening here, depending on what the package 'dgof' is doing (if you are coming from Python: R lets package developers do really strange things).

Did you try relying on R's dispatch and function overloading mechanisms? After loading the R package dgof, call ks.test without specifying a namespace.

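import rpy2.robjects
import rpy2.robjects.packages as rpackages

# Load the R package dgof
dgof = rpackages.importr('dgof')
# "generic" function ks.test, looked up by name without a namespace
ks_test = rpy2.robjects.r('ks.test')
# Use it (a and b are the arrays from the question)
ks_test(a, b)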
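
It may also matter what is passed as the second argument. As far as I understand the dgof documentation, its ks.test only treats the reference as a discrete distribution when y is an ecdf (or other step function) object; a plain numeric vector gets the ordinary two-sample test. Below is a minimal sketch under that assumption. The helper name discrete_ks is made up for illustration, it needs R with dgof installed, and it assumes the numpy2ri conversion has not been activated, so that the result comes back as an R list:

```python
def discrete_ks(sample, reference):
    """Run ks.test(sample, ecdf(reference)) in R via rpy2; return (D, p-value)."""
    # Imported lazily so that defining the helper does not itself require R
    from rpy2.robjects import FloatVector
    from rpy2.robjects.packages import importr

    stats = importr("stats")  # provides ecdf()
    dgof = importr("dgof")    # provides the ks.test that accepts a step function

    # Build the reference distribution as an R ecdf object rather than a raw vector
    reference_cdf = stats.ecdf(FloatVector(reference))
    result = dgof.ks_test(FloatVector(sample), reference_cdf)

    # ks.test returns an "htest" list; pull the two numbers out by name
    d_stat = result.rx2("statistic")[0]
    p_value = result.rx2("p.value")[0]
    return d_stat, p_value
```

Calling discrete_ks([1, 1, 1], [1, 2, 3]) would then be the rpy2 counterpart of ks.test(c(1, 1, 1), ecdf(1:3)) in R.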
lgautier
  • Hi! I gave this a shot but it still does not work I am afraid. – Astrid Jan 23 '19 at 13:34
  • Well, since this is using R's own dispatch for `ks.test`, either the variables you are passing through rpy2 differ, or there is something else going on with the pure R version you report as working (I don't have `dgof` to try at the moment). To assess the former, try the following to see how your Python vectors get converted: `robjects.globalenv['a'] = a` then `robjects.r('print(a)')` – lgautier Jan 23 '19 at 22:08
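
One way to check whether the inputs differ without going through R at all is to reproduce the expected numbers by brute force. This is only a sketch under a stated assumption: that the R-side example tested the sample (1, 1, 1) against a discrete uniform distribution on {1, 2, 3}, as ecdf(1:3) would describe. The R call itself is not shown in the question, and the function name is made up for illustration. The sketch enumerates all 27 equally likely samples and counts those whose statistic is at least as large as the observed one:

```python
from fractions import Fraction
from itertools import product

support = [1, 2, 3]   # assumed reference distribution: uniform on 1..3
observed = [1, 1, 1]


def one_sample_d(sample, support):
    """Max gap between the sample ECDF and the uniform CDF on sorted `support`."""
    n, k = len(sample), len(support)
    return max(
        abs(Fraction(sum(v <= t for v in sample), n) - Fraction(i + 1, k))
        for i, t in enumerate(support)
    )


d_obs = one_sample_d(observed, support)

# Enumerate every possible sample of the same size under the null hypothesis
samples = list(product(support, repeat=len(observed)))
extreme = sum(one_sample_d(s, support) >= d_obs for s in samples)
p_value = Fraction(extreme, len(samples))

print(float(d_obs), float(p_value))
```

Under that assumption the enumeration gives D = 2/3 and p = 2/27 (about 0.07407), which matches the values quoted from the R side. Note that np.arange(1, 3) holds only [1, 2], so it would not describe the same reference distribution.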