I am trying to use the R package dgof from Python 3 via rpy2.

I load it in Python like so:

# import rpy2's package module
import rpy2.robjects.packages as rpackages

# Import R's utility package
utils = rpackages.importr('utils')

# Select a mirror for R packages
utils.chooseCRANmirror(ind=1) # select the first mirror in the list

# R vector of strings
from rpy2.robjects.vectors import StrVector

# Install R package name: 'dgof' (discrete goodness-of-fit) is what we're interested in
if not rpackages.isinstalled('dgof'):
    # Pass a list: a bare string may be treated as a sequence of single characters
    utils.install_packages(StrVector(['dgof']))

# Import dgof
dgof = rpackages.importr('dgof')

Works a charm (i.e. I can import it, which is a big win in itself). Now as a test I wanted to reproduce the example result here, from the API documentation.

For clarity, in pure R the example is the following (to be clear, this is meant to be the ks.test() from dgof, NOT stats::ks.test()):

ks.test(rep(1, 3), ecdf(1:3))

which results in a p-value of 0.07407 (to verify this, click on the green "Run this code" button in this link). Note that:

> ecdf(1:3)
Empirical CDF 
Call: ecdf(1:3)
 x[1:3] =      1,      2,      3
> rep(1,3)
[1] 1 1 1

In Python the reproduced example is:

import numpy as np
a = np.array([1,1,1])
b = np.arange(1,4)
dgof.ks_test(a,b)

Here, however, the p-value I get is 0.517551. The KS statistic itself is calculated correctly, so why is the p-value different? To see the output of the dgof example, press "Run this code" at the link and you'll see the numbers I am referring to (reproduced above).
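One way to check where 0.517551 might come from (a sketch, not confirmed against the dgof source): in the Python call the second argument is a plain array rather than an ecdf object, and R's ks.test() treats a numeric second argument as a second sample. The standard-library-only snippet below computes the two-sample KS statistic for the same data and the asymptotic Kolmogorov p-value. The helper names are made up for illustration, and it is an assumption that R falls back to the asymptotic p-value here (the ties in the data make that plausible).

```python
import math


def ks_two_sample_statistic(x, y):
    """Largest gap between the two empirical CDFs, checked at every observed value."""
    points = sorted(set(x) | set(y))

    def ecdf(sample, t):
        # Fraction of the sample that is <= t
        return sum(v <= t for v in sample) / len(sample)

    return max(abs(ecdf(x, t) - ecdf(y, t)) for t in points)


def kolmogorov_tail(z, terms=100):
    """Asymptotic tail 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 z^2)."""
    return 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
        for k in range(1, terms + 1)
    )


a = [1, 1, 1]  # same data as rep(1, 3)
b = [1, 2, 3]  # same data as np.arange(1, 4), treated as a second sample

d = ks_two_sample_statistic(a, b)
n_eff = len(a) * len(b) / (len(a) + len(b))  # effective sample size n*m/(n+m)
p_asymptotic = kolmogorov_tail(math.sqrt(n_eff) * d)

print(d, p_asymptotic)
```

The statistic comes out as 2/3 and the p-value at roughly 0.51755, which is very close to the value reported above. If this is indeed the cause, the thing to try would be passing an R ecdf object as the second argument instead of the array.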

Astrid
  • It looks like you are comparing the output of `ks.test()`, an R function in the base R package `stats`, with another function also called `ks.test()` but found in an R package called `dgof`. I am not sure this has much to do with rpy2. – lgautier Jan 08 '19 at 04:28
  • @lgautier thanks for your reply, I have clarified my question. `ks.test()` is not from `stats` but `dgof` which is why I find this so confusing. I am literally just trying to replicate the example in the API. – Astrid Jan 08 '19 at 11:14
  • Unless you load the package dgof in your R session (and if you do, your R code is not showing it), R will get `ks.test()` from `stats`. – lgautier Jan 09 '19 at 03:49
  • Very odd.. Then I do not understand how `dgof` is activated on the python end, in any other way than what I have done in the above question. – Astrid Jan 09 '19 at 11:22
  • @lgautier have confirmed that you're correct. So `rpy2` is somehow ignoring the `dgof` import and going straight for the standard KS test found in `stats`. – Astrid Jan 09 '19 at 12:03
