
I need to cross-validate some R code in Python. My code generates a lot of pseudo-random numbers, so, to make the comparison easier, I decided to use rpy2 to generate those values "from R" in my Python code.

As an example, in R, I have:

set.seed(1234)
runif(4)
[1] 0.1137034 0.6222994 0.6092747 0.6233794

In python, using rpy2, I have:

import rpy2.robjects as robjects
set_seed = robjects.r("set.seed")
runif =  robjects.r("runif")
set_seed(1234)
print(runif(4))
[1] 0.1137034 0.6222994 0.6092747 0.6233794

as expected (the values are identical). However, I see strange behavior with the R sample function (the equivalent of numpy.random.choice).

As the simplest reproducible example, I have in R:

set.seed(1234)
sample(5)
[1] 1 3 2 4 5

while in python I have:

sample =  robjects.r("sample")
set_seed(1234)
print(sample(5))
[1] 4 5 2 3 1

The results are different. Could anyone explain why this happens and/or suggest a way to get the same values in R and Python using the R sample function?

Regis

2 Answers


If you print the value of the R function RNGkind() in both situations, I suspect you won't get the same answer. The Python result looks like the current default output, while your R result looks like the output of the old, buggy sampler.

For example, in R:

set.seed(1234, sample.kind = "Rejection")
sample(5)
#> [1] 4 5 2 3 1
set.seed(1234, sample.kind = "Rounding")
#> Warning in set.seed(1234, sample.kind = "Rounding"): non-uniform 'Rounding'
#> sampler used
sample(5)
#> [1] 1 3 2 4 5
set.seed(1234, sample.kind = "default")
sample(5)
#> [1] 4 5 2 3 1

Created on 2021-01-15 by the reprex package (v0.3.0)
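
To give a feel for why the two sample.kind settings disagree for the same seed, here is a rough pure-Python sketch of the two ways of drawing an index. It is only an illustration under my own assumptions: it uses Python's random module rather than R's generator, so it will not reproduce R's numbers, and the helper names (index_by_rounding, index_by_rejection, shuffle_with) are made up for this example. As far as I understand, "Rounding" scales a single uniform draw, while "Rejection" draws random bits and retries until the value is in range, so the two consume the random stream differently.

```python
import random


def index_by_rounding(rng, n):
    # "Rounding"-style draw: scale one uniform number to 0..n-1.
    return int(n * rng.random())


def index_by_rejection(rng, n):
    # "Rejection"-style draw: take just enough random bits to cover
    # 0..n-1 and retry whenever the value falls outside that range.
    bits = max(1, (n - 1).bit_length())
    while True:
        value = rng.getrandbits(bits)
        if value < n:
            return value


def shuffle_with(draw, n, seed):
    # Build a permutation of 1..n by repeatedly drawing an index
    # into the pool of remaining values (hypothetical helper).
    rng = random.Random(seed)
    pool = list(range(1, n + 1))
    result = []
    while pool:
        result.append(pool.pop(draw(rng, len(pool))))
    return result


if __name__ == "__main__":
    print(shuffle_with(index_by_rounding, 5, 1234))
    print(shuffle_with(index_by_rejection, 5, 1234))
```

Both calls start from the same seed, but since each strategy uses the underlying generator in its own way, the resulting permutations will generally not match, which is the same effect as in the R output above.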

So it looks to me as though you are still using the old "Rounding" method in your R session. You probably saved a workspace a long time ago and have been reloading it ever since. Don't do that; start each session with a clean workspace.
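
If you want to check this from the Python side and pin the sampler explicitly, something along these lines might help. It is a minimal sketch, not a tested recipe: rng_kind and seeded_sample are hypothetical helper names, r_eval stands for any callable that evaluates a string of R code (with rpy2 that would be robjects.r), and the sample.kind argument of set.seed assumes R 3.6.0 or later.

```python
VALID_KINDS = ("Rounding", "Rejection", "default")


def rng_kind(r_eval):
    # Ask R which generators are active and return plain Python strings.
    return [str(kind) for kind in r_eval("RNGkind()")]


def seeded_sample(r_eval, n, seed=1234, sample_kind="Rejection"):
    # Seed R with an explicit sample.kind, then call sample(n).
    # suppressWarnings hides the warning R emits for "Rounding".
    if sample_kind not in VALID_KINDS:
        raise ValueError(f"unknown sample.kind: {sample_kind!r}")
    r_eval(
        f'suppressWarnings(set.seed({int(seed)}, '
        f'sample.kind = "{sample_kind}"))'
    )
    return [int(x) for x in r_eval(f"sample({int(n)})")]


# Usage with rpy2 (assumes rpy2 and R are installed):
#   import rpy2.robjects as robjects
#   print(rng_kind(robjects.r))
#   print(seeded_sample(robjects.r, 5, sample_kind="Rejection"))
#   print(seeded_sample(robjects.r, 5, sample_kind="Rounding"))
```

Running the same set.seed call with the same sample.kind in your standalone R session should then give matching results on both sides, provided both use the same R version.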

user2554330

Maybe give this a shot (taken from another Stack Overflow answer). Quoting that answer: "The p argument corresponds to the prob argument in the sample() function"

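import numpy as np
# Signature, for reference: np.random.choice(a, size=None, replace=True, p=None)
# Analogous to R's sample(5), but zero-based and driven by NumPy's own RNG:
np.random.choice(5, size=5, replace=False)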
thehand0