I have an R project in which it is desirable (both for debugging and for analysis) if I can set a random seed, make a series of calls to random distribution functions, and have the state of the PRNG at any point in that series depend only on the number of random variables I have generated and possibly which distribution functions I have generated them from, but not on any other parameters supplied to those functions, i.e., I would like the output from the print statements in
f = function(N1, N2, m, M, shape, scale) {
set.seed(0xBEEF)
x1 = runif(N1, m, M)
print(.Random.seed)
x2 = rgamma(N2, shape=shape, scale=scale)
print(.Random.seed)
#etc.
}
to only depend the values of N1
and N2
, and not on the values of m
, M
, shape
, or scale.
The standard R functions (runif
, rgamma
, rbinom
, etc.) do not have this property. runif
and rbinom
both skip calls to the underlying PRNG for "distributions" with only one possible value (e.g., runif(3, 0, 0:2)
and runif(2, 0, 1:2)
require the same number of PRNG calls to resolve). rgamma
is even worse (from this perspective): it doesn't necessarily make the same number of calls to the underlying PRNG with different shape
and scale
parameters (but the same n
), even when these are both finite and >0
.
The tentative solution I've come up with is instead of
x = rgamma(N, shape=shape, scale=scale)
to write
u = runif(N, 0, 1)
x = qgamma(u, shape=shape, scale=scale
and similarly for other distributions (using qnorm
, qbinom
, etc.). But I'd like to confirm that I'm not sacrificing (a meaningful amount of) precision by doing so. Does anyone know enough about how the relevant functions work "under the hood" to assess this?
(Alternatively or in addition, are there any standard packages already addressing this issue?)