I noticed, when creating a PSOCK cluster via parallel, that the child processes were populated with a .Random.seed by default. This confused me because nothing in the documentation indicates that this should be the case. More specifically, section 6, "Random-number generation", of the parallel vignette says that .Random.seed is created when random-number generation is first used (see the quoted passage below).
When an R process is started up it takes the random-number seed from the object .Random.seed in a saved workspace or constructs one from the clock time and process ID when random-number generation is first used (see the help on RNG). Thus worker processes might get the same seed because a workspace containing .Random.seed was restored or the random number generator has been used before forking: otherwise these get a non-reproducible seed (but with very high probability a different seed for each worker).
This appears to be corroborated by the Note in ?RNG (e.g., "one is created [...] when one is required"):
Initially, there is no seed; a new one is created from the current time and the process ID when one is required. Hence different sessions will give different simulation results, by default. However, the seed might be restored from a previous session if a previously saved workspace is restored.
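The documented behavior can be checked directly. The following is a minimal sketch, assuming it is run in a session where no workspace was restored (for example with Rscript --vanilla); the helper name has_seed is hypothetical and only there for readability.

```r
# Hypothetical helper: TRUE if .Random.seed exists in the global environment.
has_seed <- function() {
  exists(".Random.seed", envir = globalenv(), inherits = FALSE)
}

# Start from a known state: drop any seed left over from earlier RNG use.
if (has_seed()) rm(list = ".Random.seed", envir = globalenv())

before <- has_seed()        # no seed yet, as the Note in ?RNG describes
invisible(stats::runif(1L)) # first use of the RNG creates the seed
after <- has_seed()

cat("Seed before first RNG use:", before, "\n")
cat("Seed after first RNG use: ", after, "\n")
```

So any code that draws a random number, including code run while a package is loading, is enough to make .Random.seed appear in the global environment.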
This sent me down a rabbit hole of painstaking debugging, only to learn that this .Random.seed is created by a package I import, which in turn depends on snow.
To reproduce this, consider the following code in a file called script.R:
# Expect the global environment to be clean.
cat("Before package load:", paste0(ls(all.names = TRUE)), "\n")
# Load package.
library(snow)
# Expect the global environment to remain clean.
cat("After package load:", paste0(ls(all.names = TRUE)), "\n")
Running Rscript script.R yields:
Before package load:
After package load: .Random.seed
I am afraid that this question might generate opinionated answers and be flagged as inappropriate. However, I really want to know whether this behavior is considered acceptable, i.e., whether a package may populate .GlobalEnv while it is being attached, and whether this has any unintended consequences.
Additional information:
For those curious about what exactly in snow invokes the RNG: it seems to be invoked inside the function initDefaultClusterOptions() to select a port number:
11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)
initDefaultClusterOptions() is in turn called from within .onLoad(), which runs when the package's namespace is loaded, i.e., as part of attaching the package.
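If the side effect is unwanted, one possible workaround on the caller's side is to snapshot the global RNG state before loading the package and restore it afterwards. The following is a minimal sketch under that assumption: with_preserved_seed is a hypothetical helper, not part of snow or parallel, and the port expression from above stands in for the package load so the example runs without snow installed.

```r
# Hypothetical helper (not part of snow or parallel): evaluate `expr`, then
# put the global RNG state back to what it was before the call.
with_preserved_seed <- function(expr) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  old_seed <- if (had_seed) get(".Random.seed", envir = genv, inherits = FALSE)

  on.exit({
    if (had_seed) {
      # A seed existed before: restore it exactly.
      assign(".Random.seed", old_seed, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      # No seed existed before: remove the one created by `expr`.
      rm(list = ".Random.seed", envir = genv)
    }
  }, add = TRUE)

  force(expr)
}

# Start without a seed, as in a fresh session.
if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
  rm(list = ".Random.seed", envir = globalenv())
}

# Stand-in for the package load: the port expression draws one random number.
port <- with_preserved_seed(
  11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time()) / 300) %% 1)
)

seed_left_behind <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
cat("Seed left behind:", seed_left_behind, "\n")
```

In real use the stand-in would be replaced by the load itself, e.g. with_preserved_seed(loadNamespace("snow")). This only restores the state in the session where it is called; it does not change what the package does on load.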