
I noticed when creating a PSOCK cluster via parallel that the child processes were populated with a .Random.seed by default. This confused me because nothing in the documentation indicates that this should be the case. More specifically, Section 6, "Random-number generation", of the parallel vignette says that .Random.seed is created when random-number generation is first used:

When an R process is started up it takes the random-number seed from the object .Random.seed in a saved workspace or constructs one from the clock time and process ID when random-number generation is first used (see the help on RNG). Thus worker processes might get the same seed because a workspace containing .Random.seed was restored or the random number generator has been used before forking: otherwise these get a non-reproducible seed (but with very high probability a different seed for each worker).

This appears to be corroborated by the information under Note in ?RNG (e.g., "one is created [...] when one is required"):

Initially, there is no seed; a new one is created from the current time and the process ID when one is required. Hence different sessions will give different simulation results, by default. However, the seed might be restored from a previous session if a previously saved workspace is restored.
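The documented behavior can be checked directly. The following is a minimal sketch, assuming a session in which nothing else has touched the RNG; it first removes any existing seed, because a restored workspace could have provided one. The variable names seed_before and seed_after are only for illustration.

```r
# Start from a state without a seed (a restored workspace may have supplied one).
if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
  rm(".Random.seed", envir = globalenv())
}

# No seed exists until the RNG is actually used.
seed_before <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)

# The first RNG call creates .Random.seed in the global environment.
invisible(runif(1L))
seed_after <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)

cat("Seed before first RNG use:", seed_before, "\n")
cat("Seed after first RNG use:", seed_after, "\n")
```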

This sent me down a nasty rabbit hole of painstaking debugging only to learn that this .Random.seed is created by a package that I import that in turn depends on snow.

To reproduce this, consider the following code in a file called script.R:

# Expect the global environment to be clean.
cat("Before package load:", paste0(ls(all.names = TRUE)), "\n")

# Load package.
library(snow)

# Expect the global environment to remain clean.
cat("After package load:", paste0(ls(all.names = TRUE)), "\n")

Running Rscript script.R yields:

Before package load:  
After package load: .Random.seed

I am afraid that this question might generate opinionated answers and be flagged as inappropriate. However, I really want to know whether it is considered acceptable for a package to populate the .GlobalEnv while it is being attached, and whether this can have any unintended consequences.


Additional information:

For those curious what exactly in snow is invoking the RNG, it seems that the RNG is invoked inside the function initDefaultClusterOptions() for selecting a port number:

11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1)

initDefaultClusterOptions() is in turn called from within .onLoad(), which runs when the package is loaded (and therefore also when it is attached via library()).
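To illustrate the side effect in isolation, here is a sketch of how the same port expression could be evaluated without leaving a .Random.seed behind. The helper with_preserved_seed() is hypothetical and is not part of snow or base R; it simply saves the global RNG state (including "no seed at all") and restores it afterwards.

```r
# Hypothetical helper (not part of snow): evaluate `expr`, then put the global
# RNG state back to what it was before, including the "no seed" state.
with_preserved_seed <- function(expr) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      rm(".Random.seed", envir = genv)
    }
  })
  expr  # forced here, before the on.exit handler runs
}

# Start without a seed, as in a fresh Rscript session.
if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
  rm(".Random.seed", envir = globalenv())
}

# Same expression as in initDefaultClusterOptions(), wrapped in the helper.
port <- with_preserved_seed(
  11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time()) / 300) %% 1)
)

seed_left_behind <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
cat("Seed left behind:", seed_left_behind, "\n")
```

Without the wrapper, evaluating the port expression on its own creates .Random.seed in the global environment, which matches the output of script.R above.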

Mihai