
I found that loading some packages affects random number generation in R. The problem can be reproduced as follows.

(1) Open a new R session. (My case: R 4.x + RStudio)

(2) Execute the following code:

set.seed(1)
library(sf)
library(tmap)
sample(1:10, 5)

(3) The first time, the result is: 5 10 2 8 6

(4) However, if you execute the entire code again, any number of times, the result is always: 9 4 7 1 2

Why is the result for the first time different? It seems that loading the sf and tmap libraries for the first time will affect random number generation. So strange.

Ben Bolker
Frank KKK

1 Answer


A little bit of experimentation shows that the issue is with the tmap package, not sf (i.e. from a clean R session, set.seed(1); library(sf); sample(1:10, 5) gives 9 4 7 1 2, the same result as without loading any package).

If we go to its GitHub repository we can see that the tmap package has a .onLoad function, which is run the first time the package is loaded. This is why only the first run differs: technically, calling library(tmap) a second time doesn't load the package again, since it's already loaded, so .onLoad is not re-run.

Digging in a little bit further, we can diagnose this kind of problem by running

set.seed(1)
r <- .Random.seed
f <- function() identical(r, .Random.seed)

and then checking f() after every step; it will be TRUE unless the random-number stream has been altered. I used this approach in this answer ...
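As a minimal illustration of this check, using only base R: the sketch below shows f() staying TRUE after a step that draws no random numbers and turning FALSE after one that does. The variable names ok_start, ok_after_nchar and ok_after_runif are just for illustration.

```r
set.seed(1)
r <- .Random.seed                          # snapshot of the RNG state
f <- function() identical(r, .Random.seed)

ok_start <- f()          # TRUE: nothing has happened yet
n <- nchar("abc")        # a step that does not use the RNG
ok_after_nchar <- f()    # TRUE: the stream is untouched
u <- runif(1)            # a step that consumes a random number
ok_after_runif <- f()    # FALSE: the stream has advanced
```

In the tmap case, the step that flips f() to FALSE is library(tmap) itself.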

This was hard to figure out because it's nearly impossible to not have all of the .onLoad function execute (e.g. if you call tmap:::working_internet(), that first loads the package in order to have access to the function), and it's not easy (impossible?) to set a debugging flag on the .onLoad function itself (because it's not accessible before you load the package).

I was initially misled into believing that it was one of the other packages loaded indirectly by tmap that was causing the problem (names(sessionInfo()$loadedOnly) shows that there are 52 of them!). This would be a huge pain to track down. I would probably try to do this by elimination, taking the names of those packages and excluding any loaded by (e.g.) sf or tidyverse, neither of which displays this problem. However, that turned out not to be necessary.
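For what it's worth, the elimination idea could be sketched roughly as follows. rng_changed_by is a hypothetical helper (not part of any package), and the package names in the example call are placeholders from the base distribution. It only detects a change the first time a namespace is loaded in a session, so it has to be run from a clean R session, and a package that was already loaded will simply report FALSE.

```r
# Hypothetical helper: for each package name, load its namespace and report
# whether the RNG state changed while doing so (NA if it is not installed).
rng_changed_by <- function(pkgs) {
  vapply(pkgs, function(p) {
    set.seed(1)
    before <- get(".Random.seed", envir = globalenv())
    if (!requireNamespace(p, quietly = TRUE)) return(NA)
    !identical(before, get(".Random.seed", envir = globalenv()))
  }, logical(1))
}

# Placeholder candidates; in practice these would come from
# names(sessionInfo()$loadedOnly) after excluding packages known to be fine.
res <- rng_changed_by(c("stats4", "tools"))
```

Note that loading one candidate may pull in others as dependencies, so a TRUE only narrows the search to that package and whatever it loads.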

After some farting around (using R -d gdb and setting a breakpoint on the internal unif_rand C function), I realized that sample() was being called, and thence that a randomization step via sample() is being used in determine_tips_order, which is called from .onLoad. This is being used to set the order in which random "tips" about tmap are offered to the user. In my opinion, this really ought to be done the first time tmap_tip() is called, rather than at package loading (you could raise an issue on the package's GitHub repo).

(If I could I would actually make it a CRAN repository policy that package loading not mess with the random number stream in this way ...)
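One way a package could randomize at load time without disturbing the user's stream is to save .Random.seed beforehand and restore it afterwards. The following is a minimal base-R sketch of that idea, not tmap's actual code; with_preserved_seed is a hypothetical helper name (the withr package offers a similar with_preserve_seed()).

```r
# Hypothetical helper: evaluate `expr`, then put the global RNG state back
# exactly as it was (or remove it again if no seed existed beforehand).
with_preserved_seed <- function(expr) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      rm(".Random.seed", envir = genv)
    }
  })
  expr  # lazily evaluated here, after the restore handler is registered
}

set.seed(1)
tip_order <- with_preserved_seed(sample(20))  # stands in for load-time shuffling
after_helper <- sample(1:10, 5)               # unaffected by the shuffle above

set.seed(1)
reference <- sample(1:10, 5)                  # same draw with no shuffle at all
```

Here after_helper and reference are identical, which is the behaviour the user would expect from set.seed(1) regardless of what was loaded in between.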

Ben Bolker