1

I have a clarification question. It is my understanding, that sourceCpp automatically passes on the RNG state, so that set.seed(123) gives me reproducible random numbers when calling Rcpp code. When compiling a package, I have to add a set RNG statement. Now how does this all work with openMP either in sourceCpp or within a package?

Consider the following Rcpp code

#include <Rcpp.h>
#include <omp.h>

// [[Rcpp::depends("RcppArmadillo")]]

// [[Rcpp::export]]
Rcpp::NumericVector rnormrcpp1(int n, double mu, double sigma  ){
  Rcpp::NumericVector out(n);
        for (int i=0; i < n; i++) {
          out(i) =R::rnorm(mu,sigma);
        }

  return(out);
}


// [[Rcpp::export]]
Rcpp::NumericVector rnormrcpp2(int n, double mu, double sigma, int cores=1  ){
  omp_set_num_threads(cores);
  Rcpp::NumericVector out(n);
    #pragma omp parallel for schedule(dynamic)   
        for (int i=0; i < n; i++) {
          out(i) =R::rnorm(mu,sigma);
        }

  return(out);
}

And then run

   set.seed(123)
    a1=rnormrcpp1(100,2,3,2)
    set.seed(123)
    a2=rnormrcpp1(100,2,3,2)
    set.seed(123)
    a3=rnormrcpp2(100,2,3,2)
    set.seed(123)
    a4=rnormrcpp2(100,2,3,2)
    all.equal(a1,a2)
    all.equal(a3,a4)

While a1 and a2 are identical, a3 and a4 are not. How can I adjust the RNG state with the openMP loop? Can I?

Inferrator
  • 519
  • 1
  • 5
  • 13

2 Answers2

2

To expand on what Dirk Eddelbuettel has already said, it is next to impossible to both generate the same PRN sequence in parallel and have the desired speed-up. The root of this is that generation of PRN sequences is essentially a sequential process where each state depends on the previous one and this creates a backward dependence chain that reaches back as far as the initial seeding state.

There are two basic solutions to this problem. One of them requires a lot of memory and the other one requires a lot of CPU time and both are actually more like workarounds than true solutions:

pregenerated PRN sequence: One thread generates sequentially a huge array of PRNs and then all threads access this array in a manner that would be consistent with the sequential case. This method requires lots of memory in order to store the sequence. Another option would be to have the sequence stored into a disk file that is later memory-mapped. The latter method has the advantage that it saves some compute time, but generally I/O operations are slow, so it only makes sense on machines with limited processing power or with small amounts of RAM.

prewound PRNGs: This one works well in cases when work is being statically distributed among the threads, e.g. with schedule(static). Each thread has its own PRNG and all PRNGs are seeded with the same initial seed. Then each thread draws as many dummy PRNs as its starting iteration, essentially prewinding its PRNG to the correct position. For example:

  • thread 0: draws 0 dummy PRNs, then draws 100 PRNs and fills out(0:99)
  • thread 1: draws 100 dummy PRNs, then draws 100 PRNs and fills out(100:199)
  • thread 2: draws 200 dummy PRNs, then draws 100 PRNs and fills out(200:299)

and so on. This method works well when each thread does a lot of computations besides drawing the PRNs since the time to prewind the PRNG could be substantial in some cases (e.g. with many iterations).

A third option exists for the case when there is a lot of data processing besides drawing a PRN. This one uses OpenMP ordered loops (note that the iteration chunk size is set to 1):

#pragma omp parallel for ordered schedule(static,1)
for (int i=0; i < n; i++) {
  #pragma omp ordered
  {
     rnum = R::rnorm(mu,sigma);
  }
  out(i) = lots of processing on rnum
}

Although loop ordering essentially serialises the computation, it still allows for lots of processing on rnum to execute in parallel and hence parallel speed-up would be observed. See this answer for a better explanation as to why so.

Community
  • 1
  • 1
Hristo Iliev
  • 72,659
  • 12
  • 135
  • 186
1

Yes, sourceCpp() etc and an instantiation of RNGScope so the RNGs are left in a proper state.

And yes one can do OpenMP. But inside of OpenMP segment you cannot control in which order the threads are executed -- so you longer the same sequence. I have the same problem with a package under development where I would like to have reproducible draws yet use OpenMP. But it seems you can't.

Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725