0

I have written a stochastic process simulator but I would like to speed it up since it's pretty slow.

The main part of the simulator is made of a for loop which I would like to re-write as a foreach with `%dopar%.

I have tried doing so with a simplified loop but I'm running into some problems. Suppose my for loop looks like this

library(foreach)

r=0
t<-rep(0,500)
for(n in 1:500){
    s<-1/2+r
    u<-runif(1, min = 0, max = 1)
    if(u<s){
        t[n]<-u
        r<-r+0.001
    }else{r<-r-0.001}
}

which means that at each iteration I update the value of r and s and, in one of the two outcomes, populate my vector t. I have tried several different ways of re-writing it as a foreach loop but it seems like with each iteration my values don't get updated and I get some pretty strange results. I have tried using return but it doesn't seem to work!

This is an example of what I have come up with.

rr=0
tt<-foreach(i=1:500, .combine=c) %dopar% {
    ss<-1/2+rr
    uu<-runif(1, min = 0, max = 1)
    if(uu<=ss){
        return(uu)
        rr<-rr+0.001
    }else{
        return(0)
        rr<-rr-0.001}
}

If it is impossible to use foreach what other way is there for me to re-write the loop so to be able to use all cores and speed up things?

smci
  • 32,567
  • 20
  • 113
  • 146
g_puffo
  • 613
  • 3
  • 11
  • 21
  • If your loop does -indeed- contain only arithmetic and distributional stuff (i.e. things that R's C API offers), I guess you could gain significant efficiency by re-writing your loop in C. This loop, as is, could, just, be a copy-paste in C code with some extra API lines before and after. (If your example is not sufficiently representative of your actual loop, perhaps you could edit, so that someone can either help vectorizing (at least part of the code) or, even, help with the C code?) – alexis_laz Nov 12 '14 at 00:31
  • 1
    Moving the runif(1,0,1) from inside to outside the loop as runif(500,0,1) and then indexing u (i.e. t[n] <- u[n]) cut execution to 1/3 of the time. Other than that, it depends on your exact code whether you can bring other parts out. Then Rcpp or the like will be necessary. – ARobertson Nov 12 '14 at 02:11
  • @alexis_laz: thank you for this idea. I will look into re-writing it in C but it might take me some time since I know nothing of C. However, I understand that several things that R does are actually based on C so it's definitely something I want to learn! – g_puffo Nov 12 '14 at 03:12
  • @ARobertson: that's a very interesting result you got there! I will implement it right away in my algo. Also, I'm still working on your feedback on my other question about vectorizing my function: I will comment back as soon as I'm finished with it. – g_puffo Nov 12 '14 at 03:16

2 Answers2

5

Since your comments, about turning to C, were encouraging and -mostly- to prove that this isn't a hard task (especially for such operations) and it's worth looking into, here is a comparison of two sample functions that accept a number of iterations and perform the steps of your loop:

ffR = function(n) 
{
   r = 0
   t = rep(0, n)
   for(i in 1:n) {
       s = 1/2 + r
       u = runif(1)
       if(u < s) {
           t[i] = u
           r = r + 0.001
       } else r = r - 0.001
   }

   return(t)
}


ffC = inline::cfunction(sig = c(R_n = "integer"), body = '
    int n = INTEGER(AS_INTEGER(R_n))[0];

    SEXP ans;
    PROTECT(ans = allocVector(REALSXP, n));

    double r = 0.0, s, u, *pans = REAL(ans);

    GetRNGstate();

    for(int i = 0; i < n; i++) {
        s = 0.5 + r;
        u = runif(0.0, 1.0);

        if(u < s) {
            pans[i] = u;
            r += 0.001;
        } else {
            pans[i] = 0.0;
            r -= 0.001;
        }
    }

    PutRNGstate();

    UNPROTECT(1);
    return(ans);
', includes = "#include <Rmath.h>")

A comparison of results:

set.seed(007); ffR(5)
#[1] 0.00000000 0.39774545 0.11569778 0.06974868 0.24374939
set.seed(007); ffC(5)
#[1] 0.00000000 0.39774545 0.11569778 0.06974868 0.24374939

A comparison of speed:

microbenchmark::microbenchmark(ffR(1e5), ffC(1e5), times = 20)
#Unit: milliseconds
#       expr        min         lq     median         uq        max neval
# ffR(1e+05) 497.524808 519.692781 537.427332 668.875402 692.598785    20
# ffC(1e+05)   2.916289   3.019473   3.133967   3.445257   4.076541    20

And for the sake of completeness:

set.seed(101); ans1 = ffR(1e5)
set.seed(101); ans2 = ffC(1e5)
all.equal(ans1, ans2)
#[1] TRUE

Hope any of this could be helpful in some way.

alexis_laz
  • 12,884
  • 4
  • 27
  • 37
  • thank you for this template: having something to work with rather than starting from scratch will be very helpful! You've definitely been very helpful! – g_puffo Nov 12 '14 at 19:48
1

What you are trying to do, since every iteration is dependent on the previous steps of the loop, doesn't seem to be parallelizable. You are updating the variable r and expecting other branches that are running simultaneously to know about it, and in fact wait for the update to happen, which
1) Doesn't happen. They won't wait, they'll just take r's current value whatever that is at the time they are running
2) If it did it would be same as running it without %dopar%

OganM
  • 2,543
  • 16
  • 33