
I am simulating something like Jim Berger's applet.

The simulation works like this: I will generate a sample x of size n either from the null distribution N(0, 1) or from the alternative distribution N(theta, 1). I will assume that the prior probability of the null is some proportion prop (so the prior probability of the alternative is 1 - prop) and that the distribution of theta under the alternative is N(0, 2), i.e. with standard deviation 2, matching the code below (I could change all these parameters, but this is just to start).

I want to get a large number of p-values within a certain range (like 2000 p-values between 0.049 and 0.05; in the simulation this is equivalent to z statistics between 1.96 and 1.97) from the simulation scenario described above, and to see how many came from the null and how many came from the alternative.
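
Just to double-check that correspondence between the p-value range and the z range (a quick sanity check, nothing more):

# Two-sided p-values map to |z| thresholds via the normal quantile function
qnorm(1 - 0.05/2)   # ~1.96
qnorm(1 - 0.049/2)  # ~1.97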

So far I have come up with a solution like this:

berger <- function(prop, n) {
  z <- 0
  # keep simulating until the z statistic lands between 1.96 and 1.97
  while (z <= 1.96 || z >= 1.97) {
    u <- runif(1)
    if (u < prop) {              # with probability prop, sample under the null
      H0 <- TRUE
      x <- rnorm(n, 0, 1)
    } else {                     # otherwise draw theta and sample the alternative
      H0 <- FALSE
      theta <- rnorm(1, 0, 2)
      x <- rnorm(n, theta, 1)
    }
    z <- sqrt(n) * abs(mean(x))  # z statistic for testing theta = 0
  }
  return(H0)
}

results <- replicate(2000, berger(0.1, 100))
sum(results) / length(results)  ## approximately 25%
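
If I did the algebra right, that 25% can also be checked analytically: under the null, the z statistic is N(0, 1), and under the alternative, integrating theta ~ N(0, sd = 2) out of sqrt(n)*mean(x) gives a marginal N(0, 1 + 4n). A small sketch (the post_h0 helper is just mine for illustration):

# Posterior probability of H0 at a given z under the mixture model above
post_h0 <- function(z, prop, n) {
  f0 <- dnorm(z)                       # density of z under the null
  f1 <- dnorm(z, sd = sqrt(1 + 4*n))   # marginal density under the alternative
  prop * f0 / (prop * f0 + (1 - prop) * f1)
}
post_h0(1.965, prop = 0.1, n = 100)    # ~0.25, consistent with the simulation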

It takes about 3.5 minutes. Is it possible to speed this up? How? Every answer is welcome, including integration with C.

Update: Parallelization can speed it up a little bit. But I have tried the same code in Julia, and it takes only 14 seconds without any parallelization (code below).

Update 2: With Rcpp and parallelization it is possible to reduce the simulation to 8 seconds. See the new answer.

using Statistics  # for mean (needed on Julia >= 0.7)

function berger(prop, n)
    z = 0.0
    h0 = false
    while z < 1.96 || z > 1.97          # keep sampling until z lands in [1.96, 1.97]
        u = rand()
        if u < prop                     # with probability prop, sample under the null
            h0 = true
            x = randn(n)
        else                            # otherwise draw theta and sample the alternative
            h0 = false
            theta = randn() * 2
            x = randn(n) .+ theta
        end
        z = sqrt(n) * abs(mean(x))
    end
    h0
end

results = Bool[]   # start empty; seeding with [0] would bias the average

for i in 1:2000
    push!(results, berger(0.1, 100))
end

sum(results) / length(results)
Carlos Cinelli
  • I don't really see why it is important to consider only `0.049 < p < 0.05`? I see it is mentioned in the paper, but it doesn't make sense to me, personally. – PascalVKooten May 16 '14 at 19:08
  • @PascalvKooten you want to calculate the probability that H0 is true given p is approximately 0.05. Read Jim Berger's page for further detail. – Carlos Cinelli May 16 '14 at 19:16

2 Answers


There might be ways to make this function a little faster (through parallelization, for example), but you aren't going to get an orders-of-magnitude difference (edit: in R). The key problem is that you are making roughly 400 million draws from the normal distribution.

This is a function that returns the number of runs through the while loop that your function takes:

f <- function(prop, n) {
  i <- 0
  z <- 0
  while (z <= 1.96 || z >= 1.97) {
    i <- i + 1                   # count loop iterations
    u <- runif(1)
    if (u < prop) {
      H0 <- TRUE
      x <- rnorm(n, 0, 1)
    } else {
      H0 <- FALSE
      theta <- rnorm(1, 0, 2)
      x <- rnorm(n, theta, 1)
    }
    z <- sqrt(n) * abs(mean(x))
  }
  return(i)
}

Now we can calculate how many times your function runs:

set.seed(1)
runs <- replicate(200, f(prop=0.1, n=100))
mean(runs)  # 2034
sd(runs)    # 2121
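
As a rough cross-check on that mean (my own back-of-the-envelope, using the mixture model from the question): each pass of the loop lands in the target window with a small fixed probability, so the number of passes is geometric, and the implied expectation is in the same ballpark:

# P(one pass lands |z| in (1.96, 1.97)): under H0, z ~ N(0, 1); under the
# alternative, integrating theta ~ N(0, sd = 2) out gives z ~ N(0, 1 + 4n)
p_hit <- 0.1 * 2 * (pnorm(1.97) - pnorm(1.96)) +
         0.9 * 2 * (pnorm(1.97, sd = sqrt(401)) - pnorm(1.96, sd = sqrt(401)))
1 / p_hit  # ~2100 expected passes, consistent with the simulated mean above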

So, to calculate the number of draws from the normal distribution:

# number of replicates x normal vectors per replicate x draws per vector
2000 * mean(runs) * 100
# 406,853,000 normal distribution draws

The rnorm function calls a compiled C function and is likely to run at near-optimal speed. You can test the "lower bound" of making this many draws on your own machine:

system.time(rnorm(406853000))
# My machine:
#   user  system elapsed 
#  53.78    2.39   56.62 

By comparison, your function runs roughly four times slower:

system.time(replicate(2000,berger(prop=0.1,n=100)))
#    user  system elapsed 
#  210.40    0.03  211.12 

So, your function really isn't that slow when you think about it, especially when you consider that there is overhead on each call to rnorm. If improving the speed of this function is critical and you have a few cores, you can easily parallelize it in R:

library(parallel)
# mclapply defaults to 2 cores; set mc.cores explicitly and unlist the result
results <- unlist(mclapply(1:2000, function(x) berger(prop=0.1, n=100),
                           mc.cores=detectCores()))

Other than that, you could write a super-optimized function in C and save a few minutes, but it might not be worth it.
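
There is also a purely statistical shortcut worth mentioning (a sketch of my own, under the same model assumptions; berger_vec is just an illustrative name): since sqrt(n)*mean(x) ~ N(sqrt(n)*theta, 1), you can sample the z statistic directly with one or two draws per run instead of n, and vectorize over large batches:

# Vectorized sketch: sample z directly instead of averaging n draws per run
berger_vec <- function(prop, n, target=2000, batch=1e6) {
  hits <- logical(0)
  while (length(hits) < target) {
    is_null <- runif(batch) < prop                  # which runs come from H0
    theta   <- ifelse(is_null, 0, rnorm(batch, 0, 2))
    z       <- abs(sqrt(n) * theta + rnorm(batch))  # sqrt(n)*mean(x) ~ N(sqrt(n)*theta, 1)
    keep    <- z > 1.96 & z < 1.97                  # same acceptance window
    hits    <- c(hits, is_null[keep])
  }
  mean(hits[1:target])                              # proportion that came from H0
}
berger_vec(0.1, 100)  # ~0.25, using only a few million normal draws in total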

nograpes
  • Thanks, really interesting, I will try to parallelize and see how it goes! – Carlos Cinelli May 16 '14 at 21:42
  • Although, I ran that applet just now, and it runs really fast, which makes me think that your function does not accurately reflect what they did. I strongly suspect that they are only making 2000 samples from the normal distribution instead of 4 million. – nograpes May 16 '14 at 21:59
  • You can't get 2000 p-values in the 0.049–0.05 range with just 2000 simulations; they have to run more than that. You probably ran with n=1 (the default), so it is much faster. With n=100 the applet takes about 1.5 minutes here! I have just tried the parallelization, and R took the same time! – Carlos Cinelli May 16 '14 at 23:34
  • You are correct about the simulations. Three questions about the parallelization: 1) Did you set the number of cores with `options(mc.cores=x)` where `x` is the number of cores on your machine? 2) What OS are you running? If it is Windows, you'll have to use `foreach` instead of `parallel` 3) Did you monitor the CPUs and make sure they were all active when you ran it? – nograpes May 17 '14 at 02:39
  • I am running on Mac OS X (Mountain Lion) with an Intel Core 2 Duo; I used the default option (2 cores), I was looking at the activity monitor at those two little cpu graphs: without `parallel` only one of the graphs shows activity, while with it both graphs showed full activity! – Carlos Cinelli May 17 '14 at 02:56
  • I have just tried some coding with Julia. I am not sure if I got it right, but if I did, it took only 15 seconds! – Carlos Cinelli May 17 '14 at 03:54
  • Julia in parallel: 7 seconds! – Carlos Cinelli May 17 '14 at 04:13
  • That's lovely that you are having success with Julia. To cross-check, I ran some code in C to generate 400 million numbers, and found that it only took 3 seconds. I doubt you could achieve these speeds in R. – nograpes May 17 '14 at 04:30
  • Yeah, but I really like R, it would be awesome to achieve this in it! – Carlos Cinelli May 17 '14 at 04:36
  • If you're comparing R to Julia, are you comparing apples to apples? It might be that Julia uses a faster, but less accurate normal RNG. OTOH, maybe that's all you need – hadley May 18 '14 at 02:51
  • @nograpes given that rnorm hits C pretty quickly, it's hard to believe that you coded up something equivalent that's 20x faster. – hadley May 18 '14 at 02:52
  • @hadley actually this was the first time I used Julia, so I am not sure how reliable it is; I ended up giving it a try because some friend said it was fast, and since I will use this code a lot, speeding it up would be useful - but if the reduction in time comes at expense of less accurate results, maybe it is not worth it, so I will definitely have to check this out! For now I still would like to stay only within R - I was just curious about Julia - so any time reduction strategies are welcome! – Carlos Cinelli May 18 '14 at 04:17
  • Perhaps what the applet does is simulate from chi-squared? – petrelharp May 25 '14 at 15:53

It is actually simple to speed this up using Rcpp. Combining Rcpp with parallelization, I was able to reduce the time to 8 seconds.

The .cpp file is something like this (using Rcpp "sugar" makes this task pretty easy; since this was the first time I used Rcpp, maybe this code is not optimal, but it did the job!):

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int RcppBerger(double prop, int n) {
  double z = 0, theta = 0, u = 0;
  int h = 0;
  NumericVector x;
  while (z < 1.96 || z > 1.97) {   // keep sampling until z lands in [1.96, 1.97]
    u = R::runif(0, 1);
    if (u < prop) {                // with probability prop, sample under the null
      h = 1;
      x = rnorm(n);                // Rcpp sugar: vector of n N(0,1) draws
    } else {                       // otherwise draw theta and sample the alternative
      h = 0;
      theta = R::rnorm(0, 2);
      x = rnorm(n, theta, 1);
    }
    z = sqrt(n) * mean(x);         // Rcpp sugar mean()
    if (z < 0) z = -z;             // absolute value
  }
  return h;
}

Then, without parallelization, you can just use the sourceCpp function and RcppBerger will be available in the workspace:

library(Rcpp)
sourceCpp("RcppBerger.cpp")
results <- replicate(2000, RcppBerger(0.1, 100))
sum(results) / length(results)  ## approximately 25%

This already reduces the time from 3.5 minutes to around 40 seconds. After that, we can parallelize.

On Windows this is a little tricky: it seems that you have to create a package first. But Rcpp provides a nice function for that, Rcpp.package.skeleton. Just put the source file in it and it will create all the necessary documents and folders:

Rcpp.package.skeleton("RcppBerger", cpp_files = "RcppBerger.cpp")

Then, after installing the package, you can parallelize with foreach and doParallel:

library(foreach)
library(doParallel)
library(RcppBerger)
registerDoParallel(cores=8)
# iterate with a named index and combine the results into a vector
results <- foreach(i=1:2000, .combine=c, .packages="RcppBerger") %dopar% RcppBerger(0.1, 100)
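
With .combine=c the result is already a plain vector, so the same check as before applies:

sum(results) / length(results)  ## approximately 25%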

Now the simulation takes only 8 seconds.

Carlos Cinelli
  • Nice write-up, and well done with the passage from `sourceCpp()` to a package. You could now try a faster N(0,1) generator: either switch from R's default to a faster one, or plug in a faster RNG, e.g. one from RcppZiggurat. – Dirk Eddelbuettel May 26 '14 at 20:53
  • There is a chart in the RcppZiggurat vignette -- 'Ahrens-Dieter' and 'Kinderman-Ramage' are both already in R and faster than the default inversion by maybe a factor of two. RcppZiggurat can give another factor of two. Let's see if you can bring this down from 8 seconds to 5 seconds :) – Dirk Eddelbuettel May 26 '14 at 22:04
  • @CarlosCinelli Very nice. I was inspired by this question to learn more about varieties of N(0,1) generators. I was going to update my answer, but now that I see your answer I am glad I waited. Be careful with the `RZiggurat` implementation; it didn't work for me on a Linux box. Also, I think you should probably switch the check mark to this answer; it is complete and more helpful than my answer. – nograpes May 28 '14 at 16:19
  • Thanks, @nograpes , I think it's time for me to start studying more numerical methods and RNG! – Carlos Cinelli May 29 '14 at 01:11