I am simulating something like Jim Berger's applet.
The simulation works like this: I will generate a sample x of size n, either from the null distribution N(0, 1) or from the alternative distribution N(theta, 1). I will assume that the prior probability of the null is some proportion prop (so the prior of the alternative is 1 - prop) and that the distribution of theta under the alternative is N(0, 2), i.e. with standard deviation 2 (I could change all these parameters, but this is just a starting point).
I want to get a large number of p-values within a certain narrow range (say, 2000 p-values between 0.049 and 0.05, which in this simulation corresponds to z statistics between roughly 1.96 and 1.97) from the scenario described above, and then see how many came from the null and how many came from the alternative.
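As a quick sanity check of that mapping (assuming the usual two-sided test), the p-values at the edges of the targeted |z| range can be computed directly:

## two-sided p-values at the edges of the targeted |z| range
2 * pnorm(c(1.96, 1.97), lower.tail = FALSE)
## approximately 0.050 and 0.049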
So far I have come up with a solution like this:
berger <- function(prop, n) {
  z <- 0
  # Keep simulating until the z statistic falls in the target window (1.96, 1.97)
  while (z <= 1.96 || z >= 1.97) {
    u <- runif(1)
    if (u < prop) {
      # Sample drawn under the null: x ~ N(0, 1)
      H0 <- TRUE
      x <- rnorm(n, 0, 1)
    } else {
      # Sample drawn under the alternative: theta ~ N(0, 2), x ~ N(theta, 1)
      H0 <- FALSE
      theta <- rnorm(1, 0, 2)
      x <- rnorm(n, theta, 1)
    }
    z <- sqrt(n) * abs(mean(x))
  }
  return(H0)
}
results <- replicate(2000, berger(0.1, 100))
sum(results) / length(results)  ## approximately 25%
It takes about 3.5 minutes. Is it possible to speed this up, and how? Every answer is welcome, including integration with C.
Update: Parallelization speeds this up a little. But I tried the same code in Julia, and it takes only 14 seconds without any parallelization (code below).
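For reference, one way to parallelize the R version is with the parallel package (a sketch only; mclapply relies on forking, so it works on Unix-like systems, and the core count is an arbitrary choice):

library(parallel)

results <- unlist(mclapply(1:2000, function(i) berger(0.1, 100),
                           mc.cores = 4))  # forked workers; use parLapply on Windows
sum(results) / length(results)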
Update 2: With Rcpp and parallelization it is possible to reduce the simulation time to about 8 seconds. See the new answer.
using Statistics  # for mean (needed in Julia >= 1.0)

function berger(prop, n)
    z = 0.0
    h0 = false
    # Keep simulating until the z statistic falls in the target window (1.96, 1.97)
    while z < 1.96 || z > 1.97
        u = rand()
        if u < prop
            # Sample drawn under the null: x ~ N(0, 1)
            h0 = true
            x = randn(n)
        else
            # Sample drawn under the alternative: theta ~ N(0, 2), x ~ N(theta, 1)
            h0 = false
            theta = randn() * 2
            x = randn(n) .+ theta
        end
        z = sqrt(n) * abs(mean(x))
    end
    h0
end
results = Bool[]  # start empty, so there is no spurious leading 0
for i in 1:2000
    push!(results, berger(0.1, 100))
end
sum(results) / length(results)
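Just to illustrate the Rcpp route mentioned in Update 2, here is a rough sketch (this is not the code from the answer; the name berger_cpp is hypothetical and the parallelization part is omitted):

library(Rcpp)

cppFunction('
bool berger_cpp(double prop, int n) {
  double z = 0.0;
  bool h0 = false;
  // Keep simulating until the z statistic falls in the target window (1.96, 1.97)
  while (z <= 1.96 || z >= 1.97) {
    double u = R::runif(0.0, 1.0);
    Rcpp::NumericVector x;
    if (u < prop) {
      // Sample drawn under the null: x ~ N(0, 1)
      h0 = true;
      x = Rcpp::rnorm(n, 0.0, 1.0);
    } else {
      // Sample drawn under the alternative: theta ~ N(0, 2), x ~ N(theta, 1)
      h0 = false;
      double theta = R::rnorm(0.0, 2.0);
      x = Rcpp::rnorm(n, theta, 1.0);
    }
    z = std::sqrt((double) n) * std::fabs(Rcpp::mean(x));
  }
  return h0;
}')

## called from R exactly like the pure-R berger()
results <- replicate(2000, berger_cpp(0.1, 100))
sum(results) / length(results)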