0

I would like to create a function with three parameters: the sample size, the number of samples, the true p of success in a Bernoulli trial.

This function would output the following results: the mean estimate of p (i.e. the average of the p estimates for each sample), the mean estimate of the true standard deviation (i.e. the average of the sd estimates for each sample) and finally the fraction of samples in which the the 95% CI contained the true p.

I come up with the following code:

   draw_function <- function(si=1, proba= 0.5){
   sample_draw <- rbinom(n= si, 1, prob= proba)
   mean_estimate <-  mean(sample_draw)
   s_estimate <- sqrt(mean_estimate*(1-mean_estimate))
  confidence_interval <- list(c(mean_estimate - 
  1.96*s_estimate/sqrt(si), mean_estimate + 1.96*s_estimate/sqrt(si)))
    list(mean_sample = mean_estimate, s_sample = s_estimate, c_sample = 
  confidence_interval)
   }



   draw_sample <- function(s=1, r=1, p=0.5) {
   for (i in 1:r) {
     inter <-  list(mean_p=c(), mean_s = c(), mean_c = c() )
     samp <- draw_function(si=s, proba= p)
    inter$mean_p = c(inter$mean_p, samp$mean_sample)
  inter$mean_s = c(inter$mean_s, samp$s_sample)
  inter$mean_c <- c(inter$mean_c, samp$c_sample)
    if ( inter[[3]][1] > p & inter[[3]][2] < p ) {
   mean_c <- (mean_c + 1)/i
   } 
  else {
  meanc <- (mean_c + 0)/i
  }
    }

  return(list(mean(inter$mean_p), mean(inter$mean_s), inter$mean_c))

However, this code doesn't work even after having done modified the mistakes that were kindly brought to my attention.

I keep getting this error:

Error in draw_sample(s = 30, r = 1000, p = 0.05) :
(list) object cannot be coerced to type 'double'

Hence, I would have liked to ask for your help to find the issue and build such a function! Thanks!

Rororo
  • 21
  • 5

1 Answers1

0

The error happens during the following section:

inter[[3]][1] > p & inter[[3]][2] < p

Using browser() or running the code line by line, you will notice that

confidence_interval <- list(c(mean_estimate - 1.96*s_estimate/sqrt(si), 
  mean_estimate + 1.96*s_estimate/sqrt(si)))

returns a list itself. So if you want to take the parts themself it will have to be inter[[3]][[i]][1] and inter[[3]][[i]][2] respectively.

A few optimizations are possible to, i will likely edit this comment lately with a few suggestions. :-)

:::Edit::: A slightly more in depth answer. The r[dens] functions in R has some overhead associated with them. As such an easy way of speeding up code, is to run all the simulations once, and perform the calculations on subsets in a clever way. One way (if the simulation is within memory limit) is shown below inserting everything into a matrix, and then using rowMeans to calculate the necessary information quickly.

draw_sample <- function(s = 1, r = 1, p = .5){
  samples <- matrix(rbinom(n = r * s, size = 1, p = p), ncol = s, nrow = r, byrow = TRUE)
  mean_p <- rowMeans(samples)
  mean_s <- mean_p * (1 - mean_p)
  bound = 1.96 * mean_s / sqrt(s)
  mean_c <- mean(ifelse(mean_p - bound < p & mean_p + bound > p, 1, 0))
  list(mean_p = mean(mean_p), mean_s = mean(mean_s), mean_c = mean_c)
}
Oliver
  • 8,169
  • 3
  • 15
  • 37
  • Infinitely many thanks! I should have used the matrix notation... I just have a quick question: why isn't the line mean_c <- mean(ifelse(mean_p - bound >p & mean_p + bound < p, 1, 0)), mean_c <- mean(ifelse(mean_p - bound > p & mean_p + bound < p, 1, 0)) instead with the double condition? the problem is that when I modify this line of code, it doesn't work anymore. – Rororo Jan 30 '19 at 05:12
  • Hello Rororo It is simply a blunder by me. Obviously the the code should be with the double condition. I can also see that i lacked the `size` argument in the `rbinom` call. I have added both now, and it runs when i try executing `draw_sample(s = 100, r = 100, p = .5)`. :-) – Oliver Jan 30 '19 at 19:57
  • Also, the ifelse statement was clearly reversed. We want the lower bound to be 'smaller' than the true p, and the upper bound to be 'greater' than the true p for the true p to be inbetween. This has now been fixed too in my answer. – Oliver Jan 30 '19 at 20:41