
I have a stochastic simulation model that produces random deviates of a variable whose expected value is unknown. I would like to determine the minimal number of simulations needed for the mean of this variable to converge.

For instance, using a reproducible example:

set.seed(1)
sample_size <- 10000
X <- runif(sample_size)
# Plot the running mean of the first n samples against n
plot(sapply(seq_len(sample_size),
            function(i) mean(X[seq_len(i)])),
     type = "l",
     ylim = c(0, 1),
     xlab = "Number of samples, n",
     ylab = "Average of n samples")

[Plot: running average of n samples against the number of samples n]

Here, I would like to determine the minimal sample_size needed for the mean of X to converge (probably somewhere between 2000 and 10000), while the expected value of X is unknown (for this reproducible example I know the expected value is 0.5, but let's pretend we don't know that).

Any advice on the method I should use?

1 Answer


You can calculate the rolling coefficient of variation varcoeff_x and pick the smallest sample size at which varcoeff_x falls below a threshold, e.g. 1%. Here that yields a minimum sample size of 2502:

library(zoo)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
library(tidyverse)

set.seed(1)
sample_size <- 10000
X <- runif(sample_size)

tibble(
  x = sort(X),
  varcoeff_x = rollapply(sort(X), width = 100, FUN = function(x) sd(x) / mean(x), fill = NA)
) %>%
  mutate(sample_size = row_number()) %>%
  filter(varcoeff_x < 0.01) %>%
  pull(sample_size) %>%
  min()
#> [1] 2502

The rolling window has a width of 100 elements to smooth out local bumps.
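
As an alternative stopping rule (a sketch of my own, not part of the original answer), you could monitor the standard error of the running mean and declare convergence once it drops below a chosen tolerance; the value tol <- 0.005 below is an arbitrary illustration:

# Base-R sketch, assuming "convergence" means the standard error of the
# running mean has fallen below a chosen tolerance (tol is hypothetical).
set.seed(1)
sample_size <- 10000
X <- runif(sample_size)

n <- seq_along(X)
running_mean <- cumsum(X) / n
# Running sample variance via cumulative sums; n = 1 gives NaN,
# which which() simply skips below.
running_var <- (cumsum(X^2) - n * running_mean^2) / (n - 1)
running_se  <- sqrt(running_var / n)

tol <- 0.005
min(which(running_se < tol))

Unlike the rolling coefficient of variation above, this rule works on the samples in their original order and ties the stopping point directly to the uncertainty of the estimated mean.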

danlooo