The `boot` function from the boot package fails when I ask for 66,000 resamples. The documentation doesn't mention any limit on the number of resamples, and the same call works fine with a smaller number.

Here's some sample code:

library(boot)
library(data.table)

mean_i <- function(x, i) {
  mean(x[i])
}

set.seed(1)
x <- data.table(yr = sample(1:5, 66000, replace = TRUE),
                amount = sample(-100000:100000, 66000, replace = TRUE))

boot(x[, amount], statistic = mean_i, R = length(x[, amount]))

This gives the error:

Error in sample.int(n, n * R, replace = TRUE) : vector size cannot be NA

but

boot(x[1:100, amount], statistic = mean_i, R = length(x[1:100, amount]))

works fine.

Does anyone know if there's a maximum number for resamples or what could be causing the error?

happyspace
  • It looks like `n * R` in your case is 66000*66000, which is an enormously large vector. Do you have enough memory to hold 4.3 billion integers? – Brigadeiro Aug 06 '19 at 05:41
  • It may be integer overflow. If both `n` and `R` are integers (which is most likely the case), their product is too large to be stored as an integer (see `.Machine$integer.max`). Plus, as @AdamK mentioned, it would be nearly impossible to store a vector of that size in memory. – Rohit Aug 06 '19 at 06:03
  • A workaround is to do the bootstrap manually: `replicate(length(x[, amount]), {mean(sample(x[, amount], replace = TRUE))})`. This may take minutes to finish. – mt1022 Aug 06 '19 at 07:08
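
To make the two points from the comments concrete, here is a minimal sketch in base R. It first checks that the integer product `n * R` really does overflow to `NA` for these sizes, then runs a manual bootstrap that draws one resample at a time, so only a single index vector of length `n` is held in memory at once. The helper name `boot_means` is made up for illustration and is not part of the boot package, and the demo uses a small `R` only to keep the run short.

```r
set.seed(1)
amount <- sample(-100000:100000, 66000, replace = TRUE)

n <- length(amount)  # integer, 66000
R <- length(amount)  # integer, 66000

# Integer multiplication past .Machine$integer.max yields NA (with a warning)
overflowed <- suppressWarnings(n * R)
is.na(overflowed)
as.numeric(n) * R > .Machine$integer.max

# Hypothetical helper: draws one resample per iteration instead of
# building all n * R indices up front
boot_means <- function(x, R) {
  vapply(seq_len(R), function(r) mean(sample(x, replace = TRUE)), numeric(1))
}

# Small R for a quick demo; a larger R uses the same code, just more time
t_star <- boot_means(amount, R = 200L)
c(estimate = mean(amount), se = sd(t_star))
```

Passing `R = length(amount)` to `boot_means` should behave like the `replicate()` workaround above, since both resample the data once per replicate rather than all at once.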
