I just learned how to do the bootstrap in R, and I'm excited. I was playing with some data and found that, no matter how many bootstrap samples I take, the CIs always come out about the same. I believed that the more bootstrap samples I take, the narrower the CI should be. Here's the code.
library(boot)

# Statistic to bootstrap: the mean of queimadas over the resampled rows
M. <- function(dados, i) {
  d <- dados[i, ]
  mean(d$queimadas)
}

# 10000 bootstrap replicates
bootmu <- boot(dados, statistic = M., R = 10000)
boot.ci(bootmu)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = bootmu)
Intervals :
Level Normal Basic
95% (18.36, 21.64 ) (18.37, 21.63 )
Level Percentile BCa
95% (18.37, 21.63 ) (18.37, 21.63 )
Calculations and Intervals on Original Scale
Warning message:
In boot.ci(bootmu) : bootstrap variances needed for studentized intervals
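(Side note: I think that warning appears because M. only returns the mean. If I'm reading the boot documentation right, boot.ci treats a second returned component as the estimated variance of the first, so a sketch like the following should make studentized intervals available; M.var and bootmu.stud are just names I made up for illustration.)

# Also return the estimated variance of the mean as a second component,
# so boot.ci can compute studentized ("stud") intervals
M.var <- function(dados, i) {
  d <- dados[i, ]
  c(mean(d$queimadas), var(d$queimadas) / nrow(d))
}
bootmu.stud <- boot(dados, statistic = M.var, R = 10000)
boot.ci(bootmu.stud, type = "stud")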
As you can see, I used 10000 bootstrap replicates. Now let's try with just 100.
# Same statistic, only 100 bootstrap replicates this time
bootmu <- boot(dados, statistic = M., R = 100)
boot.ci(bootmu)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 100 bootstrap replicates
CALL :
boot.ci(boot.out = bootmu)
Intervals :
Level Normal Basic
95% (18.33, 21.45 ) (18.19, 21.61 )
Level Percentile BCa
95% (18.39, 21.81 ) (18.10, 21.10 )
Calculations and Intervals on Original Scale
Some basic intervals may be unstable
Some percentile intervals may be unstable
Warning : BCa Intervals used Extreme Quantiles
Some BCa intervals may be unstable
Warning messages:
1: In boot.ci(bootmu) :
bootstrap variances needed for studentized intervals
2: In norm.inter(t, adj.alpha) :
extreme order statistics used as endpoints
I drew 100 times fewer bootstrap samples, but the CIs are essentially the same. Why?
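(For reference, one could compare the width of, say, the 95% percentile interval across several values of R with something like this sketch, using the same M. statistic as above; the Rs/widths names are just for illustration.)

# Compare the 95% percentile interval width for several replicate counts
Rs <- c(100, 1000, 10000)
widths <- sapply(Rs, function(R) {
  b <- boot(dados, statistic = M., R = R)
  ci <- boot.ci(b, type = "perc")$percent
  ci[5] - ci[4]  # upper endpoint minus lower endpoint
})
cbind(R = Rs, width = widths)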
If anyone wants to reproduce exactly the same example, here's the data.
> dados
queimadas plantacoes
1 27 418
2 13 353
3 21 239
4 14 251
5 18 482
6 18 361
7 22 213
8 24 374
9 21 298
10 15 182
11 23 413
12 17 218
13 10 299
14 23 306
15 22 267
16 18 56
17 24 538
18 19 424
19 15 64
20 16 225
21 25 266
22 21 218
23 24 424
24 26 38
25 19 309
26 20 451
27 16 351
28 15 174
29 24 302
30 30 492
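Or, to save typing, here is the same data as code, transcribed from the printout above:

dados <- data.frame(
  queimadas  = c(27, 13, 21, 14, 18, 18, 22, 24, 21, 15, 23, 17, 10, 23, 22,
                 18, 24, 19, 15, 16, 25, 21, 24, 26, 19, 20, 16, 15, 24, 30),
  plantacoes = c(418, 353, 239, 251, 482, 361, 213, 374, 298, 182, 413, 218,
                 299, 306, 267, 56, 538, 424, 64, 225, 266, 218, 424, 38,
                 309, 451, 351, 174, 302, 492)
)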