Bootstrapping using the boot package

Question

I am experimenting with the boot package in R, but it does not do what I want: it just returns the same value. I have read the documentation and checked the first example, which looks the same as my code, but I don't get any bootstrapped results; only the original value is returned. My example:

library(boot)

dat      <- data.frame(A = rnorm(100), B = runif(100))

booty    <- function(x, ind) sum(x$A)/sum(x$B)

boot_out <- boot(dat, booty, R  = 50, stype = "w")

Try `sum(x$A[ind])/sum(x$B[ind])`. You must use the index to subset your dataset in your bootstrap function. — Rui Barradas, Sep 26 '17 at 12:59
@RuiBarradas Thanks. Could you post your solution so I can accept it? — MLEN, Sep 26 '17 at 14:40

score 0 · Accepted Answer · answered Sep 26 '17 at 14:53

As I said in the comment, you must use the index, in this case ind, to subset the dataset passed to the bootstrap statistic function: sum(x$A[ind])/sum(x$B[ind]).
The full function then becomes

booty    <- function(x, ind) sum(x$A[ind])/sum(x$B[ind])

Two notes.
First, stype = "i" (the default) states that the second argument to booty is a vector of indices, which I believe is what you want, rather than a vector of weights (stype = "w").
Second, to get reproducible results, set the RNG seed before calling boot or any other RNG function. Something like the following.

library(boot)

set.seed(4237)

dat      <- data.frame(A = rnorm(100), B = runif(100))

boot_out <- boot(dat, booty, R  = 50, stype = "i")
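As a quick check, here is a sketch that compares the replicates from the two versions of the statistic. The names booty_bad, booty_ok, out_bad and out_ok are only for illustration and do not appear in the question. Without the index every replicate equals the original estimate; with it they vary.

```r
library(boot)

set.seed(4237)
dat <- data.frame(A = rnorm(100), B = runif(100))

# Statistic that ignores the index: every replicate recomputes the same value
booty_bad <- function(x, ind) sum(x$A) / sum(x$B)
# Statistic that subsets by the resampled indices
booty_ok  <- function(x, ind) sum(x$A[ind]) / sum(x$B[ind])

out_bad <- boot(dat, booty_bad, R = 50, stype = "i")
out_ok  <- boot(dat, booty_ok,  R = 50, stype = "i")

sd(out_bad$t)  # 0: no variation across replicates
sd(out_ok$t)   # positive: the replicates now differ
```

Printing out_ok should also show a nonzero standard error, whereas out_bad reports a standard error of 0.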

score 0 · Answer 2 · answered Aug 19 '22 at 15:38

Just to add to Rui's answer, it is possible to use stype = "w" to perform a standard bootstrap. It may be easier to think first about what stype = "f" does.

Setting stype = "f" first resamples the indices of the original dataset (just like the usual bootstrap with stype = "i"), then tallies up how many times each unit is selected, and turns that tally into a frequency weight. For example, if a unit was sampled twice in the bootstrap sample, that unit would get a weight of 2. A unit not sampled would get a weight of 0. You can use these weights to calculate weighted statistics, which are then the output of the bootstrap procedure. These give equivalent results to subsetting the data based on the sampled indices.
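To make the equivalence concrete, here is a minimal base R sketch of a single bootstrap draw done by hand. It mimics the steps described above and is not boot's internal code; ind, f, stat_i and stat_f are illustrative names.

```r
set.seed(1)
dat <- data.frame(A = rnorm(100), B = runif(100))
n   <- nrow(dat)

# One bootstrap draw: resample row indices with replacement
ind <- sample(n, replace = TRUE)

# Tally how often each row was drawn; rows never drawn get 0
f <- tabulate(ind, nbins = n)

stat_i <- sum(dat$A[ind]) / sum(dat$B[ind])  # subset by index
stat_f <- sum(dat$A * f)  / sum(dat$B * f)   # weight by frequency

all.equal(stat_i, stat_f)  # TRUE (up to floating-point rounding)
```

The frequencies always sum to n, since each of the n draws lands on exactly one row.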

So, adapting your code, using stype = "f" would look like the following:

dat      <- data.frame(A = rnorm(100), B = runif(100))

booty    <- function(x, w) weighted.mean(x$A, w)/weighted.mean(x$B, w)

boot_out <- boot(dat, booty, R  = 50, stype = "f")

We have to adjust the statistic function (here called booty) to accept weights and then use those weights in computing the statistic of interest. If you were running a regression in each sample, you would supply the weights to the weights argument of the regression function.
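For the regression case, a sketch might look like the following. It assumes a simple linear regression of A on B, which is not part of the question, and slope_f, boot_lm and the column name f are hypothetical names chosen for the example.

```r
library(boot)

set.seed(4237)
dat <- data.frame(A = rnorm(100), B = runif(100))

# Statistic: slope of A on B, with the bootstrap frequencies used as weights
slope_f <- function(x, f) {
  x$f <- f  # store the weights as a column so lm() finds them in `data`
  coef(lm(A ~ B, data = x, weights = f))[["B"]]
}

boot_lm <- boot(dat, slope_f, R = 50, stype = "f")

# The original estimate uses weights of 1 for every row,
# so it matches the ordinary unweighted fit
all.equal(boot_lm$t0, coef(lm(A ~ B, data = dat))[["B"]])
```

Rows with a frequency of 0 are effectively left out of that replicate's fit, which mirrors what subsetting by index would do.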

Setting stype = "w" is identical to using stype = "f" except that the weights are divided by the sample size. This is just a convenience, so that, e.g., weighted means can be represented as weighted sums instead. This is exactly what is done in the documentation example for boot::boot().
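Adapting the question's code once more, a sketch with stype = "w" could look like this. The name booty_w is illustrative, and the sketch assumes no strata are used, so the weights are the frequencies divided by n and sum to 1.

```r
library(boot)

set.seed(4237)
dat <- data.frame(A = rnorm(100), B = runif(100))

# The weights sum to 1, so each weighted sum is already a weighted mean
booty_w <- function(x, w) sum(x$A * w) / sum(x$B * w)

boot_out <- boot(dat, booty_w, R = 50, stype = "w")

# The original estimate uses equal weights 1/n, i.e. the plain ratio
all.equal(boot_out$t0, sum(dat$A) / sum(dat$B))
```

For a ratio like this one the scaling cancels, so stype = "f" and stype = "w" give the same statistic; the normalisation matters when the statistic is a single weighted sum.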
