0

I have written the code below to obtain a bootstrap estimate of a mean. My objective is to view the numbers selected from the data set, ideally in the order they are selected, by the function boot in the boot package.

The data set only contains three numbers: 1, 10, and 100 and I am only using two bootstrap samples.

The estimated mean is 23.5 and the R code below indicates that the six numbers included one '1', four '10' and one '100'. However, there are 30 possible combinations of those numbers that would have resulted in a mean of 23.5.

Is there a way for me to determine which of those 30 possible combinations is the combination that actually appeared in the two bootstrap samples?

library(boot)

set.seed(1234)

dat <- c(1, 10, 100)
av  <- function(dat, i) { sum(dat[i])/length(dat[i]) }
av.boot <- boot(dat, av, R = 2)
av.boot
#
# ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
# Call:
# boot(data = dat, statistic = av, R = 2)
#
#
# Bootstrap Statistics :
#     original  bias    std. error
# t1*       37   -13.5    19.09188
#

mean(dat) + -13.5 
# [1] 23.5

# The two samples must have contained one '1', four '10' and one '100',
# but there are 30 possibilities.
# Which of these 30 possible sequences actual occurred?

# This code shows there must have been one '1', four '10' and one '100'
# and shows the 30 possible combinations

my.combos <- expand.grid(V1  = c(1, 10, 100),
                         V2  = c(1, 10, 100),
                         V3  = c(1, 10, 100),
                         V4  = c(1, 10, 100),
                         V5  = c(1, 10, 100),
                         V6  = c(1, 10, 100))

my.means <- apply(my.combos, 1, function(x) {( (x[1] + x[2] + x[3])/3 + (x[4] + x[5] + x[6])/3 ) / 2 })

possible.samples <- my.combos[my.means == 23.5,]
dim(possible.samples)

n.1   <- rowSums(possible.samples == 1)
n.10  <- rowSums(possible.samples == 10)
n.100 <- rowSums(possible.samples == 100)

n.1[1]
n.10[1]
n.100[1]

length(unique(n.1))   == 1
length(unique(n.10))  == 1
length(unique(n.100)) == 1
Siguza
  • 21,155
  • 6
  • 52
  • 89
Mark Miller
  • 12,483
  • 23
  • 78
  • 132

1 Answers1

0

I think you can determine the numbers sampled and the order in which they are sampled with the code below. You have to extract the function ordinary.array from the boot package and paste that function into your R code. Then specify the values for n, R and strata, where n is the number of observations in the data set and R is the number of replicate samples you want.

I do not know how general this approach is, but it worked with a couple of simple examples I tried, including the example below.

library(boot)

set.seed(1234)

dat <- c(1, 10, 100, 1000)
av  <- function(dat, i) { sum(dat[i])/length(dat[i]) }
av.boot <- boot(dat, av, R = 3)
av.boot
#
# ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
# Call:
# boot(data = dat, statistic = av, R = 3)
#
#
# Bootstrap Statistics :
#     original  bias    std. error
# t1*   277.75  -127.5    132.2405
# 
# 

mean(dat) + -127.5
# [1] 150.25

# boot:::ordinary.array

ordinary.array <- function (n, R, strata) 
{
    inds <- as.integer(names(table(strata)))
    if (length(inds) == 1L) {
        output <- sample.int(n, n * R, replace = TRUE)
        dim(output) <- c(R, n)
    }
    else {
        output <- matrix(as.integer(0L), R, n)
        for (is in inds) {
            gp <- seq_len(n)[strata == is]
            output[, gp] <- if (length(gp) == 1) 
                rep(gp, R)
            else bsample(gp, R * length(gp))
        }
    }
    output
}

# I think the function ordinary.array determines which elements 
# of the data are sampled in each of the R samples

set.seed(1234)
ordinary.array(n=4,R=3,1)

#      [,1] [,2] [,3] [,4]
# [1,]    1    3    1    3
# [2,]    3    4    1    3
# [3,]    3    3    3    3
#
# which equals:

((1+100+1+100) / 4  +  (100+1000+1+100) / 4  +  (100+100+100+100) / 4) / 3

# [1] 150.25
Mark Miller
  • 12,483
  • 23
  • 78
  • 132