Bootstrap t.test: Using apply function for multiple grouping levels

Question

I need to bootstrap my "automated" lapply t.test function to calculate bootstrap statistics (original, bias, and standard error). Here's the basic t.test code I have so far (no bootstrapping):

# create data
val      <- runif(60, min = 0, max = 100)
distance <- floor(runif(60, min = 1, max = 3))   # 1 or 2
phase    <- rep(c("a", "b", "c"), 20)
color    <- rep(c("red", "blue", "green", "yellow", "purple"), 12)

df <- data.frame(val, distance, phase, color)

# run a Welch t-test of val by distance within each color x phase group;
# groups where the test cannot run return NA
lapply(split(df, list(df$color, df$phase)), function(d) {
  tryCatch({ t.test(val ~ distance, var.equal = FALSE, data = d) },
           error = function(e) NA)
})

This works great. However, I'm unsure how to incorporate a bootstrap method into this apply function.
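Why the `tryCatch` is needed can be checked directly. In the simulated data `color` repeats every 5 rows and `phase` every 3, so each of the 15 color x phase cells holds exactly 4 rows, and `distance` (1 or 2) splits those 4 rows at random. A Welch t-test needs at least two observations in each distance group, so only cells split 2/2 can be tested. This is a minimal sketch; `set.seed(1)` and the names `cell_n`, `by_dist` and `testable` are added for illustration only.

```r
# Recreate the question's data (set.seed added only for reproducibility).
set.seed(1)
val      <- runif(60, min = 0, max = 100)
distance <- floor(runif(60, min = 1, max = 3))
phase    <- rep(c("a", "b", "c"), 20)
color    <- rep(c("red", "blue", "green", "yellow", "purple"), 12)
df <- data.frame(val, distance, phase, color)

# Rows per color x phase cell: color repeats every 5 rows and phase every 3,
# so each of the 15 combinations occurs exactly 60 / 15 = 4 times.
cell_n <- table(df$color, df$phase)

# Rows per cell and distance; a Welch t-test needs at least 2 in each group.
by_dist  <- table(paste(df$color, df$phase, sep = "."), df$distance)
testable <- apply(by_dist, 1, function(n) length(n) == 2 && all(n >= 2))
```

`sum(testable)` gives the number of cells whose t-test can actually run. Since each row's `distance` is 1 or 2 with equal probability, a given cell splits 2/2 with probability 6/16, so on average only about 3 in 8 cells produce a test.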

The problem is that `boot::boot` stores the bootstrap values in a *matrix*, while you are running many tests, so your result is a *list*. One way to compute the values you want could be to repeat the tests 3 times. — Rui Barradas, Mar 21 '19 at 15:19
Do you really want to bootstrap, as in sample with replacement? The group sizes in this example are prohibitively small. — Nate, Mar 21 '19 at 15:24
Run it 3 times total (1 for each statistic)? That'd still be fantastic — TheSciGuy, Mar 21 '19 at 15:24
My real data is actually a data frame consisting of >21k observations — TheSciGuy, Mar 21 '19 at 15:26
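Following the point that `boot::boot` needs the statistic to come back as a plain numeric vector (it stores the replicates in an R x k matrix), one possible sketch is to flatten the per-group tests into one t statistic per color x phase group. This is an illustration under assumptions, not code from the thread: `set.seed(42)` and `R = 200` are arbitrary, the helper names `group_t` and `t_by_group` are hypothetical, and `color` and `phase` are made factors so that every resample yields the same 15 groups in the same order, which `boot` requires.

```r
library(boot)  # recommended package shipped with R

# Same simulated data as in the question; set.seed added for reproducibility.
set.seed(42)
val      <- runif(60, min = 0, max = 100)
distance <- floor(runif(60, min = 1, max = 3))
phase    <- factor(rep(c("a", "b", "c"), 20))
color    <- factor(rep(c("red", "blue", "green", "yellow", "purple"), 12))
df <- data.frame(val, distance, phase, color)

# Welch t statistic for one subgroup, NA when the test cannot run.
group_t <- function(d) {
  tryCatch(unname(t.test(val ~ distance, data = d)$statistic),
           error = function(e) NA_real_)
}

# Statistic for boot::boot: take the resampled rows, split by color x phase
# (fixed factor levels keep all 15 groups) and return one t per group.
t_by_group <- function(data, indices) {
  d <- data[indices, ]
  vapply(split(d, list(d$color, d$phase)), group_t, numeric(1))
}

b <- boot(df, statistic = t_by_group, R = 200)
b  # one row per group (t1* ... t15*, in the order of names(b$t0))
```

Replicates in which a group's test fails are stored as NA in `b$t`, so `colSums(is.na(b$t))` shows how many usable replicates each group has. If you would rather keep every cell's size fixed across replicates, passing `strata = interaction(df$color, df$phase)` to `boot()` resamples within each cell instead of across the whole data frame.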

Rui Barradas · Accepted Answer · 2019-03-21T15:46:05.997


Maybe something like the following does what you want. Note that the return value is a list of lists of objects of class "htest" (which are lists) or NA.

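boot_fun <- function(DF){
  n <- nrow(DF)
  i <- sample(n, n, replace = TRUE)  # resample row indices with replacement
  df <- DF[i, ]                      # one bootstrap sample of the whole data frame
  # run the per-group t-tests on the bootstrap sample, as in the question
  lapply(split(df, list(df$color, df$phase)), function(d) {
    tryCatch({ t.test(val ~ distance, var.equal = FALSE, data = d) },
             error = function(e) NA)
  })
}

set.seed(1234)
R <- 10                                                 # number of bootstrap replicates
result <- lapply(seq_len(R), function(i) boot_fun(df))  # list of R lists of "htest"/NA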
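To get from `result` to bootstrap statistics, each run's list can be reduced to one number per group. A sketch, assuming you want a single numeric field of each "htest" (`statistic` by default; `p.value` works the same way, and for `conf.int` only the first element, the lower bound, is kept). `extract_stat` is a hypothetical helper; groups missing from a run are aligned by name and filled with NA, because with character `color` and `phase` a resample can in principle drop a whole group.

```r
# Reduce a list of bootstrap runs (each a named list of "htest" objects or NA,
# as returned by boot_fun above) to an R x groups numeric matrix.
# extract_stat is a hypothetical helper; field is any element of "htest",
# and only its first value is kept (e.g. the lower bound for "conf.int").
extract_stat <- function(result, field = "statistic") {
  rows <- lapply(result, function(run) {
    vapply(run, function(x) {
      if (inherits(x, "htest")) as.numeric(x[[field]][1]) else NA_real_
    }, numeric(1))
  })
  # Align on group names so a group missing from a run becomes NA.
  groups <- unique(unlist(lapply(rows, names)))
  m <- do.call(rbind, lapply(rows, function(r) unname(r[groups])))
  colnames(m) <- groups
  m
}

# Tiny self-contained check with two fake runs.
tt   <- t.test(c(1, 2, 3, 4), c(2, 3, 4, 6))
demo <- list(list(a = tt, b = NA), list(a = tt, b = tt))
m    <- extract_stat(demo)  # 2 x 2 matrix, NA where the test failed
```

With the objects from the answer, `extract_stat(result)` would give an `R` x 15 matrix of t statistics, and `extract_stat(result, "p.value")` the matching p-values.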

answered Mar 21 '19 at 15:35 by Rui Barradas, edited Mar 21 '19 at 15:46

I appreciate the help. Your answer resamples the data and runs the `t.test`s 10 times, which returns a list with the results from each run. What I would like is to obtain the overall bootstrap statistics, as shown here: https://stats.idre.ucla.edu/r/faq/how-can-i-generate-bootstrap-statistics-in-r/ – TheSciGuy Mar 21 '19 at 15:56
But that is precisely what I said in my comment to the question: `lapply(list, t.test)` returns a ***list*** of lists/"htest" objects, and `boot::boot` cannot cope with that. You must subdivide the problem into elementary problems, such as getting p-values or CIs. – Rui Barradas Mar 21 '19 at 16:00
I can extract the statistics from the list, but how would I then calculate the bootstrap statistics (original, bias, and std. error)? – TheSciGuy Mar 21 '19 at 16:10
@NickDylla `original` is the statistic computed on the original data, `bias <- mean(bootstat) - original` and `stderr <- sd(bootstat)`, where `bootstat` holds the bootstrap replicates of that statistic. – Rui Barradas Mar 21 '19 at 18:09
So, for example, I'd just have to write an `apply` function to pull the p-values from the list and then average them? I.e. `pvalue_bias <- mean(boot_pvalues) - original_pvalue`? And the same for the t-statistic? – TheSciGuy Mar 21 '19 at 18:14
@NickDylla Yes and no. Yes, if you do it separately for each sublist of class `"htest"`. The code above runs `R` tests for each sub-data-frame, and it's those that are bootstrapped. – Rui Barradas Mar 21 '19 at 18:42
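Once the replicates of one statistic sit in a matrix with one column per group, the original, bias and standard error follow in a few lines. A minimal sketch: `boot_summary` is a hypothetical helper, `original` is assumed to be the same statistic computed on the un-resampled data, and the bias uses the order of subtraction that `boot::boot` reports (mean of the replicates minus the original). The extra `n.boot` column, counting non-NA replicates, is an addition for groups whose test sometimes fails.

```r
# Bootstrap summary per group, matching the columns boot::boot prints.
#   original: statistic on the un-resampled data, one value per group
#   boot_mat: R x groups matrix of bootstrap replicates (NA = failed test)
# boot_summary is a hypothetical helper name.
boot_summary <- function(original, boot_mat) {
  stopifnot(length(original) == ncol(boot_mat))
  data.frame(
    original  = original,
    bias      = colMeans(boot_mat, na.rm = TRUE) - original,  # mean(t*) - t0
    std.error = apply(boot_mat, 2, sd, na.rm = TRUE),         # sd(t*)
    n.boot    = colSums(!is.na(boot_mat)),                    # usable replicates
    row.names = colnames(boot_mat)
  )
}

# Small check: group g1 has replicates 1 and 3, group g2 has 2 and 2.
bm <- matrix(c(1, 3, 2, 2), nrow = 2, dimnames = list(NULL, c("g1", "g2")))
s  <- boot_summary(c(1, 2), bm)
```

The same function works for t statistics or p-values; only the matrix of replicates and the vector of original values change.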
