I need to bootstrap my "automated" lapply t.test function to calculate bootstrap statistics (original, bias, and standard error). Here's the basic t.test code I have so far (no bootstrapping):

# create data
val<-runif(60, min = 0, max = 100)
distance<-floor(runif(60, min=1, max=3))
phase<-rep(c("a", "b", "c"), 20)
color<-rep(c("red", "blue","green","yellow","purple"), 12)

df<-data.frame(val, distance, phase, color)

# run function to obtain t.tests
lapply(split(df, list(df$color, df$phase)), function(d) {
  tryCatch({ t.test(val ~ distance, var.equal=FALSE, data=d) },
       error = function(e) NA)
})

This works well. However, I'm unsure how to incorporate a bootstrap method into this lapply call.

TheSciGuy

1 Answer

Maybe something like the following does what you want. Note that the return value is a list of lists of objects of class "htest" (which are lists) or NA.

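boot_fun <- function(DF){
  # draw one bootstrap resample of the rows (with replacement)
  n <- nrow(DF)
  i <- sample(n, n, TRUE)
  df <- DF[i, ]
  # run the Welch t-test in each color/phase group; NA where the test fails
  lapply(split(df, list(df$color, df$phase)), function(d) {
    tryCatch({ t.test(val ~ distance, var.equal=FALSE, data=d) },
             error = function(e) NA)
  })
}

set.seed(1234)
R <- 10  # number of bootstrap replicates
result <- lapply(seq_len(R), function(i) boot_fun(df))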
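
Because each element of the result is itself a list of "htest" objects (or NA), a single number has to be pulled out of every test before anything can be summarised. A minimal sketch of that step follows; the helper `get_stat` and the stand-in list `one_rep` are hypothetical names used only for illustration, and the built-in `sleep` data stands in for one color/phase group.

```r
# Hypothetical helper: return one number from an "htest" object,
# or NA when the test failed and the element is not an "htest".
get_stat <- function(x, what = "p.value") {
  if (inherits(x, "htest")) unname(x[[what]]) else NA_real_
}

# Stand-in for one bootstrap replicate: a named list of htest objects or NA
one_rep <- list(
  ok     = t.test(extra ~ group, data = sleep, var.equal = FALSE),
  failed = NA
)

p_values <- vapply(one_rep, get_stat, numeric(1))
t_values <- vapply(one_rep, get_stat, numeric(1), what = "statistic")
```

Assuming every replicate contains the same groups, the same helper could be applied to the full list with something like `sapply(result, function(r) vapply(r, get_stat, numeric(1)))`, which would give one column per replicate.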
Rui Barradas
  • I appreciate the help. Your answer randomizes the data and performs 10 `t.test`s, which results in a list consisting of the results from each run. What I would like is to somehow obtain the overall Bootstrap statistics as shown here: https://stats.idre.ucla.edu/r/faq/how-can-i-generate-bootstrap-statistics-in-r/ – TheSciGuy Mar 21 '19 at 15:56
  • But that is precisely what I said in my comment on the question: `lapply(list, t.test)` returns a ***list*** of lists/htest, and function `boot::boot` cannot cope with that. You must subdivide the problem into elementary problems, such as getting p-values or CIs. – Rui Barradas Mar 21 '19 at 16:00
  • I can call the statistics from the list, but I'm curious how I would then calculate the Bootstrap statistics such as (original, bias, and std. error) – TheSciGuy Mar 21 '19 at 16:10
  • @NickDylla `original` is the statistic of the original data, `bias <- mean(bootstat) - original`, `stderr <- sd(bootstat)`. – Rui Barradas Mar 21 '19 at 18:09
  • So, for example, I'd just have to create an `apply` function to grab the p-values from the list and then average them? I.e. `pvalue_bias <- mean(boot_pvalues) - original_pvalue`? The same for the t-statistic? – TheSciGuy Mar 21 '19 at 18:14
  • @NickDylla Yes and no. Yes if you do it for each sublist of class `"htest"`. The code above runs `R` tests for each sub-data-frame, and it's those that are bootstrapped. – Rui Barradas Mar 21 '19 at 18:42
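
The summary discussed in these comments can be sketched end to end as follows. This is an illustration under stated assumptions, not the answerer's code: the helper `t_by_group`, the choice of `R <- 200`, and the column names are all made up here. It reduces every test to its t statistic, resamples whole rows as in the answer, and uses the sign convention printed by `boot::boot`, where bias is the mean of the bootstrap replicates minus the original statistic. `color` and `phase` are stored as factors so that every resample yields the same set of groups.

```r
set.seed(1234)

# Same simulated data as in the question; factors keep all group levels
# present in every resample, even when a group happens to be empty.
df <- data.frame(
  val      = runif(60, min = 0, max = 100),
  distance = floor(runif(60, min = 1, max = 3)),
  phase    = factor(rep(c("a", "b", "c"), 20)),
  color    = factor(rep(c("red", "blue", "green", "yellow", "purple"), 12))
)

# Hypothetical helper: Welch t statistic per color/phase group,
# NA when t.test() fails (e.g. too few rows or one distance level).
t_by_group <- function(d) {
  vapply(split(d, list(d$color, d$phase)), function(g) {
    tryCatch(
      unname(t.test(val ~ distance, var.equal = FALSE, data = g)$statistic),
      error = function(e) NA_real_
    )
  }, numeric(1))
}

original <- t_by_group(df)

# R bootstrap replicates: one row per replicate, one column per group
R <- 200
boot_stats <- t(replicate(R, t_by_group(df[sample(nrow(df), replace = TRUE), ])))

boot_mean <- colMeans(boot_stats, na.rm = TRUE)
summary_tab <- data.frame(
  original  = original,
  bias      = boot_mean - original,                    # mean(replicates) - original
  std.error = apply(boot_stats, 2, sd, na.rm = TRUE),  # sd of the replicates
  n_ok      = colSums(!is.na(boot_stats))              # replicates where the test ran
)
summary_tab
```

With only four rows per group in this toy data, many tests fail and show up as NA, so the `n_ok` column is worth checking before reading anything into a bias or standard error. Swapping `$statistic` for `$p.value` in the helper would summarise p-values in the same way.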