Bootstrap (via paramtest) in r: how to construct statistic based on a larger or smaller sample than the dataset used?

Question

Context: I'm trying to do a fairly complicated power calculation ahead of an experiment that uses blocked treatment assignment. I have data from a previous run of the experiment, and my plan is:

1. Sample SAMPLESIZE observations from the prior data, with replacement.
2. Assign treatment in blocks, as in the experimental protocol.
3. Generate predicted outcomes assuming a particular EFFECTSIZE.
4. Compute the estimate, test statistic, and significance.

Repeat steps 1-4 5,000 times to estimate power, running a grid search over values of SAMPLESIZE and EFFECTSIZE, and plot the results.
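The four steps can be sketched in base R without any bootstrap machinery. Because the code draws the rows itself, SAMPLESIZE is free to be larger or smaller than the prior data. Everything here is an assumption for illustration: the prior dataset is simulated, with hypothetical columns `block` and `y`; "blocked assignment" is taken to mean treating a random half of each block; the effect is a constant additive shift; and the estimator is OLS with block fixed effects.

```r
# Hypothetical prior data (simulated): 4 blocks of 25 rows with outcome y
set.seed(123)
prior <- data.frame(
  block = rep(c("A", "B", "C", "D"), each = 25),
  y = rnorm(100, mean = rep(c(10, 12, 11, 13), each = 25), sd = 2)
)

# One replication of steps 1-4; returns the p-value for the treatment effect
simulate_once <- function(prior, samplesize, effectsize) {
  # Step 1: resample SAMPLESIZE rows with replacement (may exceed nrow(prior))
  d <- prior[sample(nrow(prior), samplesize, replace = TRUE), ]
  # Step 2 (assumed protocol): treat a random half of each block
  d$treat <- 0
  for (b in unique(d$block)) {
    idx <- which(d$block == b)
    chosen <- idx[sample.int(length(idx), floor(length(idx) / 2))]
    d$treat[chosen] <- 1
  }
  # Step 3 (assumed): constant additive treatment effect
  d$y_sim <- d$y + effectsize * d$treat
  # Step 4: regression with block fixed effects; test the treatment coefficient
  fit <- lm(y_sim ~ treat + block, data = d)
  summary(fit)$coefficients["treat", "Pr(>|t|)"]
}

# Power = share of replications with p < alpha
power_at <- function(prior, samplesize, effectsize, n_iter = 200, alpha = 0.05) {
  p <- replicate(n_iter, simulate_once(prior, samplesize, effectsize))
  mean(p < alpha)
}

# Grid search over SAMPLESIZE and EFFECTSIZE
grid <- expand.grid(samplesize = c(40, 80), effectsize = c(0, 1))
grid$power <- mapply(function(n, e) power_at(prior, n, e),
                     grid$samplesize, grid$effectsize)
grid
```

For the 5,000 replications in the question, raise `n_iter`; plotting `grid$power` against `grid$samplesize` for each `effectsize` gives the power curves.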

I'm following the paramtest vignette on doing this with boot=TRUE. Its minimal example is:

    library(paramtest)
    library(dplyr)  # for %>% and summarise()

    # Statistic for one bootstrap replicate; boot supplies the row indices
    t_func_boot <- function(data, indices) {
      sample_data <- data[indices, ]
      treatGroup <- sample_data[sample_data$group == 'trt2', 'weight']
      ctrlGroup <- sample_data[sample_data$group == 'ctrl', 'weight']
      t <- t.test(treatGroup, ctrlGroup, var.equal=TRUE)
      stat <- t$statistic
      p <- t$p.value
      return(c(t=stat, p=p, sig=(p < .05)))
    }

    # 5000 bootstrap replicates of PlantGrowth; power = share of significant tests
    power_ttest_boot <- run_test(t_func_boot, n.iter=5000, output='data.frame', boot=TRUE,
                                 bootParams=list(data=PlantGrowth))
    results(power_ttest_boot) %>%
      summarise(power=mean(sig))

However, I cannot figure out how to adapt this so that each replicate resamples a larger (or smaller) number of observations than the dataset being sampled from contains.
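The limitation seems to come from the bootstrap machinery itself. Assuming paramtest's boot=TRUE hands the function to `boot::boot` (which the `data, indices` signature suggests), boot()'s default ordinary resampling gives the statistic exactly `nrow(data)` indices on every replicate. A quick check with the boot package, which ships with standard R installations as a recommended package:

```r
library(boot)

# Statistic that only reports how many indices boot() handed it
n_indices <- function(data, indices) length(indices)

set.seed(1)
b <- boot(PlantGrowth, n_indices, R = 200)
table(b$t)  # every replicate received nrow(PlantGrowth) = 30 indices
```

So the replicate size cannot be changed through `indices` alone; a different N has to be drawn inside the function or outside boot() altogether.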

score -1 · Answer 1 · answered Mar 21 '19 at 10:24


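    t_func_boot <- function(data, indices) {
      sample_data <- data[indices, ]
      # Placeholder: however the target sample size N is determined
      N <- calculate_the_N_somehow()
      # Draw N rows (with replacement) from the resample, so the statistic
      # is computed on N observations regardless of nrow(data)
      subsample <- sample(nrow(sample_data), N, replace = TRUE)
      sample_data <- sample_data[subsample, ]
      # ... compute the test statistic on sample_data as before
    }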
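A runnable sketch of this idea, under stated assumptions: the sample size is passed as a hypothetical extra argument `N` in place of `calculate_the_N_somehow()`, the rows are drawn directly from `data` (boot's `indices` are ignored), and `boot::boot` is called directly rather than through paramtest. boot() forwards extra named arguments to the statistic on every call, which is what lets `N` vary. The t-test is wrapped in `tryCatch` because at small N a group can come up empty or with too few rows.

```r
library(boot)

# Hypothetical variant of t_func_boot: N (not in the original answer) sets
# the replicate size; N rows are drawn with replacement straight from data.
t_func_boot_n <- function(data, indices, N) {
  rows <- sample(nrow(data), N, replace = TRUE)
  sample_data <- data[rows, ]
  treatGroup <- sample_data[sample_data$group == "trt2", "weight"]
  ctrlGroup  <- sample_data[sample_data$group == "ctrl", "weight"]
  p <- tryCatch(
    t.test(treatGroup, ctrlGroup, var.equal = TRUE)$p.value,
    error = function(e) NA_real_  # e.g. a group with too few rows drawn
  )
  c(p = p, sig = as.numeric(p < 0.05))
}

# boot() passes N through to the statistic, so one function covers a grid of sizes
set.seed(1)
sizes <- c(15, 30, 60)
power_by_n <- sapply(sizes, function(n) {
  b <- boot(PlantGrowth, t_func_boot_n, R = 500, N = n)
  mean(b$t[, 2], na.rm = TRUE)  # column 2 is `sig`
})
names(power_by_n) <- sizes
power_by_n
```

Because the indices are ignored, boot() is acting only as a loop here; its bias and standard-error summaries, and boot.ci(), are not meaningful for this statistic.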


dash2


Please explain your code ([here's why](https://meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers)) – Nino Filiu Mar 21 '19 at 12:22
This could work (thinking about it), but it seems like having the sampling occur *within* the function rather than in run_test/boot could cause issues. – daaronr Mar 21 '19 at 13:37
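One such issue can be made concrete. In the answer, the N rows are drawn from `sample_data`, which is already boot's resample of the data, so each replicate is a resample of a resample. A quick base-R comparison (the sizes n = 30, matching PlantGrowth, and N = 60 are arbitrary choices) counts how many distinct original rows end up in a replicate under a single draw versus the answer's two-stage draw:

```r
set.seed(42)
n <- 30   # rows in the original data (e.g. PlantGrowth)
N <- 60   # desired replicate size

# (a) one draw of N rows directly from the data
distinct_direct <- replicate(2000, {
  length(unique(sample(n, N, replace = TRUE)))
})

# (b) the answer's two-stage draw: boot's resample, then N rows from it
distinct_two_stage <- replicate(2000, {
  indices <- sample(n, n, replace = TRUE)              # what boot() supplies
  subsample <- sample(length(indices), N, replace = TRUE)
  length(unique(indices[subsample]))                   # original rows kept
})

c(direct = mean(distinct_direct), two_stage = mean(distinct_two_stage))
```

The two-stage draw keeps fewer distinct rows for the same N, i.e. it carries extra variability from the first stage. Drawing the N rows directly from `data` avoids this; how much it matters for a given power calculation is worth checking rather than assuming.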
