
I am trying to obtain bootstrap standard errors for regression coefficients. The data I am using look like this:

set.seed(1234)
df <- data.frame(y = rnorm(30),
             fac1 = as.factor(sample(c("A","B","C","D","E"), 30, replace = TRUE)),
             fac2 = as.factor(sample(c("NY","NC","CA"), 30, replace = TRUE)),
             x = rnorm(30))

I am using the boot package to perform the bootstrap:

library(boot)
fun <- function(data, index){
    # subset the data passed in by boot, not the global df
    data <- data[index,]
    reg <- lm(y ~ fac1 + fac2 + x, data)
    coef(reg)
}
test.boot <- boot(df, fun, strata = df$fac1, R = 100)

However, R complains:

Error in boot(df, fun, strata = df$fac1, 100) : 
number of items to replace is not a multiple of replacement length

My situation is exactly the same as the one mentioned here. As I understand it, the problem is that some groups have too few observations. The strata option in the boot package seems to work with only one factor variable. In my case, I should stratify the samples on two factors, fac1 and fac2 (please let me know if my understanding is incorrect).

I found that the stratified function posted here can produce exactly the stratified samples I need. The question is how to plug the stratified function into boot so that boot works on the correct samples.

Currently, I am writing a for-loop myself to run the bootstrap on correctly stratified samples. But I would still like to know whether I can incorporate the stratified function into boot. Any suggestions? Thank you!

StupidWolf
Chuan
  • See `help("interaction")` for building a single factor from 2 factors. – lmo Aug 30 '16 at 18:22
  • Thanks @lmo. I think using `interaction` is definitely a good way to solve the question above. I am still curious whether I can incorporate another custom function, like the `stratified` function mentioned here, into `boot`, since I may need the extra features `stratified` provides for more complex situations. Thanks! – Chuan Aug 30 '16 at 18:33
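
The `interaction` suggestion in the comments can be sketched as follows. This is a minimal illustration, not code from the thread: the names `cell` and `coef_fun` are made up here, and it assumes the simulated `df` from the question. Combining the two factors into one lets `strata` resample within every observed fac1/fac2 cell, so each resample keeps all factor levels and `lm` always returns the same number of coefficients.

```r
library(boot)

# Same simulated data as in the question
set.seed(1234)
df <- data.frame(y = rnorm(30),
                 fac1 = factor(sample(c("A", "B", "C", "D", "E"), 30, replace = TRUE)),
                 fac2 = factor(sample(c("NY", "NC", "CA"), 30, replace = TRUE)),
                 x = rnorm(30))

# One factor with a level per observed fac1/fac2 combination
# (drop = TRUE discards combinations that never occur)
cell <- interaction(df$fac1, df$fac2, drop = TRUE)

# Statistic for the ordinary bootstrap: refit on the resampled rows
coef_fun <- function(data, index) {
  coef(lm(y ~ fac1 + fac2 + x, data = data[index, ]))
}

# Resampling happens separately within each level of `cell`
test.boot <- boot(df, coef_fun, R = 100, strata = cell)
```

Note that a cell containing a single observation always contributes that same observation, so such cells add no resampling variability.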

1 Answer


After analyzing the boot package carefully, I think I have found a solution to my question that does not require modifying boot's source code. boot actually lets the user supply a custom sampling strategy: check the sim = "parametric" and ran.gen options in help(boot).

So, in my case, I can simply supply a ran.gen function that wraps the stratified function and uses it to regenerate the sample for each bootstrap replicate.

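library(boot)

# With sim = "parametric" the statistic receives only the regenerated data
fun <- function(data){
            reg <- lm(y ~ fac1 + fac2 + x, data)
            coef(reg)}

# boot calls ran.gen(data, mle); mle is not needed here
rgen <- function(data, mle){
        #code of stratified goes here and other specifications ...
}

test.boot <- boot(df, fun, R = 1000, sim = "parametric", ran.gen = rgen)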

Done!
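
To make the ran.gen idea concrete, here is a minimal sketch. It does not use the `stratified` function from the linked post, whose code is not shown here. Instead, `resample_within_cells` is a hypothetical stand-in that resamples rows with replacement inside each observed fac1/fac2 cell. The names `coef_fun` and `resample_within_cells` are assumptions made for this illustration only.

```r
library(boot)

# Same simulated data as in the question
set.seed(1234)
df <- data.frame(y = rnorm(30),
                 fac1 = factor(sample(c("A", "B", "C", "D", "E"), 30, replace = TRUE)),
                 fac2 = factor(sample(c("NY", "NC", "CA"), 30, replace = TRUE)),
                 x = rnorm(30))

# With sim = "parametric" the statistic takes the data only (no index)
coef_fun <- function(data) {
  coef(lm(y ~ fac1 + fac2 + x, data = data))
}

# Hypothetical stand-in for `stratified`: boot calls ran.gen(data, mle),
# and this version ignores mle and resamples within each fac1/fac2 cell
resample_within_cells <- function(data, mle) {
  cell <- interaction(data$fac1, data$fac2, drop = TRUE)
  rows <- split(seq_len(nrow(data)), cell)
  # sample.int avoids the sample() surprise for cells with one row
  idx <- unlist(lapply(rows, function(i) i[sample.int(length(i), replace = TRUE)]),
                use.names = FALSE)
  data[idx, , drop = FALSE]
}

test.boot <- boot(df, coef_fun, R = 200, sim = "parametric",
                  ran.gen = resample_within_cells)
```

Any other sampler can be swapped in as long as it accepts `(data, mle)` and returns a data frame that the statistic can fit. The bootstrap standard errors are then the column standard deviations of `test.boot$t`.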

Chuan