I am trying to obtain bootstrap standard errors for regression coefficients. The data I am using look like this:
set.seed(1234)
df <- data.frame(y = rnorm(30),
                 fac1 = as.factor(sample(c("A","B","C","D","E"), 30, replace = TRUE)),
                 fac2 = as.factor(sample(c("NY","NC","CA"), 30, replace = TRUE)),
                 x = rnorm(30))
I am using the boot package to perform the bootstrapping:
library(boot)
fun <- function(data, index){
  # refit the model on the resampled rows of the data that boot passes in
  d <- data[index, ]
  reg <- lm(y ~ fac1 + fac2 + x, data = d)
  coef(reg)
}
test.boot <- boot(df, fun, strata = df$fac1, 100)
However, R complains:
Error in boot(df, fun, strata = df$fac1, 100) :
number of items to replace is not a multiple of replacement length
My situation is exactly the same as the one mentioned here. As I understand it, the problem is that there are too few observations in each group. The strata argument of boot seems to work with only one factor variable, whereas in my case I should stratify the samples on two factors, fac1 and fac2 (please let me know if my understanding is incorrect).
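For illustration, one way to stratify on both factors with the existing strata argument is to combine them into a single factor first. This is only a minimal sketch, assuming the error comes from resamples that lose a factor level; the names cells and boot.cells are made up for this example, and interaction() is base R.

```r
library(boot)

set.seed(1234)
df <- data.frame(y = rnorm(30),
                 fac1 = as.factor(sample(c("A","B","C","D","E"), 30, replace = TRUE)),
                 fac2 = as.factor(sample(c("NY","NC","CA"), 30, replace = TRUE)),
                 x = rnorm(30))

# statistic: refit the model on the rows selected by boot
fun <- function(data, index) {
  d <- data[index, ]
  coef(lm(y ~ fac1 + fac2 + x, data = d))
}

# one stratum per observed (fac1, fac2) combination; empty cells are dropped
cells <- interaction(df$fac1, df$fac2, drop = TRUE)

# rows are resampled within each cell, so every resample keeps all levels
boot.cells <- boot(df, fun, R = 100, strata = cells)

# bootstrap standard errors: column-wise SD of the replicates
se <- apply(boot.cells$t, 2, sd)
names(se) <- names(boot.cells$t0)
se
```

Note that a cell containing a single observation contributes that same row to every resample, so very sparse cells add no resampling variability.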
I found that the function stratified posted here can produce exactly the stratified samples I need. The question is how to plug the stratified function into boot so that boot works on the correctly stratified samples.
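As a rough sketch of how a custom sampler might be handed to boot: with sim = "parametric", boot calls a user-supplied ran.gen(data, mle) to generate each resample and then calls the statistic with that data set as its only argument. The helper resample_cells below is hypothetical and written in base R as a stand-in for the stratified function, whose exact interface is not shown in this post; fit_coef and boot.custom are likewise illustrative names.

```r
library(boot)

set.seed(1234)
df <- data.frame(y = rnorm(30),
                 fac1 = as.factor(sample(c("A","B","C","D","E"), 30, replace = TRUE)),
                 fac2 = as.factor(sample(c("NY","NC","CA"), 30, replace = TRUE)),
                 x = rnorm(30))

# hypothetical sampler: resample rows with replacement within each
# observed (fac1, fac2) cell; `mle` is required by boot but unused here
resample_cells <- function(data, mle) {
  cells <- interaction(data$fac1, data$fac2, drop = TRUE)
  rows <- split(seq_len(nrow(data)), cells)
  idx <- unlist(lapply(rows, function(i) {
    # index via sample.int so single-row cells are handled correctly
    i[sample.int(length(i), replace = TRUE)]
  }))
  data[idx, , drop = FALSE]
}

# with sim = "parametric" the statistic receives only the generated data
fit_coef <- function(data) coef(lm(y ~ fac1 + fac2 + x, data = data))

boot.custom <- boot(df, fit_coef, R = 100,
                    sim = "parametric", ran.gen = resample_cells)

# bootstrap standard errors from the replicates
se.custom <- apply(boot.custom$t, 2, sd)
names(se.custom) <- names(boot.custom$t0)
se.custom
```

Any function that takes the data and returns a resampled data frame of the same form could be swapped in for resample_cells, which is where a wrapper around stratified would go.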
Currently, I am writing a for loop myself to run the bootstrap on correctly stratified samples, but I would still like to know whether I can incorporate the stratified function into boot. Any suggestions? Thank you!