Consider the following data frame, named df:

df <- data.frame(id1 = c(1,1,1,2,2,2,3,3,3,3,3,3),
                 id2 = c('a','a','a','b','b','b','c','c','c','d','d','d'),
                   y = c(3,5,8,5,8,5,1,4,5,4,4,7),
                   x = c(.2,.3,.1,2,.2,.5,1,1.5,1.2,.1,1,.2))
> df
   id1 id2 y   x
1    1   a 3 0.2
2    1   a 5 0.3
3    1   a 8 0.1
4    2   b 5 2.0
5    2   b 8 0.2
6    2   b 5 0.5
7    3   c 1 1.0
8    3   c 4 1.5
9    3   c 5 1.2
10   3   d 4 0.1
11   3   d 4 1.0
12   3   d 7 0.2

My objective is to resample clusters (id1) while maintaining their association with id2. For example, for id1 = 3, the code should resample id2 = c and id2 = d separately. This issue does not arise for id1 = 1 and id1 = 2, because each of them maps to a single id2.
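To make the grouping described above concrete, here is a minimal sketch that lists the row indices belonging to each id1/id2 combination. The name `groups` is only illustrative, and the data frame is repeated so the snippet runs on its own.

```r
df <- data.frame(id1 = c(1,1,1,2,2,2,3,3,3,3,3,3),
                 id2 = c('a','a','a','b','b','b','c','c','c','d','d','d'),
                   y = c(3,5,8,5,8,5,1,4,5,4,4,7),
                   x = c(.2,.3,.1,2,.2,.5,1,1.5,1.2,.1,1,.2))

# One group per observed id1/id2 combination; drop = TRUE discards
# combinations that never occur in the data (e.g. id1 = 1 with id2 = 'b').
groups <- split(seq_len(nrow(df)),
                interaction(df$id1, df$id2, drop = TRUE))

names(groups)    # the combinations present in df
lengths(groups)  # number of rows in each combination
```

Here id1 = 3 contributes two groups (one for c, one for d), while id1 = 1 and id1 = 2 contribute one group each.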

What I've tried is the following:

library(boot)
cluster <- unique(df$id1)
set.seed(565)
sample_cluster <- sample(cluster, replace = TRUE) # here is my problem

Thank you!

iGada

1 Answer

Suppose we apply a simple function:

# Note: ii (the resampled row indices supplied by boot) is not used here
func = function(da,ii) colMeans(da[,c("x","y")])

You can pass a factor that combines your two ids to the strata argument:

library(boot)
boot(df,statistic=func,R=99,strata = factor(paste(df$id1,df$id2)))

STRATIFIED BOOTSTRAP


Call:
boot(data = df, statistic = func, R = 99, strata = factor(paste(df$id1, 
    df$id2)))


Bootstrap Statistics :
     original  bias    std. error
t1* 0.6916667       0           0
t2* 4.9166667       0           0
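
Note that `func` never uses its index argument `ii`, so every replicate returns the same column means, which is why bias and std. error are both 0 in the output above. A minimal sketch of a variant that subsets rows by `ii` follows. The name `func_idx` is hypothetical and the seed is arbitrary; the data frame is repeated so the snippet runs on its own.

```r
library(boot)

df <- data.frame(id1 = c(1,1,1,2,2,2,3,3,3,3,3,3),
                 id2 = c('a','a','a','b','b','b','c','c','c','d','d','d'),
                   y = c(3,5,8,5,8,5,1,4,5,4,4,7),
                   x = c(.2,.3,.1,2,.2,.5,1,1.5,1.2,.1,1,.2))

# Hypothetical variant of func: ii holds the resampled row indices that
# boot() passes in, so subsetting by it makes the replicates differ.
func_idx <- function(da, ii) colMeans(da[ii, c("x", "y")])

set.seed(565)  # arbitrary seed, for reproducibility only
res <- boot(df, statistic = func_idx, R = 99,
            strata = factor(paste(df$id1, df$id2)))

res$t0      # statistics computed on the original data
dim(res$t)  # one row per replicate, one column per statistic
```

With `strata` set this way, rows are drawn with replacement within each id1/id2 combination, so every replicate keeps three rows per combination.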
StupidWolf