Consider the following data frame, named df:
df <- data.frame(id1 = c(1,1,1,2,2,2,3,3,3,3,3,3),
id2 = c('a','a','a','b','b','b','c','c','c','d','d','d'),
y = c(3,5,8,5,8,5,1,4,5,4,4,7),
x = c(.2,.3,.1,2,.2,.5,1,1.5,1.2,.1,1,.2))
> df
   id1 id2 y   x
1    1   a 3 0.2
2    1   a 5 0.3
3    1   a 8 0.1
4    2   b 5 2.0
5    2   b 8 0.2
6    2   b 5 0.5
7    3   c 1 1.0
8    3   c 4 1.5
9    3   c 5 1.2
10   3   d 4 0.1
11   3   d 4 1.0
12   3   d 7 0.2
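The nesting that matters here (which id2 values sit inside each id1) can be listed directly from the data. This is a small illustrative check, not part of the original attempt; stringsAsFactors = FALSE is added only so id2 stays character on R versions older than 4.0.

```r
# Rebuild the example data from the question (id2 kept as character)
df <- data.frame(id1 = c(1,1,1,2,2,2,3,3,3,3,3,3),
                 id2 = c('a','a','a','b','b','b','c','c','c','d','d','d'),
                 y   = c(3,5,8,5,8,5,1,4,5,4,4,7),
                 x   = c(.2,.3,.1,2,.2,.5,1,1.5,1.2,.1,1,.2),
                 stringsAsFactors = FALSE)

# Distinct id2 sub-clusters found inside each id1 cluster
nesting <- lapply(split(df$id2, df$id1), unique)
print(nesting)
```

Only id1 = 3 contains more than one id2 value (c and d), which is why it is the only cluster that needs special handling.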
My objective is to resample clusters (id1) while maintaining the association with id2. For example, for id1 = 3, the code should resample id2 = c and id2 = d separately. There is no such problem for id1 = 1 and id1 = 2, since each of them contains only one id2 value.
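One possible reading of "resample id2 = c and id2 = d separately" is a two-stage (nested) cluster bootstrap: draw id1 clusters with replacement, then, within each drawn id1, draw its own id2 sub-clusters with replacement, keeping every row of each drawn sub-cluster. The sketch below assumes that reading; resample_nested, boot_id1 and boot_id2 are hypothetical names introduced here, not part of the boot package.

```r
# Example data from the question (id2 kept as character)
df <- data.frame(id1 = c(1,1,1,2,2,2,3,3,3,3,3,3),
                 id2 = c('a','a','a','b','b','b','c','c','c','d','d','d'),
                 y   = c(3,5,8,5,8,5,1,4,5,4,4,7),
                 x   = c(.2,.3,.1,2,.2,.5,1,1.5,1.2,.1,1,.2),
                 stringsAsFactors = FALSE)

# Hypothetical helper (assumed two-stage design):
# stage 1 draws id1 clusters with replacement; stage 2 draws, within each
# drawn id1, its id2 sub-clusters with replacement. id2 never leaves its id1.
resample_nested <- function(data) {
  top <- unique(data$id1)
  # Index with sample.int so a single cluster is not misread as sample(n)
  drawn_top <- top[sample.int(length(top), replace = TRUE)]
  pieces <- list()
  for (i in seq_along(drawn_top)) {
    rows_i <- data[data$id1 == drawn_top[i], , drop = FALSE]
    subs <- unique(rows_i$id2)
    drawn_sub <- subs[sample.int(length(subs), replace = TRUE)]
    for (j in seq_along(drawn_sub)) {
      piece <- rows_i[rows_i$id2 == drawn_sub[j], , drop = FALSE]
      # New labels keep repeated draws of the same cluster distinguishable
      piece$boot_id1 <- i
      piece$boot_id2 <- paste(i, j, sep = "_")
      pieces[[length(pieces) + 1]] <- piece
    }
  }
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(565)
boot_df <- resample_nested(df)
```

For id1 = 1 and id1 = 2 the second stage can only return the single id2 present, so it reduces to ordinary cluster resampling there.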
What I've tried is the following:
library(boot)
cluster <- unique(df$id1)  # the id1 cluster labels: 1, 2, 3
set.seed(565)
sample_cluster <- sample(cluster, replace = TRUE)  # here is my problem: only id1 is resampled, id2 is ignored
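A different reading is that each distinct (id1, id2) pair should be its own resampling unit, so c and d under id1 = 3 are drawn independently of each other. Under that assumption, the attempt above only needs its cluster definition changed from id1 alone to the (id1, id2) pairs; units, drawn and boot_id are illustrative names, not from the original.

```r
# Example data from the question (id2 kept as character)
df <- data.frame(id1 = c(1,1,1,2,2,2,3,3,3,3,3,3),
                 id2 = c('a','a','a','b','b','b','c','c','c','d','d','d'),
                 y   = c(3,5,8,5,8,5,1,4,5,4,4,7),
                 x   = c(.2,.3,.1,2,.2,.5,1,1.5,1.2,.1,1,.2),
                 stringsAsFactors = FALSE)

# Assumed design: every distinct (id1, id2) pair is one resampling unit
units <- unique(df[, c("id1", "id2")])
rownames(units) <- NULL

set.seed(565)
drawn <- units[sample.int(nrow(units), replace = TRUE), , drop = FALSE]

# Collect all rows of each drawn unit; boot_id keeps repeated draws apart
boot_df <- do.call(rbind, lapply(seq_len(nrow(drawn)), function(k) {
  rows <- df[df$id1 == drawn$id1[k] & df$id2 == drawn$id2[k], , drop = FALSE]
  rows$boot_id <- k
  rows
}))
rownames(boot_df) <- NULL
```

Because every unit in this example has three rows, each resample has 12 rows; with unequal unit sizes the resample length would vary.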
Thank you!