I want to take one random Site for every Region, create a new data frame, and repeat this process until all Sites are sampled. No two data frames should contain the same Site from the same Region.
A few Regions in my real data frame have more Sites than the other Regions (in the example below, Region C has 4 Sites while A and B have 3). I want to remove those extra rows (perhaps I should do this before making the multiple data frames).
Here is an example data frame (the real one has >100 Regions and >10 Sites per Region):
mydf <- read.table(header = TRUE, text = 'V1 V2 Region Site
5 1 A X1
5 6 A X2
8 9 A X3
2 3 B X1
3 1 B X2
7 8 B X3
1 2 C X1
9 4 C X2
4 5 C X3
6 7 C X4')
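For the extra Sites mentioned above, one possible approach is to trim every Region to the size of the smallest Region before doing any sampling. The following is only a sketch: it assumes that any surplus Site may be dropped at random (the question does not say which ones to remove), and the names `site_counts`, `n_keep`, and `trimmed` are made up for illustration.

```r
# Example data from the question
mydf <- read.table(header = TRUE, text = 'V1 V2 Region Site
5 1 A X1
5 6 A X2
8 9 A X3
2 3 B X1
3 1 B X2
7 8 B X3
1 2 C X1
9 4 C X2
4 5 C X3
6 7 C X4')

# Number of Sites available in each Region
site_counts <- table(mydf$Region)

# Size of the smallest Region (3 in the example)
n_keep <- min(site_counts)

set.seed(1)  # only so the illustration is reproducible

# Within each Region, keep n_keep randomly chosen rows.
# Assumption: it does not matter which surplus Sites are dropped.
trimmed <- do.call(rbind, lapply(split(mydf, mydf$Region), function(x) {
  x[sort(sample(nrow(x), n_keep)), ]
}))
rownames(trimmed) <- NULL

# Every Region now has the same number of Sites
table(trimmed$Region)
```

If specific Sites should be kept instead, the `sample()` call could be replaced by whatever selection rule applies to the real data.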
Running the following code three times produces data frames that contain the same Site for a given Region (the second and third tables both have Site X2 for Region A).
do.call(rbind, lapply(split(mydf, mydf$Region), function(x) x[sample(nrow(x), 1), ]))
V1 V2 Region Site
A 8 9 A X3
B 2 3 B X1
C 6 7 C X4
V1 V2 Region Site
A 5 6 A X2
B 7 8 B X3
C 9 4 C X2
V1 V2 Region Site
A 5 6 A X2
B 3 1 B X2
C 6 7 C X4
Could you please help me create multiple data frames so that every data frame contains all Regions, but each Region-Site combination appears in only one data frame?
EDIT: Here is the expected output. To produce it, the first sampling draws one Site (row) at random from every Region and makes a data frame. The second sampling repeats the same process, but a Site that was already drawn for a given Region cannot be drawn again. What I want is separate data frames, each containing unique Region-Site combinations.
V1 V2 Region Site
5 1 A X1
7 8 B X3
1 2 C X1
V1 V2 Region Site
5 6 A X2
3 1 B X2
4 5 C X3
V1 V2 Region Site
8 9 A X3
2 3 B X1
9 4 C X2
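The sampling described in the EDIT amounts to drawing Sites without replacement within each Region. One way this might be sketched in base R is to shuffle the rows of each Region once and then treat each row's position in the shuffle as the number of the data frame it belongs to. This is an illustration under assumptions, not a definitive solution: the column name `round` and the objects `n_rounds`, `kept`, and `samples` are hypothetical, and Sites beyond the size of the smallest Region are simply left out.

```r
# Example data from the question
mydf <- read.table(header = TRUE, text = 'V1 V2 Region Site
5 1 A X1
5 6 A X2
8 9 A X3
2 3 B X1
3 1 B X2
7 8 B X3
1 2 C X1
9 4 C X2
4 5 C X3
6 7 C X4')

set.seed(42)  # only so the illustration is reproducible

# Number of complete data frames that can be built:
# limited by the Region with the fewest Sites
n_rounds <- min(table(mydf$Region))

# For each row, its position in a random permutation of its Region's rows.
# Because each position occurs once per Region, no Region-Site pair repeats.
mydf$round <- ave(seq_len(nrow(mydf)), mydf$Region,
                  FUN = function(i) sample(length(i)))

# Drop rows whose position exceeds n_rounds (surplus Sites in larger Regions)
kept <- mydf[mydf$round <= n_rounds, ]

# One data frame per round, each containing every Region exactly once
samples <- split(kept[c("V1", "V2", "Region", "Site")], kept$round)

samples
```

Each element of `samples` (for example `samples[["1"]]`) is one of the requested data frames. Shuffling once per Region, rather than calling `sample()` again for every data frame, is what prevents the same Site from being drawn twice for a Region.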