I'm struggling to create a vectorized, functional solution that repeats stratified random sampling without replacement over many iterations. I can sample without replacement once, remove those rows from the dataset, and then repeat the process on the unsampled observations. Unfortunately, I need to do this many times, which makes the manual approach impractical.
I've tried using the replicate() function, but it only samples without replacement within each pass: the rows chosen in one pass are put back into the dataset for the next sampling pull.
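The underlying issue can be illustrated with plain indices. Drawing a few items without replacement, removing them, and drawing again from what is left is equivalent to shuffling the pool once and taking consecutive blocks of the shuffled order. The sketch below contrasts that with independent replicate() draws; the pool size n = 20 is a made-up stand-in for nrow(one).

```r
set.seed(123)
n    <- 20    # hypothetical pool size, standing in for nrow(one)
size <- 3     # rows per draw

# replicate() runs each sample() independently, so an index chosen in one
# draw can show up again in a later draw
independent <- replicate(6, sample(n, size), simplify = FALSE)

# Alternative: permute the pool once, then cut the permutation into
# consecutive blocks of `size`; each index can land in at most one block
perm   <- sample(n)        # random order of 1..n
n_full <- n %/% size       # number of complete blocks
blocks <- split(perm[seq_len(n_full * size)],
                rep(seq_len(n_full), each = size))

anyDuplicated(unlist(independent))  # can be > 0: overlapping draws
anyDuplicated(unlist(blocks))       # always 0: disjoint draws
```

anyDuplicated() returns 0 when no value repeats, so the block-based draws are disjoint by construction, while the replicate() draws are not.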
Using the code below, I'd like the function to create 30 new datasets, each composed of 3 unique (previously unsampled) rows from the "one" set and 3 from the "zero" set. Each new dataset would therefore have 6 observations (three 1's and three 0's) and a unique name (e.g. "new_dat1", "new_dat2", ..., "new_dat30").
If possible, I'm looking to achieve all of this without using for loops, so something in the "apply" family is preferred.
set.seed(123)
dat <- data.frame(Outcome = round(runif(160, 0, 1)))
cust <- data.frame(Cust = rep(c("ABC", "DEF", "GHI"), c(45, 80, 35)))
dat <- cbind(cust, dat)
one <- subset(dat, Outcome == 1)
zero <- subset(dat, Outcome == 0)
# Manual option which is not sufficient
################################################
# sample 1's and remove chosen obs from "one" dataset
set.seed(123)
index <- sample(1:nrow(one), 3, replace = FALSE)
new_dat1 <- one[index, ]
unused_one <- one[-index, ]
# sample 0's and remove chosen obs from "zero" dataset
set.seed(123)
index <- sample(1:nrow(zero), 3, replace = FALSE)
unused_zero <- zero[-index, ]
# combine the 3-1 and 3-0 samples into the first of 30 "new_datn" sets
new_dat1 <- rbind(new_dat1, zero[index, ])
# repeat, now sampling from "unused_one" and "unused_zero" to create "new_dat2" - "new_dat30"
################################################
# Failed attempt using the replicate() function
################################################
set.seed(123)
one_sample <- replicate(30, one[sample(nrow(one), 3, replace = FALSE), ], simplify = FALSE)
zero_sample <- replicate(30, zero[sample(nrow(zero), 3, replace = FALSE), ], simplify = FALSE)
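A minimal sketch of the manual procedure without a for loop, assuming dat is built as in the question and that the number of sets is limited by the smaller stratum, with leftover rows simply dropped. Each stratum is shuffled once and cut into blocks of 3, and Map() (from the apply family) pairs block i of the ones with block i of the zeros. The helper name take is hypothetical.

```r
# Rebuild the example data (same runif() draws as the question's code)
set.seed(123)
dat  <- data.frame(Cust = rep(c("ABC", "DEF", "GHI"), c(45, 80, 35)),
                   Outcome = round(runif(160, 0, 1)))
one  <- subset(dat, Outcome == 1)
zero <- subset(dat, Outcome == 0)

size   <- 3
# Assumption: only complete sets, limited by the smaller stratum
n_sets <- min(nrow(one), nrow(zero)) %/% size

# Hypothetical helper: shuffle a stratum's row positions once and cut the
# first n_sets * size of them into consecutive blocks of `size`
take <- function(df) {
  idx <- sample(nrow(df))[seq_len(n_sets * size)]
  split(idx, rep(seq_len(n_sets), each = size))
}

# Pair block i of the ones with block i of the zeros
new_dats <- Map(function(i1, i0) rbind(one[i1, ], zero[i0, ]),
                take(one), take(zero))
```

Because each stratum is permuted only once, no row can appear in more than one set, which is the guarantee the replicate() attempt lacks.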
To complicate things further, the total number of 0 and 1 observations in the "dat" set will vary from run to run, so I'll almost always have remainders to deal with. The function must therefore sample 3 of each for every "new_dat" until it runs into a remainder, which can go into the final "new_dat" regardless of its size.
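One way to handle remainders, sketched under two assumptions: the number of sets is set by the smaller stratum, and every row left over after the full blocks (from either stratum) is folded into the final set, so that set may hold more than 6 rows. Capping each row's block number with pmin() does the folding. chunk_with_remainder is a hypothetical helper name.

```r
set.seed(123)
dat  <- data.frame(Cust = rep(c("ABC", "DEF", "GHI"), c(45, 80, 35)),
                   Outcome = round(runif(160, 0, 1)))
one  <- subset(dat, Outcome == 1)
zero <- subset(dat, Outcome == 0)

# Hypothetical helper: shuffle the rows once and assign them to blocks of
# `size`; block numbers above n_sets are capped, so leftovers join the last block
chunk_with_remainder <- function(df, size, n_sets) {
  idx <- sample(nrow(df))
  grp <- pmin(ceiling(seq_along(idx) / size), n_sets)
  lapply(split(idx, grp), function(i) df[i, , drop = FALSE])
}

size <- 3
# Assumption: number of sets driven by the smaller stratum (at least 1)
n_sets <- max(1, min(nrow(one), nrow(zero)) %/% size)

# Combine the i-th chunk of ones with the i-th chunk of zeros
new_dats <- Map(rbind,
                chunk_with_remainder(one, size, n_sets),
                chunk_with_remainder(zero, size, n_sets))

sapply(new_dats, nrow)  # 6 for every set except possibly the last
```

Every row of dat ends up in exactly one set; if a smaller final set is preferred instead, the pmin() cap can be dropped so the remainder forms its own group.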
Even if I could solve the sampling issue with a vectorized function, I would still be at a loss as to how to have the function create the new datasets and name them appropriately.
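For the naming part, a common R approach is to keep the sets in one named list rather than as separate objects, and to name the list elements with paste0(). A stand-in list of three small data frames is used here so the block runs on its own; in practice it would be the list produced by the sampling step. list2env() is only needed if separate new_dat1, new_dat2, ... objects really must exist in the workspace.

```r
# Stand-in for the list of sampled sets (hypothetical data)
new_dats <- list(data.frame(x = 1:2), data.frame(x = 3:4), data.frame(x = 5:6))

# Name the elements new_dat1, new_dat2, ...
names(new_dats) <- paste0("new_dat", seq_along(new_dats))

# Access one set by name, or process all of them with the apply family
new_dats[["new_dat2"]]
sapply(new_dats, nrow)

# Optional: copy each element into the global environment as its own object
invisible(list2env(new_dats, envir = globalenv()))
```

Keeping the list usually makes later steps simpler, since lapply() or Map() can operate on every set at once without building object names as strings.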
I would be very grateful if anyone could provide me with some assistance. Thank you for taking the time to read through my post.