Randomly split sample into two groups with proportional representation on 2+ variables

Question

I'd like to divide a sample into two groups such that there is proportional representation of 2 or more variables in those two groups. For instance, in the mtcars dataset, here are the proportions of the last 3 variables in the data.frame:

> data(mtcars)
> round(prop.table(table(mtcars$carb)),2)

   1    2    3    4    6    8 
0.22 0.31 0.09 0.31 0.03 0.03 
> round(prop.table(table(mtcars$gear)),2)

   3    4    5 
0.47 0.38 0.16 
> round(prop.table(table(mtcars$am)),2)

   0    1 
0.59 0.41

In this example, I'd like to divide the sample into two groups such that each group has something close to a 60/40 split on am, and splits on the other two variables (carb and gear) that are similar to their proportions in the full dataset.

The closest thing I know how to do is to draw a matched sample, like in a treatment study, but in that case the two groups are already defined based on some treatment variable, and you're simply matching a control unit to a treatment unit such that the proportions are similar to each other on 1+ covariates. This is a little different, and while I feel like there must be a similar method to use, I can't wrap my head around it. Is there an efficient way to do this? Or is there a totally different way I should be thinking about this?

How can I use the `prob` argument in `sample` when there is more than one variable giving proportions that I'm trying to match? — Jon, Apr 20 '23 at 15:36
E.g. taking those from `proportions(table(mtcars[c("carb", "gear", "am")]))`. — GKi, Apr 20 '23 at 19:31
Oh I see, so working with the joint probabilities over the multiple variables. I think that would work OK for 2-3 variables, but do you know of a way that can approximate the marginal proportions of the different variables without modeling the joint probabilities? — Jon, Apr 21 '23 at 17:20
Maybe multiplying them like `outer(proportions(table(mtcars$carb)), proportions(table(mtcars$gear)))` ? — GKi, Apr 24 '23 at 07:21
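
One way to target the marginal proportions directly, without modeling the joint distribution discussed in the comments above, is rerandomization: draw many random half-splits and keep the one whose marginals are closest to those of the full sample. The following is only a minimal sketch under stated assumptions, not an established answer to the question. The helper names `marg` and `imbalance`, the total-absolute-deviation score, the number of draws (2000), the seed, and the equal group sizes are all illustrative choices.

```r
# Sketch: rerandomization. Draw many random half-splits and keep the one
# whose marginal proportions are closest to those of the full sample.
data(mtcars)
vars <- c("carb", "gear", "am")   # variables to balance (from the question)

# Marginal proportions of one variable, computed over the full-sample levels
# so the vectors stay comparable even if a level is absent from a group.
# (Hypothetical helper, not part of base R.)
marg <- function(x, lev) as.vector(table(factor(x, levels = lev))) / length(x)

# Total absolute deviation of a group's marginals from the full sample's,
# summed over all balancing variables. (Hypothetical helper.)
imbalance <- function(idx, data, vars) {
  sum(vapply(vars, function(v) {
    lev <- sort(unique(data[[v]]))
    sum(abs(marg(data[[v]][idx], lev) - marg(data[[v]], lev)))
  }, numeric(1)))
}

set.seed(42)                      # arbitrary seed, for reproducibility only
n <- nrow(mtcars)

# 2000 candidate splits; each is the row indices of group 1 (half the sample)
candidates <- replicate(2000, sample(n, n %/% 2), simplify = FALSE)
scores <- vapply(candidates, imbalance, numeric(1), data = mtcars, vars = vars)

# Keep the best-balanced candidate; group 2 is the complement
best   <- candidates[[which.min(scores)]]
group1 <- mtcars[best, ]
group2 <- mtcars[-best, ]

# Inspect the result against the full-sample proportions
round(prop.table(table(group1$am)), 2)
round(prop.table(table(group2$am)), 2)
```

Because the two groups are the same size here, any deviation of group 1 from the full-sample proportions is mirrored in group 2, so scoring only group 1 is enough. With unequal group sizes the score would need to include both groups. Note that the selected split is the best of many random draws rather than a single simple random draw, which may or may not matter for the intended use.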
