I'd like to divide a sample into two groups so that two or more variables are proportionally represented in both groups. For instance, here are the proportions of the last three variables (`carb`, `gear`, and `am`) in the `mtcars` data frame:
```
> data(mtcars)
> round(prop.table(table(mtcars$carb)),2)
   1    2    3    4    6    8
0.22 0.31 0.09 0.31 0.03 0.03
> round(prop.table(table(mtcars$gear)),2)
   3    4    5
0.47 0.38 0.16
> round(prop.table(table(mtcars$am)),2)
   0    1
0.59 0.41
```
In this example, I'd like to divide the sample into two groups so that each group has something close to the 60/40 split on `am` seen in the full data, and the splits on the other two variables are similarly close to their representation in the dataset.
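One common way to get this is a stratified random split: treat every observed combination of `am`, `gear`, and `carb` as a stratum and split each stratum as evenly as possible between the two groups. The sketch below is a minimal illustration under that assumption; `stratified_split` and the group labels `"A"`/`"B"` are hypothetical names, not from an existing package. With only 32 rows, many strata contain a single car, so the proportions in each group will only approximate the full-data proportions.

```r
data(mtcars)

# Hypothetical helper: stratified random split into two groups.
# Each stratum (observed combination of `vars`) is split as evenly as
# possible; odd-sized strata put the extra row in a random group.
stratified_split <- function(df, vars, seed = 1) {
  set.seed(seed)  # note: changes the global RNG state
  # interaction() accepts a list of variables; drop = TRUE keeps only
  # combinations that actually occur in the data
  strata <- interaction(df[vars], drop = TRUE)
  group <- character(nrow(df))
  for (s in levels(strata)) {
    idx <- which(strata == s)
    n <- length(idx)
    # Alternating labels starting from a random one, then shuffled
    labels <- rep_len(sample(c("A", "B")), n)
    group[idx] <- labels[sample.int(n)]
  }
  factor(group, levels = c("A", "B"))
}

mtcars$group <- stratified_split(mtcars, c("am", "gear", "carb"))
table(mtcars$group)
# Column proportions of am within each group
round(prop.table(table(mtcars$am, mtcars$group), margin = 2), 2)
```

Because each stratum is halved, both groups approximately mirror the joint distribution of the three variables, not just each margin. The overall group sizes can differ slightly when many strata have an odd count.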
The closest thing I know how to do is to draw a matched sample, as in a treatment study. In that case, though, the two groups are already defined by a treatment variable, and you simply match a control unit to each treatment unit so that the groups have similar proportions on one or more covariates. My situation is a little different because the groups are not defined in advance, and while I feel there must be a similar method, I can't wrap my head around it. Is there an efficient way to do this, or is there a totally different way I should be thinking about it?
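A different way to think about it, sketched here as an assumption rather than a standard recipe, is to draw many random equal-size splits, score each one by how far apart the two groups' proportions are, and keep the best-balanced split (a "best of many random draws" heuristic, related to rerandomization). The functions `imbalance` and `best_split` below are hypothetical names, and the imbalance score (sum of absolute differences in category proportions, added up over the variables) is one illustrative choice among many.

```r
data(mtcars)

# Hypothetical imbalance score: for each variable, the total absolute
# difference between the two groups' category proportions, summed
# over all balancing variables (0 means identical proportions)
imbalance <- function(df, vars, group) {
  sum(vapply(vars, function(v) {
    p <- prop.table(table(df[[v]], group), margin = 2)
    sum(abs(p[, 1] - p[, 2]))
  }, numeric(1)))
}

# Hypothetical helper: draw n_draws random splits with equal group
# sizes and return the one with the lowest imbalance score
best_split <- function(df, vars, n_draws = 1000, seed = 1) {
  set.seed(seed)
  n <- nrow(df)
  best <- NULL
  best_score <- Inf
  for (i in seq_len(n_draws)) {
    g <- factor(sample(rep_len(c("A", "B"), n)), levels = c("A", "B"))
    score <- imbalance(df, vars, g)
    if (score < best_score) {
      best <- g
      best_score <- score
    }
  }
  list(group = best, score = best_score)
}

res <- best_split(mtcars, c("am", "gear", "carb"))
res$score
table(mtcars$am, res$group)
```

Unlike the stratified split, this keeps the two groups exactly the same size and balances only the margins you score, so it copes better when there are many small strata. The trade-off is that the result depends on the number of draws and on the imbalance measure chosen.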