Suppose you have a factor variable whose level labels come in pairs (such as 'a1' and 'a2', 'b1' and 'b2', etc.), and these pairs have unequal n-sizes.
x <- factor(c(rep("a1", 10), rep("a2", 15), rep("b1", 5), rep("b2", 30), rep("c1", 33), rep("c2", 22)))
> table(x)
a1 a2 b1 b2 c1 c2
10 15 5 30 33 22
Now suppose you want to randomly downsample the larger level of each pair so that the two levels in a pair have equal n-sizes. Here's the desired outcome:
a1 a2 b1 b2 c1 c2
10 10 5 5 22 22
I have found that caret::downSample() can downsample to equalize all the levels of a factor:
x_ds <- caret::downSample(1:115, x)
table(x_ds$Class)
a1 a2 b1 b2 c1 c2
5 5 5 5 5 5
And I have the notion to use split() in conjunction with downSample(), but I'm having trouble figuring out a way to split on the level pairs. How could this be done?
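To make the split idea concrete, here is a minimal base-R sketch of what I have in mind. It is not a definitive solution, and it rests on one assumption: that a pair can be identified by stripping the trailing digits from the level label (so 'a1' and 'a2' both map to 'a'). The names `pair`, `keep`, and `x_ds` are only illustrative.

```r
set.seed(1)  # only so the random draw is reproducible

x <- factor(c(rep("a1", 10), rep("a2", 15), rep("b1", 5), rep("b2", 30),
              rep("c1", 33), rep("c2", 22)))

# Assumption: the pair is the level label with its trailing digits removed
pair <- sub("[0-9]+$", "", as.character(x))

# Split the element indices by pair, then downsample within each pair
keep <- unlist(lapply(split(seq_along(x), pair), function(idx) {
  by_level <- split(idx, droplevels(x[idx]))  # indices per level in this pair
  n_min <- min(lengths(by_level))             # n-size of the smaller level
  # Draw n_min indices from each level; sample.int() on the length avoids
  # sample()'s special handling of a length-one numeric vector
  unlist(lapply(by_level, function(i) i[sample.int(length(i), n_min)]))
}), use.names = FALSE)

x_ds <- x[sort(keep)]
table(x_ds)
```

The per-level counts here depend only on the group sizes, not on the seed. Presumably the inner sampling step could be swapped for a call to caret::downSample() on each pair's subset, but I haven't worked out how to wire that up cleanly.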