Suppose you have a factor variable whose level labels come in pairs (such as 'a1' and 'a2', 'b1' and 'b2', etc.), and the two levels within each pair have unequal n-sizes.

x <- factor(c(rep("a1", 10), rep("a2", 15), rep("b1", 5), rep("b2", 30), rep("c1", 33), rep("c2", 22)))

> table(x)

a1 a2 b1 b2 c1 c2 
10 15  5 30 33 22 

Now suppose you want to randomly downsample the larger level of each pair so that both levels in the pair have the same n-size. Here's the desired outcome:

a1 a2 b1 b2 c1 c2 
10 10  5  5 22 22 

I have found that caret::downSample() can downsample a factor, but it equalizes all the levels to the size of the smallest one rather than working pair by pair:

x_ds <- caret::downSample(1:115, x)

table(x_ds$Class)

a1 a2 b1 b2 c1 c2 
 5  5  5  5  5  5 

My idea is to use split() in conjunction with downSample(), but I'm having trouble figuring out a way to split on the level pairs. How could this be done?
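To make the split-on-pairs idea concrete, here is a minimal sketch in base R. It swaps caret::downSample() for a plain sample.int() call, and it assumes that a pair is identified by the level label with its trailing digits stripped ('a1' and 'a2' both map to 'a'). The names pair, keep, by_level and n_min are illustrative only.

```r
set.seed(1)  # only so the sketch is reproducible

x <- factor(c(rep("a1", 10), rep("a2", 15), rep("b1", 5), rep("b2", 30),
              rep("c1", 33), rep("c2", 22)))

# Assumption: the pair key is the label without its trailing digits
pair <- sub("[0-9]+$", "", as.character(x))

# Split the element indices by pair, then within each pair draw
# n_min indices from every level, where n_min is the smaller level's size
keep <- unlist(lapply(split(seq_along(x), pair), function(idx) {
  by_level <- split(idx, droplevels(x[idx]))
  n_min <- min(lengths(by_level))
  lapply(by_level, function(i) i[sample.int(length(i), n_min)])
}), use.names = FALSE)

# Keep the sampled elements in their original order
x_ds <- x[sort(keep)]
table(x_ds)
```

With the example data, table(x_ds) should give the desired counts of 10, 10, 5, 5, 22, 22. Indexing with i[sample.int(length(i), n_min)] rather than sample(i, n_min) avoids the surprise that sample() treats a length-one numeric vector as the range 1:i.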

xilliam