
I am looking for R packages that can do multiclass oversampling, undersampling, or both. I tried the ROSE package, but it only works for binary classification.

My target variable has these class proportions: "0" 70%, "1" 15%, "2" 10%, "3" 5%, "4" 5%.

danishxr



I believe you can downsample or upsample data with more than two classes using the caret package (its downSample() and upSample() functions).

If caret doesn't do what you need, the simplest option is to write custom code that randomly samples an equal number of rows from each class of your variable.

In practice, downsampling and upsampling are mostly used for binary classification, so you may want to consider a one-versus-all approach. If you downsample, you then have to adjust your predicted probabilities back so they are not distorted by the different downsampling rates between classes.
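A minimal sketch of that probability adjustment, written in Python (the thread's other language; the arithmetic carries over to R directly). It assumes the model was trained on resampled data and outputs one probability per class. Each probability is multiplied by the ratio of the class's real share to its share in the training sample, then everything is renormalized: p_adj(c) ∝ p(c) · π_real(c) / π_train(c). This is the standard prior-shift correction; `adjust_probabilities` and the class shares below are hypothetical.

```python
def adjust_probabilities(probs, train_shares, real_shares):
    """Rescale class probabilities from a model fit on resampled data.

    All three arguments are dicts mapping class label -> value.
    """
    # Reweight each class by how much resampling under- or over-represented it
    weighted = {c: probs[c] * real_shares[c] / train_shares[c] for c in probs}
    total = sum(weighted.values())
    # Renormalize so the adjusted probabilities sum to 1
    return {c: w / total for c, w in weighted.items()}


# Hypothetical example: training data was balanced to 25% per class,
# while the real-world shares are 70/15/10/5.
train = {"0": 0.25, "1": 0.25, "2": 0.25, "3": 0.25}
real = {"0": 0.70, "1": 0.15, "2": 0.10, "3": 0.05}
print(adjust_probabilities({"0": 0.25, "1": 0.25, "2": 0.25, "3": 0.25}, train, real))
```

If the model is indifferent on the balanced data (25% each), the adjusted output falls back to the real class shares, which is the behaviour you want from the correction.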

Update: sample code:

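y <- c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "C")
x <- c(1, 2, 1, 2, 3, 4, 5, 4, 5, 6, 7)
data <- data.frame(y = y, x1 = x)  # data.frame keeps x1 numeric; cbind() would coerce it to character

fin <- NULL
for (i in unique(data$y)) {
  sub <- data[data$y == i, ]              # all rows of class i
  sam <- sub[sample(nrow(sub), 2), ]      # draw 2 rows without replacement
  fin <- rbind(fin, sam)
}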

Result (rows are drawn at random, so yours will differ):

y   x1
A   2
A   1
B   3
B   2
C   6
C   7

Here I sampled 2 rows from each class of y, but instead of 2 you should use the size of your smallest class, e.g. min(table(y)).
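For comparison, the same per-class downsampling can be sketched in Python with only the standard library. `downsample` is a hypothetical helper; unlike the R loop, it computes the smallest class size itself instead of hard-coding 2.

```python
import random
from collections import Counter, defaultdict


def downsample(rows, labels, seed=None):
    """Randomly keep n_min rows per class, where n_min is the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement, like sample(nrow(sub), n) in R
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


y = ["A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "C"]
x = [1, 2, 1, 2, 3, 4, 5, 4, 5, 6, 7]
xs, ys = downsample(x, y, seed=0)
print(Counter(ys))  # each class now has 2 rows, the size of class "B"
```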

RomRom

You can use the R UBL package. It implements several techniques for oversampling multiclass problems, e.g. ADASYN, along with other algorithms for dealing with imbalanced classes.
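UBL's methods are more sophisticated, but the simplest multiclass oversampler, random oversampling with replacement, shows the basic idea: every minority class is topped up to the size of the majority class by duplicating its own rows. This is a plain-Python sketch, not UBL's API; `random_oversample` and the 70/15/10/5 toy split are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=None):
    """Duplicate randomly chosen rows so every class matches the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_max = max(len(members) for members in by_class.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, members in by_class.items():
        # Draw the missing rows with replacement, from this class only
        extra = rng.choices(members, k=n_max - len(members))
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels


labels = ["0"] * 14 + ["1"] * 3 + ["2"] * 2 + ["3"] * 1
rows = list(range(len(labels)))
new_rows, new_labels = random_oversample(rows, labels, seed=42)
print(Counter(new_labels))  # every class now has 14 rows
```

Plain duplication tends to overfit the copied rows, which is why methods such as SMOTE and ADASYN generate new synthetic points instead.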

Everton Reis

You can try SMOTE. SMOTE oversamples the minority classes by generating synthetic observations, and it can be combined with undersampling of the majority class. In many cases it outperforms simpler sampling techniques. This is a snippet of code in Python. In R it is a little harder to equalize the class distribution of the target variable with SMOTE, but it can be done by considering two classes at a time.

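import pandas as pd
from imblearn.over_sampling import SMOTE

# "auto" oversamples every class except the majority up to the majority count
sm = SMOTE(random_state=99, sampling_strategy="auto")
x_train, y_train = sm.fit_resample(X_var, target_class)
print(pd.Series(y_train).value_counts())  # verify class distribution here

sampling_strategy is a hyperparameter here (older imbalanced-learn versions called it ratio, and fit_resample was called fit_sample). A float value such as 1.0 only works for binary problems.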
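To show what SMOTE does with more than two classes, here is a minimal pure-Python sketch of its core step: pick a point of an under-represented class, pick one of its k nearest same-class neighbours, and create a synthetic point a random fraction of the way between them. It oversamples every class up to the majority size, roughly what `sampling_strategy="auto"` does in imbalanced-learn. `smote_multiclass` and the toy data are hypothetical; it skips classes with a single example and has none of the efficiency of a real implementation, so use imbalanced-learn or an R package for actual work.

```python
import math
import random
from collections import Counter, defaultdict


def smote_multiclass(X, y, k=5, seed=None):
    """Oversample every class to the majority size with SMOTE-style interpolation.

    X is a list of numeric feature lists, y the matching list of labels.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(list(xi))
    n_max = max(len(pts) for pts in by_class.values())
    X_out, y_out = [list(xi) for xi in X], list(y)
    for label, pts in by_class.items():
        n_new = n_max - len(pts)
        if n_new == 0 or len(pts) < 2:
            continue  # majority class, or a lone point with no neighbour to interpolate towards
        for _ in range(n_new):
            i = rng.randrange(len(pts))
            base = pts[i]
            # The k nearest same-class neighbours of base, excluding base itself
            others = sorted((p for j, p in enumerate(pts) if j != i),
                            key=lambda p: math.dist(p, base))
            neighbour = rng.choice(others[:k])
            gap = rng.random()  # random fraction in [0, 1)
            synthetic = [b + gap * (n - b) for b, n in zip(base, neighbour)]
            X_out.append(synthetic)
            y_out.append(label)
    return X_out, y_out


X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 2], [2, 0],  # class "a"
     [5, 5], [6, 5], [5, 6],                          # class "b"
     [9, 0], [9, 1]]                                  # class "c"
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
X_res, y_res = smote_multiclass(X, y, k=2, seed=0)
print(Counter(y_res))  # each class now has 6 rows
```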

Hope this helps.