
I have a small dataset with imbalanced classes:

0 : 142

1 : 29

I am trying to find the right method to deal with this issue, and the best algorithm.

So far, my best results have come from combining oversampling with SMOTE and undersampling with RandomUnderSampler, then fitting a ClassificationTree from InterpretML.

I get a score of 0.88 and a reasonable confusion matrix, but I need better results.

      0   1
 0   27   3
 1    4  23
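
To make the numbers easier to discuss, here is a small sketch that recomputes the usual metrics from the matrix above. It assumes rows are true labels and columns are predictions (the scikit-learn convention); the post does not say which way round they are, so the precision and recall figures swap if the orientation is the opposite.

```python
# Confusion matrix copied from the post.
# Assumption: rows = true labels, columns = predicted labels.
cm = [[27, 3],
      [4, 23]]

total = sum(sum(row) for row in cm)

# Accuracy: correct predictions (the diagonal) over all test samples.
accuracy = (cm[0][0] + cm[1][1]) / total

# Recall per class: correct predictions over all true members of the class.
recall_0 = cm[0][0] / sum(cm[0])
recall_1 = cm[1][1] / sum(cm[1])

# Precision for class 1: correct class-1 predictions over all class-1 predictions.
precision_1 = cm[1][1] / (cm[0][1] + cm[1][1])

# Balanced accuracy: mean of the per-class recalls.
balanced_accuracy = (recall_0 + recall_1) / 2

print(total)                        # 57
print(round(accuracy, 3))           # 0.877
print(round(recall_1, 3))           # 0.852
print(round(precision_1, 3))        # 0.885
print(round(balanced_accuracy, 3))  # 0.876
```

The accuracy of about 0.877 matches the reported score of 0.88. The test set has 57 rows with nearly equal class counts (30 and 27), which reflects the resampled data rather than the original 142 / 29 split.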

I need to improve the score and get better predictions.

Here is my code:

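from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from interpret.glassbox import ClassificationTree
from sklearn.model_selection import train_test_split

# X, y: feature matrix and labels (142 samples of class 0, 29 of class 1)
# Oversample the minority class, then undersample the majority class
oversample = SMOTE()
X_over, y_over = oversample.fit_resample(X, y)
under = RandomUnderSampler()
X_ovun, y_ovun = under.fit_resample(X_over, y_over)

# Hold out 20% of the resampled data for testing
seed = 1
X_train, X_test, y_train, y_test = train_test_split(X_ovun, y_ovun, test_size=0.20, random_state=seed)

# Fit the tree and report accuracy on the held-out split
ct = ClassificationTree(random_state=seed)
ct.fit(X_train, y_train)

ct.score(X_test, y_test)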

Any advice on improving the results is welcome!

DuneC