My records have unbalanced classes. The data looks like this:

X   Y   Z            Class
1   4   Good           A
3   5   Very Good      A
7   6   Good           A
8   7   Excellent      A
4   8   Pass           A
3   7   Good           A
34  6   Good           A
1   5   Very Good      A
4   3   Excellent      B
4   4   Excellent      B

I want to predict Class:

  1. What is the best data mining technique for this?
  2. I tried a decision tree, but because the classes are unbalanced it was not able to classify the data correctly.
mathielo

1 Answer

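I'd recommend looking into SMOTE (Synthetic Minority Over-sampling Technique). Rather than duplicating minority instances, SMOTE creates synthetic ones: for a selected minority instance in your training dataset, it picks one of that instance's k nearest minority-class neighbors and generates a new point at a random position on the line segment between the two. The synthetic instances are added to the training dataset, giving more balanced classes and making it less likely that the classifier learns to predict only the majority class. (Plain random oversampling, which selects minority instances with replacement and adds them as duplicates, is a simpler alternative.) Note that SMOTE interpolates numeric features, so a categorical column such as Z has to be encoded first or handled by a variant such as SMOTE-NC.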
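To make the interpolation idea concrete, here is a minimal pure-Python sketch of the SMOTE step. It is only an illustration, not a library implementation: the function name `smote_sketch` and its parameters are made up for this example, it uses only the numeric columns X and Y of the two class-B rows from the question, and it sets k=1 because each minority point has just one other minority point to act as a neighbor.

```python
import math
import random


def smote_sketch(minority, n_new, k=1, seed=0):
    """Create n_new synthetic points, each placed at a random position on the
    segment between a minority point and one of its k nearest minority
    neighbours. (Hypothetical helper, for illustration only.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a base minority point at random.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbours among the other minority points.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate: gap = 0 gives the base point, gap near 1 the neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Numeric features (X, Y) of the two class-B rows in the question.
minority_b = [(4.0, 3.0), (4.0, 4.0)]
new_points = smote_sketch(minority_b, n_new=6, k=1)
for point in new_points:
    print(point)
```

Every generated point lies between the two original class-B rows, so X stays at 4.0 and Y falls somewhere between 3.0 and 4.0. In practice you would apply this to the training split only, so that no synthetic points end up in the data you evaluate on.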

Depending on the software or module you are using, and whether or not you need to use decision trees specifically, there may be other options. For instance, SVM implementations usually let you specify class-specific costs (again, depending on the software or module). To combat the problem you are describing, you can simply specify a higher cost (i.e., penalty) for misclassifying the minority class.
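As a rough illustration of what class-specific costs do, the sketch below computes weights inversely proportional to class frequency, n_samples / (n_classes * class_count), and uses them to score a classifier that always predicts the majority class. The helper names `balanced_class_weights` and `weighted_error` are made up for this example, and the labels are the 8 A and 2 B rows from the question. If you happen to use scikit-learn, this is the same weighting that `class_weight="balanced"` applies, and both `SVC` and `DecisionTreeClassifier` accept a `class_weight` parameter.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so that
    rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Fraction of the total class weight that was misclassified."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Class labels from the question: 8 rows of class A, 2 rows of class B.
labels = ["A"] * 8 + ["B"] * 2
weights = balanced_class_weights(labels)

# A degenerate classifier that always predicts the majority class.
always_a = ["A"] * len(labels)

print(weights)
print(weighted_error(labels, always_a, weights))
```

Here A gets weight 0.625 and B gets 2.5. Always predicting A has an unweighted error of only 0.2, but a weighted error of 0.5, so under these costs ignoring the minority class is no better than guessing, and the learner is pushed to separate B properly.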

Hope that helps!

DMML