
I'm using SMOTE to oversample my dataset, which is affected by class imbalance. Some of my attributes have integer values and others have only two decimal places, but SMOTE creates new instances with many decimal places. To solve this, I thought of using the NumericCleaner filter and setting the number of decimals I want. This seems to work, but I have a problem with missing values: each missing value is replaced with 0.0, and I need to evaluate my model with the missing values kept in the dataset. How can I use NumericCleaner (or another filter that can round values) and keep my missing values?

Titus Pullo

1 Answer


Very interesting question. Here is a solution:

  1. Use SMOTE to oversample the minority class (this produces values with many decimals, but the missing values remain missing).
  2. Then select the Weka filter filters -> unsupervised -> attribute -> NumericTransform.
  3. Click on the filter, set attributeIndices to the attributes that have unwanted decimals, and in methodName replace "abs" with "ceil".
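As a rough illustration of what the methodName choice does, here is a plain-Java sketch of the two `java.lang.Math` methods you might pick. The class and method names (`RoundingMethods`, `up`, `nearest`) are made up for this example and are not part of Weka. Note that "ceil" always rounds up, so if you want the nearest integer instead, "rint" may be the better fit; it returns a double like "ceil" does, so it should be usable in the same way, though that is worth checking on your data.

```java
// Illustrative only: shows how the java.lang.Math methods behave on
// ordinary values and on NaN (the value Weka uses to encode "missing").
final class RoundingMethods {
    private RoundingMethods() {}

    // Same behaviour as methodName = "ceil": always rounds up.
    public static double up(double value) {
        return Math.ceil(value);
    }

    // Same behaviour as methodName = "rint": rounds to the nearest integer,
    // with exact halves going to the even neighbour.
    public static double nearest(double value) {
        return Math.rint(value);
    }

    public static void main(String[] args) {
        System.out.println(up(2.1));        // 3.0
        System.out.println(nearest(2.1));   // 2.0
        System.out.println(nearest(2.5));   // 2.0 (tie goes to even)
        System.out.println(up(Double.NaN)); // NaN
    }
}
```

Both methods return NaN for a NaN input, so neither turns a missing value into a number.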

I hope that solves the problem.

Rushdi Shams
  • Great! This works perfectly for attributes that need integer values, but how can I use it to set a custom number of decimals? java.lang.Math doesn't seem to have a function that does what I need for decimal values. – Titus Pullo Apr 20 '12 at 14:46
  • For SMOTE, there is no option for setting a fixed number of digits after the decimal point. – Rushdi Shams Apr 23 '12 at 00:03
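For the custom-decimals case raised in the comments, one possible workaround is to write your own rounding method, since `java.lang.Math` has no one-argument method for it. The sketch below is plain Java; `DecimalRounder`, `roundTo` and `roundTo2` are hypothetical names, not Weka API. NumericTransform has a className option next to methodName, so a static method that takes one double and returns a double could presumably be plugged in there, provided it is public, lives in a public class, and is on Weka's classpath. That wiring is an assumption and is not tested here.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of Weka. For real use with Weka it would
// probably have to be declared public in its own file.
final class DecimalRounder {
    private DecimalRounder() {}

    // One-argument variant with a fixed number of decimals (two here).
    public static double roundTo2(double value) {
        return roundTo(value, 2);
    }

    // Rounds half up to the given number of decimals. NaN (Weka's encoding
    // of a missing value) and infinities are returned unchanged.
    public static double roundTo(double value, int decimals) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            return value;
        }
        return BigDecimal.valueOf(value)
                .setScale(decimals, RoundingMode.HALF_UP)
                .doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(roundTo2(3.14159));    // 3.14
        System.out.println(roundTo2(Double.NaN)); // NaN
    }
}
```

The explicit NaN check is what keeps missing values missing: `BigDecimal.valueOf` throws a `NumberFormatException` for NaN, so the check has to come first.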