
I need to use the J48 tree induction algorithm on data that contains missing values. I plan to run some experiments comparing different missing-value approaches in the context of J48 tree induction, using several UCI training data sets and different artificial amputation rates (the standard data, +10%, and +40% amputation).
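To make the amputation step concrete, here is a minimal sketch of what I have in mind. It is plain Java without Weka, and it assumes the data is a `double[][]` with missing cells stored as `Double.NaN` and that values are removed completely at random. The class name `AmputationSketch`, its method names, and the parameters are hypothetical, not part of Weka.

```java
import java.util.Random;

// Hypothetical helper, not part of Weka: removes values completely at random.
final class AmputationSketch {

    // Returns a copy of data in which every non-class cell is replaced by NaN
    // with probability `rate`. Cells that are already missing stay missing.
    static double[][] amputate(double[][] data, int classIndex, double rate, long seed) {
        Random rng = new Random(seed);
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[i].clone();
            for (int j = 0; j < out[i].length; j++) {
                if (j != classIndex && rng.nextDouble() < rate) {
                    out[i][j] = Double.NaN;
                }
            }
        }
        return out;
    }

    // Fraction of non-class cells that are missing, to check the achieved rate.
    static double missingFraction(double[][] data, int classIndex) {
        int missing = 0;
        int total = 0;
        for (double[] row : data) {
            for (int j = 0; j < row.length; j++) {
                if (j == classIndex) continue;
                total++;
                if (Double.isNaN(row[j])) missing++;
            }
        }
        return total == 0 ? 0.0 : (double) missing / total;
    }

    public static void main(String[] args) {
        double[][] data = {{5.1, 3.5, 0.0}, {4.9, 3.0, 0.0}, {6.3, 3.3, 1.0}, {5.8, 2.7, 1.0}};
        for (double rate : new double[] {0.0, 0.1, 0.4}) {
            double[][] amputated = amputate(data, 2, rate, 42L);
            System.out.println("rate " + rate + " -> missing fraction " + missingFraction(amputated, 2));
        }
    }
}
```

The class attribute is never removed, and a fixed seed keeps the amputated data sets reproducible across the approaches being compared.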

My main question is: how can I implement the following approaches in the J48 source code or, better, in my own code using the Weka J48 classes? Can I handle these approaches as meta-classifiers, or in some other way? The approaches I want to test as counterparts to J48's standard handling and to random forest are:

  • Delete objects with missing attribute values (complete-case analysis)
  • Hot-deck methods (find a donor within the same concept)
  • Surrogate splits (use another attribute for the split, the way CART handles missing values)
  • Imputation using another decision tree (concepts -> missing attribute values) to predict the missing attribute value
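The first two approaches in the list can be done as preprocessing before the tree is built. Below is a rough sketch in plain Java without Weka, again assuming a `double[][]` data set with `Double.NaN` for missing cells and numerically coded attributes. The class `MissingValueSketch` and its methods are hypothetical names, and choosing the nearest same-class donor by squared Euclidean distance is just one possible hot-deck variant, not a fixed definition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka.
final class MissingValueSketch {

    // Complete-case analysis: keep only rows without any missing cell.
    static double[][] completeCases(double[][] data) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : data) {
            boolean complete = true;
            for (double v : row) {
                if (Double.isNaN(v)) { complete = false; break; }
            }
            if (complete) kept.add(row.clone());
        }
        return kept.toArray(new double[0][]);
    }

    // Hot deck within the class: fill each missing cell with the value of the
    // nearest donor row that has the same class and an observed value there.
    // Cells without any eligible donor stay missing.
    static double[][] hotDeckWithinClass(double[][] data, int classIndex) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) out[i] = data[i].clone();
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < data[i].length; j++) {
                if (j == classIndex || !Double.isNaN(data[i][j])) continue;
                int donor = nearestDonor(data, i, j, classIndex);
                if (donor >= 0) out[i][j] = data[donor][j];
            }
        }
        return out;
    }

    // Squared Euclidean distance over attributes observed in both rows.
    private static int nearestDonor(double[][] data, int recipient, int attr, int classIndex) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int d = 0; d < data.length; d++) {
            if (d == recipient) continue;
            if (data[d][classIndex] != data[recipient][classIndex]) continue; // also skips NaN classes
            if (Double.isNaN(data[d][attr])) continue;
            double dist = 0.0;
            for (int k = 0; k < data[d].length; k++) {
                if (k == classIndex || k == attr) continue;
                double a = data[recipient][k];
                double b = data[d][k];
                if (Double.isNaN(a) || Double.isNaN(b)) continue;
                dist += (a - b) * (a - b);
            }
            if (dist < bestDist) { bestDist = dist; best = d; }
        }
        return best;
    }
}
```

Because the donor search needs the class label, this variant only works on training data. Surrogate splits and tree-based imputation are not covered by this sketch.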

Do I have to deactivate the integrated J48 (C4.5) missing-value handling? If so, how can I deactivate it? As far as I understand, J48 (C4.5) uses a special-value approach when finding tests, uses a probability distribution to split objects into parts when partitioning the training data, and uses yet another approach during classification.

Does anyone know of other missing-value approaches that could easily be added to J48?

Thanks a lot!

  • Not sure if this is the answer you are looking for, but what about handling the missing values before handing the data to J48? For example: you can delete the instances with missing values or impute the values and fill them in before learning the final decision tree. – Walter Jun 25 '14 at 12:45
  • For the imputation methods, that would be a possible way. But using a surrogate attribute for the test data would be less trivial, wouldn't it? –  Jun 26 '14 at 11:58
