
Can anybody help me implement an alternative way of handling missing values in the J48 algorithm, using the Weka API in Java?

I know that imputing missing values before training J48 is easy.

But what about using a surrogate split attribute when partitioning the training data (as Breiman does in CART), instead of the standard J48 approach (Quinlan's C4.5), which distributes cases with a missing value across the branches according to a probability distribution estimated from the cases whose value is known?

Can anybody give me some information or tips on where in the Weka API and source code I have to make changes to replace the standard approach with surrogate splits?

admdrew

1 Answer


Look at the Weka source code for weka.classifiers.trees.j48.C45ModelSelection, starting at line 152 (the comment Find "best" attribute to split on). It uses the information gain ratio as the splitting criterion.
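To make the surrogate idea from the question concrete, here is a minimal sketch in plain Java. It deliberately does not use the Weka API: the class SurrogateSplitSketch, the Split type, and the methods bestSurrogate and goesLeft are all hypothetical names made up for illustration. It assumes numeric attributes, binary splits of the form "value <= threshold goes left", and missing values encoded as NaN. It is also simplified compared to CART, since it only tries surrogates that point in the same direction as the primary split.

```java
// Hypothetical illustration only; not Weka code and not Weka's API.
class SurrogateSplitSketch {

    /** A binary split "value <= threshold goes left" on one numeric attribute. */
    static final class Split {
        final int attribute;
        final double threshold;
        final double agreement; // share of comparable rows routed like the primary split

        Split(int attribute, double threshold, double agreement) {
            this.attribute = attribute;
            this.threshold = threshold;
            this.agreement = agreement;
        }
    }

    /**
     * Finds the split on another attribute that most often sends a row to the
     * same side as the primary split. Only rows where both the primary and the
     * candidate attribute are known (not NaN) are counted.
     */
    static Split bestSurrogate(double[][] rows, int primaryAttr, double primaryThreshold) {
        Split best = null;
        int numAttrs = rows[0].length;
        for (int a = 0; a < numAttrs; a++) {
            if (a == primaryAttr) {
                continue;
            }
            // Every observed value of attribute a is tried as a threshold.
            for (double[] candidateRow : rows) {
                double t = candidateRow[a];
                if (Double.isNaN(t)) {
                    continue;
                }
                int comparable = 0;
                int agree = 0;
                for (double[] row : rows) {
                    if (Double.isNaN(row[primaryAttr]) || Double.isNaN(row[a])) {
                        continue;
                    }
                    comparable++;
                    boolean primaryLeft = row[primaryAttr] <= primaryThreshold;
                    boolean surrogateLeft = row[a] <= t;
                    if (primaryLeft == surrogateLeft) {
                        agree++;
                    }
                }
                if (comparable == 0) {
                    continue;
                }
                double agreement = (double) agree / comparable;
                if (best == null || agreement > best.agreement) {
                    best = new Split(a, t, agreement);
                }
            }
        }
        return best;
    }

    /**
     * Routes a row: primary split if its value is known, otherwise the
     * surrogate, otherwise a caller-supplied default direction.
     */
    static boolean goesLeft(double[] row, int primaryAttr, double primaryThreshold,
                            Split surrogate, boolean defaultLeft) {
        if (!Double.isNaN(row[primaryAttr])) {
            return row[primaryAttr] <= primaryThreshold;
        }
        if (surrogate != null && !Double.isNaN(row[surrogate.attribute])) {
            return row[surrogate.attribute] <= surrogate.threshold;
        }
        return defaultLeft;
    }
}
```

In a real modification you would have to compute something like this at each node once the primary split has been chosen, and then use it both when partitioning the training data and when classifying new instances, in place of the fractional weighting that C4.5 applies to cases with a missing value.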

doxav