For classification I used Weka's J48 decision tree to build a model on several nominal attributes. Now there is more data to classify (5 nominal attributes), but each attribute has 3000 distinct values. I used J48 with pruning, but it ran out of memory (4 GB allocated). With a smaller dataset I saw in the output that J48 keeps leaves that have no instances associated with them. Why are they kept in the model? Should I switch to another classification algorithm?
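
A minimal sketch of how such a model might be trained with the Weka Java API, assuming a hypothetical ARFF file named data.arff with the class as the last attribute (the same settings are exposed in the Explorer GUI):

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TrainJ48 {
        public static void main(String[] args) throws Exception {
            // Hypothetical ARFF file containing the nominal attributes;
            // start the JVM with e.g. -Xmx4g to give it 4 GB of heap.
            Instances data = new DataSource("data.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1); // assumes the class is the last attribute

            J48 tree = new J48();            // C4.5 decision tree; pruning is enabled by default
            tree.setConfidenceFactor(0.25f); // default pruning confidence factor
            tree.buildClassifier(data);
            System.out.println(tree);        // leaves with no training instances show up here as "(0.0)"
        }
    }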

Comments:
  • You need to do some feature processing; it is not wise to throw a categorical feature with 3000 values into a decision tree model. – TYZ Apr 02 '18 at 15:07
  • You could set J48's minNumObj hyperparameter to a higher value, say 20, or try the rules/PART algorithm, which is (according to the context menu documentation) a simplified version of C4.5 / J48 – maybe it needs less memory. – knb Apr 03 '18 at 07:36
  • "keeps all leaves with no instances" – is this in the test set? There may be no instances with those values in the test set. – zbicyclist Apr 04 '18 at 02:06

0 Answers