Weka Decision Tree getting too big (out of memory)

Asked Apr 02 '18 at 14:09

Active Apr 03 '18 at 00:41

Viewed 569 times

For classification I used Weka's J48 decision tree to build a model on several nominal attributes. Now there is more data for classification (5 nominal attributes), and each attribute has 3000 distinct values. I ran J48 with pruning, but it ran out of memory (4 GB allocated). With a smaller dataset, I saw in the output that J48 keeps all leaves that have no instances associated with them. Why are they kept in the model? Should I switch to another classification algorithm?
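To put rough numbers on the memory problem, here is a back-of-the-envelope sketch. It assumes that a C4.5-style multiway split on a nominal attribute creates one branch per attribute value, including values with no instances at that node, which would match the empty leaves seen in the output. The function name is made up for illustration, and the figures are a worst-case upper bound, not what J48 actually builds for a given dataset.

```python
def full_multiway_tree_size(values_per_attribute, depth):
    """Node counts for a complete tree where every split has one branch per value.

    This is an upper bound under the stated assumption; a real tree stops
    splitting wherever a node has too few instances.
    """
    # Internal nodes: 1 at the root, then v, v^2, ... down to level depth-1.
    internal_nodes = sum(values_per_attribute ** level for level in range(depth))
    # Leaves: one per path of `depth` attribute values.
    leaves = values_per_attribute ** depth
    return internal_nodes, leaves


# 3000 values per attribute, as in the question, for trees 1 to 3 levels deep.
sizes = {depth: full_multiway_tree_size(3000, depth) for depth in (1, 2, 3)}
for depth, (internal_nodes, leaves) in sizes.items():
    print(f"depth {depth}: {internal_nodes:,} internal nodes, {leaves:,} leaves")
```

Even a single split already yields 3000 children, and two levels could in the worst case yield 9,000,000 leaves. If pruning is applied only after the tree has been grown (which I believe is how J48 works, but treat that as an assumption), it would not lower the peak memory use.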

edited Apr 03 '18 at 00:41

G5W


asked Apr 02 '18 at 14:09

P. Moe


You need to do some feature processing; it is unwise to throw a categorical feature with 3000 values into a decision tree model. – TYZ Apr 02 '18 at 15:07
You could set J48's minNumObj hyperparameter to a higher value, say 20, or try the rules/PART algorithm which is (according to the context menu documentation) a simplified version of C4.5 / J48 - maybe it needs less memory – knb Apr 03 '18 at 07:36
"keeps all leaves with no instances" -- is this in the test set? There may be no instances with those values in the test set. – zbicyclist Apr 04 '18 at 02:06
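
One possible form of the feature processing suggested in the comments is to merge rare attribute values into a single catch-all value before training, so that each nominal attribute has far fewer than 3000 levels. The following is only a minimal sketch on a toy column; the function name, the `min_count` threshold, and the `OTHER` label are assumptions for illustration and not part of Weka.

```python
from collections import Counter


def collapse_rare_levels(column, min_count, other_label="OTHER"):
    """Replace values that occur fewer than min_count times with other_label."""
    counts = Counter(column)
    return [value if counts[value] >= min_count else other_label
            for value in column]


# Toy nominal column: "a" and "b" are frequent enough, "c" and "d" are rare.
column = ["a", "a", "a", "b", "b", "c", "d"]
collapsed = collapse_rare_levels(column, min_count=2)
print(collapsed)
print("levels before:", len(set(column)), "after:", len(set(collapsed)))
```

The threshold plays a role similar in spirit to the minNumObj suggestion above: values with too little support cannot produce a useful branch, so they are pooled. Whether the pooled value still carries signal depends on the data and would need to be checked.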


0 Answers