I ran the C4.5 pruning algorithm in Weka using 10-fold cross-validation. I noticed that the unpruned tree had a higher testing accuracy than the pruned tree. I can't understand why pruning the tree didn't improve the testing accuracy.

Dan

1 Answer


Pruning reduces the size of the decision tree, which (in general) lowers training accuracy but improves accuracy on test (unseen) data. Pruning mitigates overfitting, where the model (i.e. the decision tree) achieves near-perfect accuracy on the training data but fails whenever it sees unseen data.

So pruning should improve testing accuracy. From your question alone, it's difficult to say why pruning is not improving it.

However, you can check your training accuracy: see whether pruning reduces it or not. If it doesn't, then the problem is somewhere else, and you probably need to think about the number of features or the size of the dataset. A sketch of this check is given below.
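In case it helps, here is a minimal sketch of that check using Weka's Java API (J48 is Weka's implementation of C4.5). The file name mydata.arff is a placeholder for your own dataset, and the class is assumed to be the last attribute. It prints training (resubstitution) accuracy and 10-fold cross-validation accuracy for both the pruned and unpruned tree:

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class PruningCheck {
        public static void main(String[] args) throws Exception {
            // "mydata.arff" is a placeholder; point this at your own dataset.
            Instances data = DataSource.read("mydata.arff");
            data.setClassIndex(data.numAttributes() - 1); // assumes class is the last attribute

            report("pruned", false, data);
            report("unpruned", true, data);
        }

        static void report(String label, boolean unpruned, Instances data) throws Exception {
            // Training (resubstitution) accuracy: build on all data, test on the same data.
            J48 trainTree = new J48();
            trainTree.setUnpruned(unpruned); // corresponds to J48's -U option
            trainTree.buildClassifier(data);
            Evaluation trainEval = new Evaluation(data);
            trainEval.evaluateModel(trainTree, data);

            // Estimated test accuracy via 10-fold cross-validation on a fresh classifier.
            J48 cvTree = new J48();
            cvTree.setUnpruned(unpruned);
            Evaluation cvEval = new Evaluation(data);
            cvEval.crossValidateModel(cvTree, data, 10, new Random(1));

            System.out.printf("%-8s  train acc: %.2f%%  10-fold CV acc: %.2f%%%n",
                    label, trainEval.pctCorrect(), cvEval.pctCorrect());
        }
    }

Using the same random seed for both cross-validation runs keeps the fold splits identical, so the pruned and unpruned accuracies are directly comparable.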

Wasi Ahmad
  • Thanks! I checked: pruning is reducing the training accuracy. Any suggestions on how to move forward? – Dan Feb 04 '17 at 22:34
  • Is it because my unpruned tree is overfitting the data? Would having more data improve the performance of the unpruned tree? My testing accuracy for the unpruned tree was about 98%, which went down to 97% after pruning. – Dan Feb 04 '17 at 22:48
  • A pruned tree should have higher accuracy on test data, but since you are not getting improved performance after pruning, you may try using more training data. Your case may be a little exceptional, but in general a pruned tree should perform better than an unpruned tree on test instances. – Wasi Ahmad Feb 07 '17 at 14:40