I have a decision tree (ID3 or J48) in Weka, trained on only 25 examples, and it reaches 100% training accuracy. I think this is too high. How can I tell whether it has an overfitting problem or not? (I want to derive my test set from these 25 training examples, because I don't have any separate test data.) I know cross-validation helps prevent overfitting, but I want to demonstrate the problem before relying on it. I actually pruned the tree and compared the cross-validation accuracy of the pruned and unpruned trees, but I can't explain how the accuracy should change between an overfitted tree and a pruned one. (In this case I know my tree has an overfitting problem, but how can I infer that from the numbers?) Is there another way you can suggest? Note that test data is not available.
1 Answer
This is what I would do:
- Take the 25 data points and run 10-fold cross-validation. Record the accuracy (provided that your classes are balanced or nearly balanced).
- Compare the training accuracy with the cross-validation accuracy. If they differ significantly (say, 100% training accuracy vs. 85% cross-validation accuracy), that is a signal of overfitting to me. From that point on, I would try to collect more data points and plot learning curves as the data grows.
NOTE: If you do not have any test data, then CV is the only choice, and the results you obtain from CV should be treated as your test results.
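The train-vs-CV comparison above can be sketched in a few lines. Weka's own API is Java, so this is a hedged illustration using scikit-learn's `DecisionTreeClassifier` as a stand-in for J48; the 25-point dataset is invented, and the labels are deliberately random so the tree can only memorize, which makes the overfitting gap obvious:

```python
# Minimal sketch, assuming scikit-learn in place of Weka's J48.
# The data below is made up for illustration: 25 points, random labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(25, 4))           # 25 examples, 4 numeric features
y = np.array([0] * 12 + [1] * 13)      # near-balanced classes
rng.shuffle(y)                         # labels unrelated to the features

tree = DecisionTreeClassifier(random_state=0)  # unpruned: grows to purity
tree.fit(X, y)
train_acc = tree.score(X, y)           # the tree memorizes all 25 points

cv_acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                         X, y, cv=10).mean()  # honest out-of-sample estimate

print(f"training accuracy = {train_acc:.2f}, 10-fold CV accuracy = {cv_acc:.2f}")
```

A large gap between the two numbers (here, 1.00 training accuracy versus roughly chance-level CV accuracy) is exactly the overfitting signal described above; if the two were close, overfitting would be less of a concern.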

Rushdi Shams
You're right, but when I pruned the tree and used 10-fold cross-validation, the accuracy was the same as for the unpruned tree, for example 80% and 80%, which seemed a little strange. Why is that? – patric Oct 24 '15 at 18:23