I have a decision tree (ID3 or J48) in Weka, trained on only 25 examples, and it reaches 100% training accuracy. I think this is too high. How can I tell whether it has an overfitting problem or not? (I want to derive my test set from these 25 training examples, because I don't have any separate test data.) I know cross-validation helps prevent overfitting, but I want to demonstrate the problem before relying on it. I actually pruned the tree and compared the cross-validation accuracy of the pruned and unpruned trees, but I can't explain how the accuracy should change between an overfitted tree and a pruned one. (In this case I know my tree has an overfitting problem, but how can I infer that from the numbers?) Is there another way you can suggest? Note that test data is not available.
1 Answer
This is what I would do:
- Take the 25 data points and run 10-fold cross-validation. Record the accuracy (provided that your classes are balanced or nearly balanced).
- Compare the training accuracy with the cross-validation accuracy. If they differ significantly (say, 100% training accuracy vs. 85% cross-validation accuracy), that is a signal of overfitting to me. From that point on, I would try to collect more data points and plot learning curves as the data grows.
NOTE: If you do not have any test data, then CV is the only choice, and the results you obtain from CV should be treated as your test results.
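The train-vs-CV comparison above can be sketched in a few lines. Weka's own API is Java, so this is a hedged illustration using scikit-learn's `DecisionTreeClassifier` as a stand-in for J48; the 25-point dataset is invented, and the labels are deliberately random so the tree can only memorize, which makes the overfitting gap obvious:

```python
# Minimal sketch, assuming scikit-learn in place of Weka's J48.
# The data below is made up for illustration: 25 points, random labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(25, 4))           # 25 examples, 4 numeric features
y = np.array([0] * 12 + [1] * 13)      # near-balanced classes
rng.shuffle(y)                         # labels unrelated to the features

tree = DecisionTreeClassifier(random_state=0)  # unpruned: grows to purity
tree.fit(X, y)
train_acc = tree.score(X, y)           # the tree memorizes all 25 points

cv_acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                         X, y, cv=10).mean()  # honest out-of-sample estimate

print(f"training accuracy = {train_acc:.2f}, 10-fold CV accuracy = {cv_acc:.2f}")
```

A large gap between the two numbers (here, 1.00 training accuracy versus roughly chance-level CV accuracy) is exactly the overfitting signal described above; if the two were close, overfitting would be less of a concern.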

Rushdi Shams
You're right, but when I pruned the tree and used 10-fold cross-validation, the accuracy was the same as for the unpruned tree, for example 80% and 80%, which seemed a little strange. Why is that? – patric Oct 24 '15 at 18:23