I want to run cross-validation on my training set using SVMlight. The option for this seems to be -x 1 (although I'm not sure how many folds it uses...). The output is:

XiAlpha-estimate of the error: error<=31.76% (rho=1.00,depth=0)
XiAlpha-estimate of the recall: recall=>68.24% (rho=1.00,depth=0)
XiAlpha-estimate of the precision: precision=>69.02% (rho=1.00,depth=0)
Number of kernel evaluations: 56733
Computing leave-one-out **lots of gibberish here**
Retrain on full problem..............done.
Leave-one-out estimate of the error: error=12.46%
Leave-one-out estimate of the recall: recall=86.39%
Leave-one-out estimate of the precision: precision=88.82%
Actual leave-one-outs computed:  412 (rho=1.00)
Runtime for leave-one-out in cpu-seconds: 0.84

How can I get the accuracy? From the estimate of the error?

Thank you!

Cheshie

1 Answer

These are contradictory concepts. The training error is the error on the training set itself, while cross-validation is used to approximate the validation error (the error on data not used for training).

Your output suggests that you are using N folds (where N is the size of the training set), which amounts to so-called "leave-one-out" validation (only one test point per fold!) and overestimates your model's quality. You should try 10 folds instead, and your accuracy is simply 1-error.
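For the leave-one-out output above, that gives accuracy = 1 - 12.46% = 87.54%. Since svm_learn's -x option appears only to toggle leave-one-out on or off (see the comments below), a 10-fold estimate has to be scripted around the binaries. Here is a minimal sketch, assuming svm_learn and svm_classify are on your PATH and that train.dat (a placeholder name) is a binary-classification file in SVMlight format:

```python
# Minimal sketch of manual 10-fold cross-validation around the SVMlight
# binaries. Assumptions: svm_learn / svm_classify are on PATH, and
# "train.dat" is a placeholder for a binary-classification data file.
import random
import subprocess

K = 10  # number of folds

with open("train.dat") as f:
    # Keep data lines only; SVMlight files may contain '#' comment lines.
    lines = [l if l.endswith("\n") else l + "\n"
             for l in f if l.strip() and not l.startswith("#")]

random.seed(0)          # reproducible shuffle
random.shuffle(lines)
folds = [lines[i::K] for i in range(K)]  # K roughly equal-sized folds

correct = total = 0
for i in range(K):
    # Train on all folds except fold i, test on fold i.
    with open("cv_train.dat", "w") as f:
        f.writelines(l for j, fold in enumerate(folds) if j != i for l in fold)
    with open("cv_test.dat", "w") as f:
        f.writelines(folds[i])
    subprocess.run(["svm_learn", "cv_train.dat", "cv_model"], check=True)
    subprocess.run(["svm_classify", "cv_test.dat", "cv_model", "cv_preds"],
                   check=True)
    # svm_classify writes one decision value per line; its sign is the
    # predicted class. The true label is the first token of each data line.
    with open("cv_preds") as f:
        preds = [float(line) for line in f]
    labels = [float(line.split()[0]) for line in folds[i]]
    correct += sum((p > 0) == (y > 0) for p, y in zip(preds, labels))
    total += len(folds[i])

print("10-fold CV accuracy: %.2f%%" % (100.0 * correct / total))
```

Averaging over 10 folds this way gives a less optimistic estimate than leave-one-out, at the cost of retraining the model ten times.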

lejlot
  • Thank you @lejlot, but I'm afraid I didn't understand you. 1. Cross-validation is not used on training data? (Is [this](http://en.wikipedia.org/wiki/Cross-validation_(statistics)#K-fold_cross-validation) incorrect?) 2. I didn't understand the second part of your answer - I don't see how I could choose the number of folds, why it overestimates the model's quality, or what 1-error is... I really apologize... if you could please explain some more I'd really appreciate it. Thank you so much! – Cheshie Mar 24 '14 at 20:36
  • Training data is the data used for **training**, as the name suggests. Cross-validation splits your data into training and testing parts, and the latter is never seen during the training phase. You have written "training accuracy" - this is wrong, not the sentence about using the data for CV. Training accuracy is **not** measured by CV. "1-error" means "subtract the error value from 1 and you get the accuracy". Finally, leave-one-out "overestimates" in the sense that it returns a much higher accuracy than it should. The "reasonable" number of folds is 10 (often used in such cases). – lejlot Mar 24 '14 at 20:41
  • Ah... now I understand you better @lejlot, thanks, but - IMHO, cross-validation is used for testing purposes during the _training_ phase, isn't it? And as for using 10 folds - I agree, but leave-one-out estimates are (I think) the only form of cross-validation that svmlight allows. – Cheshie Mar 24 '14 at 21:00
  • You are still confusing terms. "Training error" is not "the error computed during the training phase" but "the error **on the training set**". This is completely different from a CV-based estimate of the error on the **validation** set. – lejlot Mar 24 '14 at 21:09
  • Apologies if I've missed it, but how do you specify the number of folds for cross-validation on the training set using svmlight? -x can only be 0 or 1, so is there another parameter? – Jim Bo Feb 19 '16 at 13:56