
I'm training and cross-validating (10-fold) a classifier with libSVM, using a linear kernel.

Each data point consists of 1800 fMRI voxel intensities (one intensity per feature). There are around 88 data points in the training-set file passed to svm-train.

The training-set file looks as follows:

+1 1:0.9 2:-0.2 ... 1800:0.1

-1 1:0.6 2:0.9 ... 1800:-0.98

...

I should also mention that I'm using the svm-train tool that comes with the libSVM package.
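Roughly, the command I'm running looks like this (train_file is a placeholder for my training-set file):

./svm-train -t 0 -v 10 train_file

Here -t 0 selects the linear kernel and -v 10 requests the 10-fold cross-validation.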

The problem is that when I run svm-train, it reports 100% accuracy!

This doesn't seem to reflect the true classification performance! The data isn't unbalanced, since

#datapoints labeled +1 == #datapoints labeled -1

I've also checked the scaling (it scales correctly), and I also tried changing the labels randomly to see how that impacts the accuracy - it only drops from 100% to 97.9%.
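For reference, scaling with the svm-scale tool from the same package would look roughly like this (file names are placeholders; -s stores the scaling parameters so the same ranges can be reused on new data):

./svm-scale -l -1 -u 1 -s range_file train_file > train_file.scaled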

Could you please help me understand what the problem is, and what I can do to fix it?

Thanks,

Gal Star

  • I don't think there is a problem. Your SVM can easily give a 100% fit on the training set; that is perfectly fine. This is called overfitting (http://en.wikipedia.org/wiki/Overfitting). I think you need to read up on in-sample and out-of-sample training. – sashkello Jan 27 '14 at 23:13
  • This question appears to be off-topic because it is about machine learning. – sashkello Jan 27 '14 at 23:14
  • How can I read up on in-sample and out-of-sample training? – gal.star Jan 27 '14 at 23:16
  • I mean read some literature on this topic :) This is too large of a problem to outline as an answer, there is a lot of research about proper training and cross-validation. If you don't know what it means, this is what you need to know before doing any coding... – sashkello Jan 27 '14 at 23:17
  • Hi, so basically you think I should get better results if I reduce the number of voxel intensities from 1800 to a smaller amount, maybe by choosing the correct representative voxels? – gal.star Jan 27 '14 at 23:23
  • I do know what training and cross-validation mean :) I'll try to see whether I can choose the best voxels to eliminate the overfitting problem - thank you for your help. – gal.star Jan 27 '14 at 23:37
  • Since you are using a linear kernel, a 100% result means that your training set is perfectly linearly separable. It may be that your training set is too small. Hand-picking samples will not make the situation better, only worse. What is your out-of-sample accuracy? – sashkello Jan 27 '14 at 23:44

1 Answer


Make sure you include '-v 10' in the svm-train options. I'm not sure whether your 100% accuracy comes from the training samples or the validation samples. It is very possible to get 100% training accuracy, since you have far fewer samples than features. But if your model suffers from overfitting, the cross-validation accuracy may be low.
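For example (file names are placeholders), the two runs below report different things. The first performs 10-fold cross-validation and prints a line like "Cross Validation Accuracy = ...%" without writing a model file; the second trains on the whole file and then predicts on that same file, which gives the in-sample (training) accuracy. With roughly 88 samples and 1800 features, the latter can easily be 100% even if the model generalizes poorly.

# 10-fold cross-validation with a linear kernel; no model file is produced
./svm-train -t 0 -v 10 train_file

# train on all the data, then predict on that same data: this is the optimistic in-sample accuracy
./svm-train -t 0 train_file model_file
./svm-predict train_file model_file predictions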

  • Thank you for answering :) I have used the -v 10 option. Overfitting could be the problem. Though, should it be causing such high results? – gal.star Jan 27 '14 at 23:26
  • It's possible. I would suggest you shrink your region of interest to reduce the number of voxels (features), and then observe the cross-validation result again. – lennon310 Jan 28 '14 at 00:21