How many images should be there in the training and testing phase? LibSVM

Question

I am doing face recognition using PCA and SVM, with LIBSVM as the SVM implementation in MATLAB. I am trying to implement one-vs-all classification. I have a threefold question.

First: If I have 10 images in class 1 (of face 1), should class 2 then have 60 images (10 images of each of the other 6 faces)?

Second: Will the accuracy depend on the number of images I take in both classes? If yes, can the accuracy become 100% (unreasonably high) because of the large number of images in class 2?

Third: Can a single image be used for testing?

Any help will be deeply appreciated.

carlosdc · Accepted Answer · 2014-03-26T07:45:08.970

You are asking three questions:

(1) EDIT: Yes, exactly as you explained it in the comments. If you have 7 classes you would train 7 classifiers. For each classifier i, the positive class is the images of individual i, and the negative class is the images of all other individuals.
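The labeling for one such classifier can be sketched in plain MATLAB. This is only an illustration: the variable names (`person_id`, `binary_label`) and the layout of 7 individuals with 10 images each are assumptions taken from the numbers in the question, not part of LIBSVM.

```matlab
% Hypothetical layout: 7 individuals, 10 images each, one identity per image.
num_people = 7;
images_per_person = 10;
person_id = kron((1:num_people)', ones(images_per_person, 1));  % 70x1: 1,...,1,2,...,7

i = 1;                                   % individual treated as the positive class
binary_label = -ones(size(person_id));   % every image starts in the negative class
binary_label(person_id == i) = 1;        % images of individual i become positive

fprintf('positives: %d, negatives: %d\n', ...
        sum(binary_label == 1), sum(binary_label == -1));
```

Repeating this for i = 1 to 7 gives the seven label vectors, each with 10 positive and 60 negative images over the same 70 feature rows.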

What you describe is called one-vs-all classification, and it is a commonly used method for doing multi-class classification with a base binary classifier (such as an SVM). Let me also add that there are other methods for extending binary classifiers to multi-class classification, such as one-vs-one and error-correcting tournaments.

EDIT #2: Let me add that one-vs-one classification is already implemented in LIBSVM, so you really don't have to do anything special. All you need to do is assign a distinct double label to each class in the training data (so for seven individuals you could use labels 1, 2, ..., 7).

If you really want to do one-vs-all (also called one-vs-the-rest), you can do that too. It is not directly implemented in LIBSVM, but since it seems you're using MATLAB, the authors of LIBSVM make code for it available (see the LIBSVM FAQ).
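For illustration only, a hand-rolled one-vs-rest loop on top of LIBSVM's binary SVM might look like the sketch below. It is not the code from the LIBSVM FAQ. It assumes the LIBSVM MATLAB interface (`svmtrain`, `svmpredict`) is on the path; the random `features` stand in for real PCA features, the option string `'-t 0 -c 1'` (linear kernel, C = 1) is an arbitrary choice, and all variable names are hypothetical.

```matlab
% Placeholder data: 7 individuals, 10 images each, d-dimensional features.
num_people = 7; images_per_person = 10; d = 20;
person_id = kron((1:num_people)', ones(images_per_person, 1));
features  = randn(numel(person_id), d) + repmat(person_id, 1, d);  % stand-in for PCA output
test_matrix = features;                       % placeholder test set (here: the training rows)

scores = zeros(size(test_matrix, 1), num_people);
dummy  = zeros(size(test_matrix, 1), 1);      % true labels unknown at prediction time
for i = 1:num_people
    binary_label = 2 * (person_id == i) - 1;  % +1 for individual i, -1 for the rest
    model = svmtrain(binary_label, features, '-t 0 -c 1');
    [~, ~, dec] = svmpredict(dummy, test_matrix, model);
    % Decision values are signed relative to model.Label(1); flip the sign when
    % needed so that a larger score always means "more like individual i".
    scores(:, i) = dec * model.Label(1);
end

% Each test image is assigned to the individual whose model scores it highest.
[~, predicted] = max(scores, [], 2);
```

The point of the design is that the seven models are compared through their decision values; the accuracy printed by each individual `svmpredict` call is not used, since the dummy labels make it meaningless.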

(2) The accuracy will depend on the number of images. In ideal conditions you will have many images of all individuals to train with. But you can run into problems such as imbalanced datasets: for example, if you train with a million images of class x and only 2 images each of classes y and z, your classifier gets a far more detailed view of class x than of the other two classes. To evaluate, you will need a full confusion matrix (i.e. how many real objects of class x are classified as class y, how many real objects of class y are classified as class x, and so on for every pair of classes).
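A small worked example shows why accuracy alone can mislead on an imbalanced set. The sketch below uses the 10-versus-60 split from the question and a deliberately useless "classifier" that always answers -1; the data and variable names are made up for illustration and no SVM is involved.

```matlab
% Hypothetical one-vs-all test set: 10 positive images, 60 negative images.
true_label = [ones(10, 1); -ones(60, 1)];

% A degenerate predictor that ignores its input and always answers -1.
predicted_label = -ones(70, 1);

accuracy = 100 * mean(predicted_label == true_label);   % 60/70, about 85.7 percent

% 2x2 confusion matrix: rows are true classes, columns are predicted classes,
% both in the order [+1, -1].
classes = [1; -1];
[~, true_idx] = ismember(true_label, classes);
[~, pred_idx] = ismember(predicted_label, classes);
confusion = accumarray([true_idx, pred_idx], 1, [2 2]);

disp(accuracy);
disp(confusion);   % the +1 row is [0 10]: every positive image is missed
```

The accuracy looks respectable even though not a single image of the positive individual is recognized, which is exactly what the confusion matrix exposes.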

(3) Yes, it can.

EDIT #3:

It seems, from the comments of the authors of LIBSVM, that the accuracy of one-vs-one is similar to the accuracy of one-vs-all, with the difference that it is faster to train one-vs-one, and that is the reason why they implement one-vs-one in their system.

To train a multi-class model using LIBSVM you would call svmtrain only once: class 1 is the images of individual 1, class 2 is the images of individual 2, ..., and class 7 is the images of individual 7.

To predict, after training your model, you would use svmpredict.
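A minimal sketch of that workflow follows, assuming the LIBSVM MATLAB interface is on the path. The random feature matrix is a stand-in for real PCA features, the options `'-t 0 -c 1'` are illustrative rather than recommended, and the variable names are hypothetical.

```matlab
% Placeholder data: one row per image, one label (1..7) per row.
num_people = 7; images_per_person = 10; d = 20;
training_label  = kron((1:num_people)', ones(images_per_person, 1));
training_matrix = randn(numel(training_label), d) + repmat(training_label, 1, d);

% One call trains the whole multi-class model (LIBSVM uses one-vs-one internally).
model = svmtrain(training_label, training_matrix, '-t 0 -c 1');

% A single test image is simply a 1 x d row. Its true label is unknown here,
% so a dummy label is passed and the reported accuracy should be ignored.
test_row = training_matrix(1, :);
[predicted_label, ~, ~] = svmpredict(0, test_row, model);

% With known labels, svmpredict also reports accuracy over a whole test set.
% Note: this reuses the training rows, so it is training accuracy only.
[predicted_all, accuracy, ~] = svmpredict(training_label, training_matrix, model);
```

Here `predicted_label` is the individual's number directly, so there is no need to give the test image a different label for each model.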

edited Mar 26 '14 at 07:45

answered Mar 23 '14 at 02:43

carlosdc


To do one vs all classification I am training 7 models. Above what I said was for just a single model. For the first model : the class 1 has ten images of the first individual(positive class) and the class 2 has ten images each of the other six individuals(negative class) . So this model will be a positive class for individual one. The second model : class 1 will have ten images of the second individual and the class 2 will have ten images each of the remaining six individuals including the first individual. This model will be positive class for second individual. – Sid Mar 23 '14 at 11:57
And the rest of the 5 models will be trained like this. – Sid Mar 23 '14 at 11:58
So what i am doing is right? In one vs all classification i can have different number of images in class 1 and class 2? Say ten images in class 1 and 60 images in class 2? – Sid Mar 24 '14 at 06:35
I get a desired accuracy of 85% to 98% when i have the same number of images in both the classes BUT when i have different number of images in class 1 and class 2 (Say ten images in class 1 and 60 images in class 2) then the accuracy increases to 100%. The accuracy is 100% even for some models to which the test image doesn't belong to. – Sid Mar 24 '14 at 06:37
You said "one-vs-all classification is already implemented in LIBSVM" how do i use this property of LIBSVM? – Sid Mar 24 '14 at 06:40
@Sid: So... you're testing the classifier i with both images of the subject i and images of other subjects j!=i, and it gets 100% accuracy? – carlosdc Mar 24 '14 at 06:55
Yes carlosdc, model 1: Class 1 is 10 images of first individual, class 2 is 60 images of 6 other individuals. model 2 : Class 1 is 10 images of second individual, class 2 is 60 images of 6 other individuals(including first individual) and so on... – Sid Mar 24 '14 at 06:59
@Sid: OK. So can you now explain what you mean "the accuracy is 100% even for some models to which the test image doesn't belong to"? – carlosdc Mar 24 '14 at 07:00
I label the test image as +1 so that the model can give the accuracy. But for some of the models in which the +1 class is say the fifth individual and even if the testing image is from the first individual even then the accuracy of this model is 100%. – Sid Mar 24 '14 at 07:06
Carlosdc you said that one vs one is already implemented but i have a doubt how to i create the models in that? Model 1: Class 1(+1) will be images of individual 1 and class 2(-1) will have images of second individual. Model 2: Class 1(+2) will be images of individual 1 and class 2(-2) will have images of third individual and so on?.. – Sid Mar 24 '14 at 07:08
@Sid: you call svmtrain only once: class 1 is the images of individual 1, class 2 is the images of individual 2, ..., class 7 is the images of individual 7. – carlosdc Mar 24 '14 at 07:18
Carlosdc so for one vs one i will have seven labels in "training_label" and seven matrices "training_matrix" and then i will run the command:- model=svmtrain(training_label,training_matrix,''); and then [predicted_label,accuracy,probability_estimate]=svmpredict(testing_label,testing_matrix,model); where the "testing_label" can be the true label(in case i want accuracy) or it can be "zeros(size(testing_matrix,1),1)" (in case i want predicted_label)? – Sid Mar 24 '14 at 08:36
Any suggestions on the accuracy of one vs all classification? – Sid Mar 25 '14 at 04:34
Carlosdc, Should the testimage label be different for different models(In multiclass SVM) because we need to provide 'true labels' in order to obtain accuracy, so if the test image belongs to model 1 then its given a label +1 for model 1, -2 for model 2 , -3 for model 3 and so on.. – Sid May 20 '14 at 06:32
