
I'm trying to set up an object classification system with OpenCV. When I detect a new object in a scene, I want to know whether the new object belongs to a known object class (is it a box, a bottle, something unknown, etc.).

My steps so far:

  • Cropping the image down to the ROI where a new object could appear
  • Calculating keypoints for every image (cv::SurfFeatureDetector)
  • Calculating descriptors for each keypoint (cv::SurfDescriptorExtractor)
  • Generating a vocabulary using Bag of Words (cv::BOWKMeansTrainer)
  • Calculating response histograms (cv::BOWImgDescriptorExtractor)
  • Using the response histograms to train a cv::SVM for every object class
  • Using the same set of images again to test the classification

I know that there is still something wrong with my code, since the classification doesn't work yet.

But I don't really know when I should use the full image (cropped down to the ROI) and when I should extract the new object from the image and use just the object itself.

It's my first step into object recognition/classification, and I've seen people use both full images and extracted objects, but I just don't know when to use which.

I hope someone can clarify this for me.

Oronar

1 Answer


You should not use the same images for both testing and training.

In training, ideally you need to extract an ROI which includes just one dominant object, since the algorithm will assume that the codewords extracted from positive samples are the ones that should be present in a test image for it to be labeled as positive. However, if you have a really big dataset like ImageNet, the algorithm should generalize.

In testing, you don't need to extract an ROI, because SIFT/SURF are scale-invariant features. However, it's good to have one dominant object in the test set as well.

I think you should train one classifier for each of your object classes. This is called a one-vs-all classifier.
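To make "one-vs-all" concrete: you train one binary classifier per class, score the test image's histogram with each of them, and pick the best-scoring class. A plain C++ sketch of the decision step (the convention that a higher score means a more confident "positive" is an assumption here; check the sign of your SVM's decision-function value before reusing it):

```cpp
#include <map>
#include <string>
#include <cfloat>

// Pick the class whose binary classifier gives the highest score for
// a test sample. scores maps class name -> signed classifier output
// (e.g. an SVM decision-function value).
std::string predictOneVsAll(const std::map<std::string, float>& scores) {
    std::string best = "unknown";
    float bestScore = -FLT_MAX;
    for (std::map<std::string, float>::const_iterator it = scores.begin();
         it != scores.end(); ++it) {
        if (it->second > bestScore) {
            bestScore = it->second;
            best = it->first;
        }
    }
    return best;
}
```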

One little note: if you don't want to worry about these issues and have a big dataset, just go with Convolutional Neural Networks. They have really good generalization capability and are inherently multi-class thanks to their fully connected last layer.

cagatayodabasi
  • Is there any reason for not using the same images for training and a first test? I thought this should give 100% correct classification if everything is set up correctly, so I used the same images. – Oronar Sep 30 '16 at 11:35
  • I got it! You just want to see the result, but remember, you do not know your training accuracy, right? Maybe it's even low. Of course, you need to achieve a high score when you test the algorithm with the training images. Could you share your training accuracy? – cagatayodabasi Sep 30 '16 at 11:44
  • If I understand correctly, I should extract an ROI (AABB) with the object of interest. Then I would have the object with its orientation and a little bit of the background. When testing, I guess I will have to extract the object, because there will be multiple objects in the scene. I already use a 1-vs-all classifier, so each object class has its own SVM. I used this code as a template: https://github.com/royshil/FoodcamClassifier/blob/master/main.cpp I think CNNs and ImageNet are not appropriate, since I have just 30 images per object class for training (but I just need a top view). – Oronar Sep 30 '16 at 11:45
  • Correct! In the testing part, ideally you don't need to extract an ROI, but if you do, you'll achieve a higher score. However, I should say that 30 images is really low. As I said before, maybe you cannot even train your algorithm. By the way, OpenCV uses a radial basis function kernel by default http://docs.opencv.org/2.4/modules/ml/doc/support_vector_machines.html; I suggest you try it with a linear SVM due to the lack of data. – cagatayodabasi Sep 30 '16 at 12:09
  • Stupid question, but how can I get the training accuracy? For now, I extracted the objects in my pictures with an AABB, so a little bit of background is still present. I used the same images for building the vocabulary and for the response histograms, which I used to create the training data for the SVMs. I used the SVMs with kernel type set to cv::SVM::LINEAR (no other params changed so far). How can I see how good the training was? I know that 30 images per object is not much, but I will see just one side of the object, and so I hope it will be enough. – Oronar Sep 30 '16 at 15:47
  • Just apply your classifier on the data and look at the result. In general, you need to divide your data into training and testing sets. I suggest you select 25% of your data as your test data. Then you can test it and see the result. Maybe you need to tune the parameters of the SVM. Please look at this question: http://stackoverflow.com/questions/31178095/recommended-values-for-opencv-svm-parameters They use a method called trainAuto() to tune the parameters of the SVM. – cagatayodabasi Sep 30 '16 at 21:01