What are the steps for object detection in natural images?

Question

I am new to computer vision. Can anyone tell me the steps to do object detection in a natural image? (Here, "object" refers to a logo.) I drafted the following steps based on my own understanding:

Problem statement: Suppose there are 20 reference logos. Given a natural image, determine which of the logos appear in it and where (bounding box).

Step 1: Collect many (e.g. 100) images containing each logo and crop out the logo region, so that there are 100 examples per logo. The purpose of this step is to cover logos under different conditions, such as illumination changes and rotations.
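
A minimal sketch of the cropping part of Step 1, assuming each annotated logo is stored as a hypothetical (x, y, w, h) box in pixel coordinates and that images are NumPy arrays. It also resizes every crop to one common size, because the classifier later works on fixed-size patches. Nearest-neighbour sampling is used only to keep the example dependency-free; cv2.resize would be the usual choice.

```python
import numpy as np


def crop_and_resize(image, box, size=(64, 64)):
    """Crop box = (x, y, w, h) from an H x W (x C) image array and resize the
    crop to size = (height, width) using nearest-neighbour sampling.
    Assumes the box lies fully inside the image."""
    x, y, w, h = box
    patch = image[y:y + h, x:x + w]
    out_h, out_w = size
    # Map every output row/column back to the nearest source row/column.
    rows = (np.arange(out_h) * h / out_h).astype(int)
    cols = (np.arange(out_w) * w / out_w).astype(int)
    return patch[rows[:, None], cols[None, :]]


# Usage: a synthetic 480 x 640 RGB image with one hypothetical annotated logo box.
image = np.zeros((480, 640, 3), dtype=np.uint8)
image[100:150, 200:300] = 255                    # pretend the logo is here
logo_patch = crop_and_resize(image, (200, 100, 100, 50))
print(logo_patch.shape)                          # (64, 64, 3)
```

Resizing every crop to the same size is what lets all logo examples share one feature length and one window size later.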

Step 2: Collect random images that don't contain any of the logos.
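
For Step 2, negative examples can be cut from the logo-free images as random patches of the same size as the logo crops. A sketch, assuming grayscale NumPy images larger than the patch; the function name, the 64-pixel patch size and the fixed seed are illustrative choices, not part of the question.

```python
import numpy as np


def sample_negative_patches(image, n, size=64, seed=0):
    """Cut n random size x size patches out of an image that contains no logos.
    Assumes the image is at least size x size."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    # Top-left corners chosen so every patch fits inside the image.
    ys = rng.integers(0, h - size + 1, n)
    xs = rng.integers(0, w - size + 1, n)
    return np.stack([image[y:y + size, x:x + size] for y, x in zip(ys, xs)])


# Usage: random noise stands in for a real logo-free photo.
background = np.random.default_rng(1).integers(0, 256, (480, 640)).astype(np.uint8)
negatives = sample_negative_patches(background, n=100)
print(negatives.shape)  # (100, 64, 64)
```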

Step 3: Extract SIFT features from the example logos and from the random images.
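
SIFT returns a variable number of 128-dimensional descriptors per patch, while an SVM or similar classifier expects one fixed-length vector per example. One common way to bridge this (not mentioned in the thread, so treat it as an assumption) is a bag of visual words: cluster the training descriptors into k codewords, then describe each patch by a histogram of its nearest codewords. This sketch uses plain NumPy k-means for self-containment; k=50 and the random stand-in descriptors are illustrative only.

```python
import numpy as np


def build_vocabulary(descriptors, k=50, iters=20, seed=0):
    """Plain k-means over stacked training descriptors (N x 128) -> k codewords."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance from every descriptor to every center (N x k).
        dist = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):            # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(descriptors, centers):
    """Encode one patch's variable-size set of descriptors as a fixed-length,
    L1-normalised histogram of nearest codewords."""
    dist = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(dist.argmin(axis=1), minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)


# Usage: random arrays stand in for real SIFT output, which in OpenCV >= 4.4 comes from
#   keypoints, desc = cv2.SIFT_create().detectAndCompute(gray_patch, None)
# (desc is None when no keypoints are found; use np.empty((0, 128)) in that case).
rng = np.random.default_rng(0)
train_desc = rng.random((500, 128)).astype(np.float32)   # descriptors from all training patches
vocab = build_vocabulary(train_desc, k=50)
patch_desc = rng.random((37, 128)).astype(np.float32)    # one patch with 37 keypoints
feature = bow_histogram(patch_desc, vocab)
print(feature.shape)  # (50,)
```

For real data sets a library k-means is faster and uses less memory than this all-pairs distance matrix.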

Step 4: The problem now becomes multi-class classification with 21 classes: 20 classes correspond to the 20 logos, and 1 class corresponds to the random images.
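
The 21-class setup of Step 4 amounts to one feature vector per patch plus an integer label. A sketch, assuming features have already been computed per patch; the logo names are hypothetical placeholders and class 20 is reserved for "no logo".

```python
import numpy as np

# Hypothetical names for the 20 reference logos; class 20 means "no logo".
LOGOS = [f"logo_{i:02d}" for i in range(20)]
BACKGROUND = len(LOGOS)
LABEL_OF = {name: i for i, name in enumerate(LOGOS)}


def build_dataset(logo_features, background_features):
    """logo_features: dict mapping logo name -> (n_i x d) feature array;
    background_features: (m x d) array from logo-free patches.
    Returns X (N x d) and integer labels y in 0..20."""
    X = [background_features]
    y = [np.full(len(background_features), BACKGROUND)]
    for name, feats in logo_features.items():
        X.append(feats)
        y.append(np.full(len(feats), LABEL_OF[name]))
    return np.vstack(X), np.concatenate(y)


# Usage with random 50-dimensional features: 100 examples per logo, 500 negatives.
rng = np.random.default_rng(0)
logo_feats = {name: rng.random((100, 50)) for name in LOGOS}
X, y = build_dataset(logo_feats, rng.random((500, 50)))
print(X.shape, len(np.unique(y)))  # (2500, 50) 21
```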

Question 1: Which classifier should I use, and what are its input and output?

Step 5: Given a test image, extract its SIFT features. Should all of the features be used as input?

Question 2: For the test image, what should be used as input, and how should the classification be done to decide whether it contains a logo and, if so, which one?

Question 3: How do I determine the location of the detected logo?

Question 4: Is there a tool for labeling or cropping images?

If my procedure is not correct, please tell me how to do this step by step. Thanks in advance!

score 0 · Answer 1 · edited May 23 '17 at 10:09

Question 1: I would advise you to use a Support Vector Machine (SVM). It's a simple but powerful classifier for tasks with a small dataset, and implementations are easy to find for most popular programming languages. Extract SIFT (or any other) features from same-size patches with and without logos and use them as the classifier's input. The ground-truth labels are the logo names plus one extra label for patches without a logo, so with 20 logos you will have 21 class labels.
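
In practice you would use a library SVM (for example LIBSVM or scikit-learn's sklearn.svm.SVC / LinearSVC). The sketch below swaps in a small NumPy one-vs-rest linear SVM trained by subgradient descent, only to show what goes in (one fixed-length feature vector per patch) and what comes out (one class label per patch). The hyperparameters C, lr and epochs are illustrative assumptions, and toy 2-D clusters stand in for real patch features.

```python
import numpy as np


class OneVsRestLinearSVM:
    """One-vs-rest linear SVM: for each class c, minimise
    0.5 * ||w||^2 + C * mean(max(0, 1 - t * (X @ w + b))) with t = +1 for c, -1 otherwise,
    using full-batch subgradient descent. A teaching sketch only."""

    def __init__(self, C=1.0, lr=0.1, epochs=300):
        self.C, self.lr, self.epochs = C, lr, epochs

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        n, d = X.shape
        self.W = np.zeros((len(self.classes_), d))
        self.b = np.zeros(len(self.classes_))
        for k, c in enumerate(self.classes_):
            t = np.where(y == c, 1.0, -1.0)          # this class vs. the rest
            w, b = np.zeros(d), 0.0
            for _ in range(self.epochs):
                viol = t * (X @ w + b) < 1            # samples inside the margin
                grad_w = w - self.C * (t[viol][:, None] * X[viol]).sum(axis=0) / n
                grad_b = -self.C * t[viol].sum() / n
                w -= self.lr * grad_w
                b -= self.lr * grad_b
            self.W[k], self.b[k] = w, b
        return self

    def decision_function(self, X):
        return X @ self.W.T + self.b                  # (n_samples x n_classes)

    def predict(self, X):
        # Pick the class whose one-vs-rest classifier is most confident.
        return self.classes_[self.decision_function(X).argmax(axis=1)]


# Usage: three well-separated 2-D clusters stand in for patch feature vectors.
rng = np.random.default_rng(0)
centers = np.array([[4.0, 0.0], [-2.0, 3.46], [-2.0, -3.46]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])
y = np.repeat([0, 1, 2], 50)
clf = OneVsRestLinearSVM().fit(X, y)
print((clf.predict(X) == y).mean())
```

With the 21-class logo data, X would hold one histogram per patch and y the labels 0..20; the background class is just one more class to the SVM.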

Questions 2 and 3: You should use the sliding window technique. The idea is to crop patches from the test image with some stride and use your classifier to predict, for each patch, whether it contains a logo and which one; the positions of the patches classified as logos give their locations. You can read more about it, for example, here.
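
A minimal sliding-window sketch, assuming a NumPy image and a hypothetical classify(patch) function that returns (label, score); in the real pipeline it would extract the patch's features and call the trained classifier. Window size and stride are illustrative. Neighbouring windows often fire on the same logo, and such overlapping hits are usually merged afterwards (for example with non-maximum suppression), which this sketch leaves out.

```python
import numpy as np


def sliding_windows(image, size=64, stride=16):
    """Yield (x, y, patch) for every size x size window taken with the given stride."""
    h, w = image.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield x, y, image[y:y + size, x:x + size]


def detect(image, classify, background_label, size=64, stride=16):
    """Run classify(patch) -> (label, score) on every window and keep the ones not
    labelled as background, as (x, y, w, h, label, score) bounding boxes."""
    detections = []
    for x, y, patch in sliding_windows(image, size, stride):
        label, score = classify(patch)
        if label != background_label:
            detections.append((x, y, size, size, label, score))
    return detections


# Usage: a bright square stands in for a logo, and a toy brightness rule stands in
# for "extract features + SVM predict". Label 20 is the background class.
def toy_classify(patch):
    mean = float(patch.mean())
    return (0, mean) if mean > 200 else (20, 0.0)


image = np.zeros((256, 256), dtype=np.uint8)
image[64:128, 128:192] = 255
print(detect(image, toy_classify, background_label=20))
# [(128, 64, 64, 64, 0, 255.0)]
```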

Question 4: It seems that this thread has the answer: image labelling and annotation tool

Some advice:

Bootstrapping (hard negative mining) can help you find the logo-free patches that are most difficult for the classifier.
Use cross-validation to choose the best SIFT and SVM parameters and the optimal patch size.
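
The bootstrapping advice can be sketched as: score a large pool of logo-free patches, add the ones the classifier wrongly calls a logo to the training set as background, and retrain. This assumes per-class scores such as an SVM's decision values; train_fn and score_fn are hypothetical hooks for whatever classifier you use, and the round and top_k limits are arbitrary.

```python
import numpy as np


def hard_negative_indices(scores, background_label, top_k=100):
    """scores: (n x n_classes) classifier scores on patches known to contain no logo.
    Returns indices of the patches the classifier most confidently mistakes for a
    logo (its false positives), hardest first."""
    best_logo = np.delete(scores, background_label, axis=1).max(axis=1)
    margin = best_logo - scores[:, background_label]
    wrong = np.flatnonzero(margin > 0)
    return wrong[np.argsort(-margin[wrong])][:top_k]


def bootstrap(train_fn, score_fn, X, y, negative_pool, background_label, rounds=3, top_k=100):
    """Retrain repeatedly, each time adding the current false positives from a pool
    of logo-free patch features to the training set as background examples.
    train_fn(X, y) -> model and score_fn(model, F) -> scores are hypothetical hooks."""
    model = train_fn(X, y)
    for _ in range(rounds):
        hard = hard_negative_indices(score_fn(model, negative_pool), background_label, top_k)
        if len(hard) == 0:                     # no false positives left in the pool
            break
        X = np.vstack([X, negative_pool[hard]])
        y = np.concatenate([y, np.full(len(hard), background_label)])
        negative_pool = np.delete(negative_pool, hard, axis=0)
        model = train_fn(X, y)
    return model, X, y


# Usage on hand-made scores for a 2-class case (0 = logo, 1 = background):
scores = np.array([[0.1, 0.5], [0.9, 0.2], [0.3, 0.4], [2.0, 0.0]])
print(hard_negative_indices(scores, background_label=1))  # [3 1]
```

For the cross-validation advice, a library helper (for example scikit-learn's GridSearchCV) is the usual route rather than a hand-written loop.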

Good luck!

Thank you very much for your prompt answer. I will test it based on your suggestions. By the way, is the sliding window slow? Since I need to try different window sizes, is there an alternative? — kim, Mar 06 '15 at 02:34
Yes, a sliding window may be slow, but I don't know of other equally effective methods. You can speed it up by extracting SIFT once for the whole image and then taking the features for each patch, instead of extracting SIFT for every patch separately. A pyramid approach may also be effective: find candidate positions on a downscaled image and then refine them. There are more complicated approaches, such as convolutional neural networks, which can avoid the sliding window, but they are much more difficult to try. — SkyHawk, Mar 06 '15 at 13:04
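
Two of the speed-ups from this comment can be sketched as follows, assuming NumPy images and keypoint coordinates in (x, y) pixel order (as in OpenCV's KeyPoint.pt). pyramid lets one fixed window size cover different logo sizes by shrinking the image instead of growing the window, and descriptors_in_window reuses descriptors computed once for the whole image. The function names, the 1.25 scale step and nearest-neighbour resampling are illustrative assumptions; the coarse-to-fine refinement step is not shown.

```python
import numpy as np


def downscale(image, factor):
    """Nearest-neighbour downscale by `factor` (> 1); a dependency-free stand-in for cv2.resize."""
    h, w = image.shape[:2]
    nh, nw = int(h / factor), int(w / factor)
    rows = (np.arange(nh) * factor).astype(int)
    cols = (np.arange(nw) * factor).astype(int)
    return image[rows[:, None], cols[None, :]]


def pyramid(image, scale=1.25, min_size=64):
    """Yield (level, factor) pairs. A fixed size x size window at (x, y) on a level
    covers the box (x * factor, y * factor, size * factor, size * factor) in the original."""
    factor = 1.0
    level = image
    while min(level.shape[:2]) >= min_size:
        yield level, factor
        factor *= scale
        level = downscale(image, factor)       # always resample from the original


def descriptors_in_window(points, descriptors, x, y, size):
    """Select precomputed descriptors whose keypoint (px, py) lies inside the window,
    so features are extracted once per image rather than once per window."""
    inside = ((points[:, 0] >= x) & (points[:, 0] < x + size) &
              (points[:, 1] >= y) & (points[:, 1] < y + size))
    return descriptors[inside]


# Usage: list the pyramid levels for a 480 x 640 image.
levels = list(pyramid(np.zeros((480, 640), dtype=np.uint8)))
print(len(levels), levels[1][0].shape, levels[-1][0].shape)  # 10 (384, 512) (64, 85)
```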
