How to implement a better sliding window algorithm?

Question

So I have been writing my own code for HoG and a variant of it that works with depth images. However, I am stuck on testing my trained SVM in the detection-window stage.

So far, all I have done is build an image pyramid from the original image and slide a 64x128 window over each level, from the top-left corner to the bottom-right.
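
A minimal sketch of that pyramid-plus-sliding-window loop, in plain Python so it runs without OpenCV. It assumes the image is a list of rows of depth values, a 64-wide by 128-tall window (the usual HOG pedestrian layout), a pyramid factor of 1.25, nearest-neighbour downscaling and an 8-pixel step; the function names, factor and step are illustrative choices, not the asker's code. A real implementation would normally use a library resize with proper interpolation.

```python
from typing import Iterator, List, Tuple

Image = List[List[float]]  # rows of pixel (e.g. depth) values

def downscale(image: Image, factor: float) -> Image:
    """Nearest-neighbour downscale; factor > 1 shrinks the image."""
    new_h, new_w = int(len(image) / factor), int(len(image[0]) / factor)
    return [[image[int(r * factor)][int(c * factor)] for c in range(new_w)]
            for r in range(new_h)]

def pyramid(image: Image, factor: float = 1.25,
            min_w: int = 64, min_h: int = 128) -> Iterator[Tuple[float, Image]]:
    """Yield (scale, level) until a level is smaller than one window.
    Multiplying level coordinates by `scale` maps them back to the original."""
    scale = 1.0
    while len(image) >= min_h and len(image[0]) >= min_w:
        yield scale, image
        image = downscale(image, factor)
        scale *= factor

def sliding_windows(image: Image, win_w: int = 64, win_h: int = 128,
                    step: int = 8) -> Iterator[Tuple[int, int, Image]]:
    """Yield (x, y, window) from the top-left corner to the bottom-right."""
    h, w = len(image), len(image[0])
    for y in range(0, h - win_h + 1, step):
        for x in range(0, w - win_w + 1, step):
            yield x, y, [row[x:x + win_w] for row in image[y:y + win_h]]
```

Typical usage is a double loop: for each (scale, level) from pyramid(image), score every window from sliding_windows(level) and map its position back to the original image as (x * scale, y * scale).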

Here's a video capture of it: http://youtu.be/3cNFOd7Aigc

Now the issue is that I'm getting more false positives than I expected.

Is there a way to remove these false positives (besides training with more images)? So far I can get the 'score' from the SVM, which is the distance to the margin itself. How can I use that to improve my results?

Does anyone have any insight into implementing a good sliding window algorithm?

Antoine · Accepted Answer · 2013-04-08T09:08:45.037

What you could do is add a processing step that finds the locally strongest responses of the SVM. Let me explain.

What you appear to be doing right now:

for each sliding window W, record category[W] = SVM.hardDecision(W)

A hard decision returns a boolean or an integer label; for two-class classification it could be written like this:

hardDecision(W) = bool( softDecision(W) > 0 )
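
To make the hard/soft distinction concrete, here is a sketch assuming a linear SVM (the usual choice for HOG), whose soft decision is f(x) = w.x + b. The weights w and b below are hypothetical. The geometric signed distance to the hyperplane is f(x) / ||w||, and since ||w|| is the same for every window, ranking windows by f(x) or by that distance gives the same order.

```python
from typing import Sequence

def soft_decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed decision value f(x) = w.x + b of a linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hard_decision(w: Sequence[float], b: float, x: Sequence[float]) -> bool:
    """Hard decision: only the sign of the soft decision survives."""
    return soft_decision(w, b, x) > 0

# Hypothetical 2-D weights, just to show the difference.
w, b = [0.5, -1.0], 0.25
print(soft_decision(w, b, [2.0, 0.5]))  # 0.75
print(hard_decision(w, b, [2.0, 0.5]))  # True
```

The hard decision throws away how far from the boundary a window lies, which is exactly the information the approach below relies on.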

Since you mentioned OpenCV, set returnDFVal to true when calling CvSVM::predict. From the documentation:

returnDFVal – Specifies a type of the return value. If true and the problem is 2-class classification then the method returns the decision function value that is signed distance to the margin, else the function returns a class label (classification) or estimated function value (regression).

What you could do is:

for each sliding window W, record score[W] = SVM.softDecision(W)

for each W, compute and record:

neighbors = max(score[W_left], score[W_right], score[W_up], score[W_down])

local[W] = score[W] > neighbors

powerful[W] = score[W] > threshold

for each W, you have a positive if local[W] && powerful[W]

Since your classifier will also respond positively to windows that are close (in space and/or appearance) to a true positive, the idea is to record the score of every window and then keep only the positives that:

have a locally maximal score (greater than their neighbors) --> local
are strong enough (above threshold) --> powerful
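
The local/powerful filter can be sketched in Python, assuming the scores of one pyramid level are stored in a 2-D grid where score[r][c] belongs to the window at (x = c * step, y = r * step). Border windows are compared only with the neighbours that exist, which is an assumption the pseudocode leaves open; the function name is hypothetical.

```python
from typing import List, Tuple

def local_strong_positives(score: List[List[float]],
                           threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Return (row, col) of windows whose score is strictly greater than
    its left/right/up/down neighbours (local) and above threshold (powerful)."""
    rows, cols = len(score), len(score[0])
    hits = []
    for r in range(rows):
        for c in range(cols):
            s = score[r][c]
            neighbors = [score[rr][cc]
                         for rr, cc in ((r, c - 1), (r, c + 1), (r - 1, c), (r + 1, c))
                         if 0 <= rr < rows and 0 <= cc < cols]
            local = all(s > n for n in neighbors)  # local[W]
            powerful = s > threshold               # powerful[W]
            if local and powerful:
                hits.append((r, c))
    return hits

grid = [
    [0.1, 0.3, 0.2, -0.5],
    [0.4, 1.2, 0.9, -0.2],
    [0.0, 0.8, 0.7, 0.75],
]
print(local_strong_positives(grid))       # [(1, 1), (2, 3)]
print(local_strong_positives(grid, 0.8))  # [(1, 1)]
```

Every cell next to the 1.2 peak scores above 0 and would be a positive under a hard decision; the local-maximum test keeps only the peak, and raising the threshold also drops the weaker isolated maximum at (2, 3).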

You could start with threshold set to 0 and adjust it until you get satisfying results, or calibrate it automatically using your training set.
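
One possible automatic calibration, sketched under the assumption that you can score a set of negative (no person) windows: pick the smallest threshold at which at most a chosen fraction of those negatives would still fire. This "target false-positive rate" rule is a swapped-in example, not the only option, and scores from held-out windows are usually more realistic than scores from the SVM's own training windows. The function name is hypothetical.

```python
import math
from typing import Sequence

def threshold_for_fpr(negative_scores: Sequence[float], max_fpr: float) -> float:
    """Smallest threshold such that at most a fraction max_fpr (0 <= max_fpr < 1)
    of the given negative-window scores satisfy score > threshold."""
    if not negative_scores:
        raise ValueError("need at least one negative score")
    if not 0.0 <= max_fpr < 1.0:
        raise ValueError("max_fpr must be in [0, 1)")
    ranked = sorted(negative_scores, reverse=True)  # strongest false alarms first
    allowed = min(math.floor(max_fpr * len(ranked)), len(ranked) - 1)
    # Only scores at indices < allowed can be strictly above ranked[allowed],
    # while any lower threshold would let allowed + 1 negatives through.
    return ranked[allowed]

neg = [-1.5, -0.2, 0.3, -0.8, 0.9, -1.1, 0.1, -0.4, -2.0, -0.6]
print(threshold_for_fpr(neg, 0.2))  # 0.1
```

With a 20% budget on these ten made-up scores, only 0.9 and 0.3 remain above the returned threshold.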

Great answer, but I have a few more questions, if you or anyone else doesn't mind. How do I use this together with scale space? Do I treat a window detected at another scale as just another neighbour? And where does non-maximal suppression come into play? Anyway, thanks a lot for your clear answer. — sub_o, Apr 08 '13 at 10:24
Good question about scales! You could indeed include scale in the neighborhood (left/right/up/down/smaller/larger), but it depends a lot on your data and end goal: do you get a lot of multi-scale false positives? Unfortunately, trial and error is often the most practical methodology in computer vision. As for non-maximal suppression, what I described is a form of it (you only keep local maxima). — Antoine, Apr 08 '13 at 12:54
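
A sketch of the left/right/up/down/smaller/larger neighbourhood, assuming one score grid per pyramid level, the same step on every level, and scales[k] mapping level-k coordinates back to the original image (as in a scale *= factor pyramid). The cross-scale neighbour is taken as the grid cell whose top-left corner lands closest to the same original-image position; that mapping, and ignoring mapped cells that fall outside a grid, are assumptions, and the function name is hypothetical. A commonly used alternative is greedy non-maximum suppression on bounding-box overlap in original-image coordinates.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]

def scale_space_positives(levels: List[Grid], scales: List[float],
                          threshold: float = 0.0) -> List[Tuple[int, int, int]]:
    """Return (level, row, col) of windows that beat their 4 spatial neighbours
    and the corresponding windows one level smaller and one level larger."""
    def lookup(k: int, r: int, c: int) -> Optional[float]:
        grid = levels[k]
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
            return grid[r][c]
        return None  # outside this level: not a neighbour

    hits = []
    for k, grid in enumerate(levels):
        for r, row in enumerate(grid):
            for c, s in enumerate(row):
                neigh = [lookup(k, r, c - 1), lookup(k, r, c + 1),
                         lookup(k, r - 1, c), lookup(k, r + 1, c)]
                for kk in (k - 1, k + 1):  # adjacent pyramid levels
                    if 0 <= kk < len(levels):
                        ratio = scales[k] / scales[kk]
                        neigh.append(lookup(kk, round(r * ratio), round(c * ratio)))
                if s > threshold and all(s > n for n in neigh if n is not None):
                    hits.append((k, r, c))
    return hits

level0 = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.3], [0.0, 0.1, 0.2]]
level1 = [[0.5, 0.6], [0.2, 1.0]]
print(scale_space_positives([level0], [1.0]))                 # [(0, 1, 1)]
print(scale_space_positives([level0, level1], [1.0, 1.25]))   # [(1, 1, 1)]
```

On level 0 alone, 0.9 is a local maximum; once the next level is included, the stronger 1.0 response at the corresponding position suppresses it, so only one detection survives across scales.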
