
I have been writing my own code for HOG and a variant of it that works with depth images. However, I am stuck on the detection-window stage, where I test my trained SVM.

So far, I build an image pyramid from the original image and, at each level, run a 64x128 sliding window from the top-left corner to the bottom-right.
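For reference, the window enumeration described above can be sketched roughly as follows. This is only an illustration, not my actual code: the function name `pyramid_windows`, the stride of 8 pixels, and the pyramid scale step of 1.25 are assumed values chosen for the example.

```python
def pyramid_windows(img_w, img_h, win_w=64, win_h=128, stride=8, scale=1.25):
    """Yield (level, x, y, factor) for every window position at every pyramid level.

    `factor` maps coordinates at that level back to the original image.
    stride and scale are assumed values, not taken from the question.
    """
    level, factor = 0, 1.0
    w, h = img_w, img_h
    # Keep shrinking the image until the window no longer fits.
    while w >= win_w and h >= win_h:
        # Slide from the top-left corner to the bottom-right.
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                yield level, x, y, factor
        level += 1
        factor *= scale
        w, h = int(img_w / factor), int(img_h / factor)


if __name__ == "__main__":
    for level, x, y, factor in pyramid_windows(128, 256, stride=64, scale=2.0):
        print(level, x, y, factor)
```

Each yielded window would then be cropped, turned into a HOG descriptor, and passed to the SVM.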

Here's a video capture of it: http://youtu.be/3cNFOd7Aigc

Now the issue is that I'm getting more false positives than I expected.

Is there a way to remove these false positives (besides training with more images)? So far I can get the 'score' from the SVM, which is the distance to the margin. How can I use it to improve my results?

Does anyone have any insight into implementing a good sliding window algorithm?

adrianp
sub_o

1 Answer


What you could do is add a processing step that finds the locally strongest response from the SVM. Let me explain.

What you appear to be doing right now:

for each sliding window W, record category[W] = SVM.hardDecision(W)

Hard decision means it returns a boolean or an integer; for 2-category classification it could be written like this:

hardDecision(W) = bool( softDecision(W) > 0 )
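To make the relation concrete, here is a minimal sketch assuming a linear SVM (common with HOG, but an assumption here). The names `soft_decision` and `hard_decision`, and the `weights` and `bias` parameters, are illustrative and not part of any library. Note that `w·x + b` is proportional to the signed distance to the margin and equals it only when `w` has unit norm.

```python
def soft_decision(weights, bias, features):
    """Raw decision value w.x + b of a linear SVM (assumed model)."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def hard_decision(weights, bias, features):
    """Class label obtained by thresholding the soft decision at 0."""
    return soft_decision(weights, bias, features) > 0


if __name__ == "__main__":
    w, b = [1.0, -2.0], 0.5
    print(soft_decision(w, b, [3.0, 1.0]), hard_decision(w, b, [3.0, 1.0]))
```

The hard decision throws away how confident the classifier was, which is exactly the information needed to rank overlapping detections.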

Since you mentioned OpenCV: in CvSVM::predict, you should set returnDFVal to true:

returnDFVal – Specifies a type of the return value. If true and the problem is 2-class classification then the method returns the decision function value that is signed distance to the margin, else the function returns a class label (classification) or estimated function value (regression).

from the documentation.

What you could do is:

  1. for each sliding window W, record score[W] = SVM.softDecision(W)
  2. for each W, compute and record:
    • neighbors = max(score[W_left], score[W_right], score[W_up], score[W_bottom])
    • local[W] = score[W] > neighbors
    • powerful[W] = score[W] > threshold.
  3. for each W, you have a positive if local[W] && powerful[W]

Since your classifier will also respond positively to windows that are close (in space and/or appearance) to your true positive, the idea is to record the score for every window and then keep only the positives that

  • are a local maximum (score greater than that of each neighbor) --> local
  • are strong enough (score above the threshold) --> powerful

You could start with threshold set to 0 and adjust it until you get satisfactory results. Or you could calibrate it automatically using your training set.
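The steps above could be sketched like this for a single pyramid level. It assumes the scores are already stored in a 2D grid indexed by window row and column (one cell per window position); the function name `local_maxima` is illustrative, and ignoring missing neighbors at the border is a choice made for this sketch.

```python
def local_maxima(score, threshold=0.0):
    """Return (row, col, score) for windows that are both local and powerful.

    score: 2D list of soft-decision values, indexed [row][col].
    """
    rows, cols = len(score), len(score[0])
    detections = []
    for r in range(rows):
        for c in range(cols):
            s = score[r][c]
            # Up, down, left, right neighbors; those outside the grid are skipped.
            neighbors = [
                score[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            local = all(s > n for n in neighbors)
            powerful = s > threshold
            if local and powerful:
                detections.append((r, c, s))
    return detections


if __name__ == "__main__":
    grid = [
        [-1.0, 0.2, -0.5],
        [0.3, 1.5, 0.4],
        [-0.2, 0.6, -1.0],
    ]
    print(local_maxima(grid))
```

In this example grid, five windows have a positive score, but only the central one survives, because each of the others has a stronger neighbor.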

Antoine
  • Great answer, but I have a few more questions, if you or anyone else doesn't mind. How do I use it together with scale space? Do I treat a window detected at another scale as just another neighbour? And how does non-maximal suppression come into play? Anyway, thanks a lot for your clear answer. – sub_o Apr 08 '13 at 10:24
  • Good question about scales! You could indeed include scale in the neighborhood (left/right/up/down/smaller/larger), but it depends a lot on your data and end-goal - do you get a lot of multi-scale false positives? Unfortunately trial-and-error is the best practical methodology in computer vision. About non-maximal suppression, well what I described is a form of non-maximal suppression (you only keep local maxima). – Antoine Apr 08 '13 at 12:54
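
As a rough illustration of the six-neighbor idea from the comment above, the sketch below treats the next smaller and larger scales as extra neighbors. It makes a strong simplifying assumption: scores are stored as `score[level][row][col]` with the same row/column grid at every scale (for example after resampling each level's score map to a common grid). The function name `local_maxima_3d` is hypothetical.

```python
def local_maxima_3d(score, threshold=0.0):
    """Return (level, row, col, score) for peaks over space and scale.

    score: 3D list indexed [level][row][col]; all levels are assumed
    to share one grid, which a real pyramid does not give for free.
    """
    levels, rows, cols = len(score), len(score[0]), len(score[0][0])
    # smaller/larger scale, up/down, left/right
    offsets = ((-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1))
    peaks = []
    for l in range(levels):
        for r in range(rows):
            for c in range(cols):
                s = score[l][r][c]
                if s <= threshold:
                    continue  # not powerful
                is_peak = True
                for dl, dr, dc in offsets:
                    nl, nr, nc = l + dl, r + dr, c + dc
                    inside = 0 <= nl < levels and 0 <= nr < rows and 0 <= nc < cols
                    if inside and score[nl][nr][nc] >= s:
                        is_peak = False  # a neighbor is at least as strong
                        break
                if is_peak:
                    peaks.append((l, r, c, s))
    return peaks


if __name__ == "__main__":
    scores = [[[0.5, 0.1]], [[0.9, 0.2]], [[0.3, 0.1]]]
    print(local_maxima_3d(scores))
```

Here the 0.5 response at level 0 would be a peak within its own level, but it is suppressed because the same position scores 0.9 at the neighboring scale.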