
So there are many options for extracting HoG features: different numbers of orientations, different numbers of pixels per cell, and different block sizes.

But is there a standard or optimal configuration? I have training images of size 50x100, and I'm opting for 8 orientation bins. I'm extracting the features from training data in order to do vehicle classification, but I really don't know what's "optimal".

For example, I have 2 configurations here. Is there any reason to choose one over the other? Personally I feel like the second one is the better choice, but why?

[Image: HOG visualization, first configuration]

[Image: HOG visualization, second configuration]

user961627

1 Answer


I used HOG for product recognition. From what I understood at the time, you are pointing to a real problem with standard HOG. There is simply no optimal configuration; it depends on the dataset. If you have found the optimal values for your dataset and then resize all of its pictures, you have to rescale those values too. So there are no optimal "one size fits all" values for HOG.

But all is not lost. What you should do instead is use a method that works "all the time". The idea is to do Spatial Pyramid Matching, which just means computing HOG at various scales and combining the results. A picture is worth a thousand words:

From the article

You can see that level 2 here is just the standard HOG with fine cells. But it may not be the best scale: if the cells are too small, you mostly observe noise. On the other hand, cells that are too large, like level 0, give you uniform histograms everywhere. You can compute the best weight for each level when you train on your dataset, and those weights tell you the optimal values, i.e. which cell size is the most relevant.
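
To make the idea concrete, here is a minimal pure-Python sketch of a pyramid of orientation histograms. It is a simplification, not the method from the article: there is no block normalization, the image is assumed to be a plain list of rows of grayscale values, and the names `gradient_orientation_histogram`, `pyramid_hog` and `level_weights` are made up for illustration. The per-level weights default to 1.0 here; in practice they would come out of training.

```python
import math


def gradient_orientation_histogram(img, r0, r1, c0, c1, n_bins=8):
    """Unsigned-orientation histogram over rows [r0, r1) and columns [c0, c1)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    for r in range(r0, r1):
        for c in range(c0, c1):
            # Central differences, clamped at the image border.
            gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Fold the direction into [0, pi) so opposite gradients share a bin.
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[b] += mag  # vote weighted by gradient magnitude
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


def pyramid_hog(img, levels=3, n_bins=8, level_weights=None):
    """Concatenate histograms from a 1x1, 2x2, 4x4, ... grid of cells."""
    h, w = len(img), len(img[0])
    if level_weights is None:
        level_weights = [1.0] * levels  # placeholder: would be learned in training
    features = []
    for level in range(levels):
        n = 2 ** level  # level 0: whole image, level 1: quadrants, ...
        for i in range(n):
            for j in range(n):
                r0, r1 = i * h // n, (i + 1) * h // n
                c0, c1 = j * w // n, (j + 1) * w // n
                hist = gradient_orientation_histogram(img, r0, r1, c0, c1, n_bins)
                features.extend(level_weights[level] * v for v in hist)
    return features
```

With 3 levels and 8 bins, the feature vector has 8 * (1 + 4 + 16) values. A classifier trained on it can then weight the coarse and fine levels differently, which is how you find out which cell size matters for your data.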

B. Decoster
  • But as someone who has experience with HoG features, do you also agree that the bottom image I put up is at a better HoG scale than the top image? – user961627 Jun 21 '14 at 07:45
  • From personal experience, if the image is not very small, 8x8 pixels per cell is often a good configuration, along with 9 orientations, as in the UoCTTI variant. P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. PAMI, 2009. – old-ufo Jun 21 '14 at 09:21
  • What do you mean by "very small"? I'm guessing 50x100 images would qualify as not very small? – user961627 Jun 22 '14 at 08:37
  • Absolute cell sizes are not relevant, in my opinion. Also, I agree that the bottom image is at a better scale, because the objects you are "seeing" (window curvature, wheel curvature, door handles, etc.) have the scale of one cell or several cells. In the top one, the cells are too small and you observe a lot of noise. – B. Decoster Jun 22 '14 at 20:46
  • When you say "Do HOG at different scales": I mean, that's the question, right? How DO you do that exactly? Do you mean that you have a *fixed* HOG descriptor size (e.g. 8x8 cells, 2x2 blocks, 128x64 pixels for one HOG descriptor) and then just do a pyramid on the IMAGES? Or do you mean that you keep the IMAGE the same size but do a smaller HOG each time? Which one is it? Thanks – Spacey Oct 13 '14 at 15:27
  • Look at the picture: you can see that 3 levels are used here. Level 0 is a histogram of the full picture. Level 1 is 4 histograms, one for each quadrant. You can go as deep as you want, but beware of complexity (plus, it stops being useful after a while). So you keep your image the same and do HOG with smaller cells each time. (This also means that it would be a good idea to resize all the images in your dataset to the same size, for example 1024x1024.) – B. Decoster Oct 13 '14 at 21:07
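
For the fixed-descriptor option raised in the comments (8x8 cells, 2x2 blocks), it can help to see how quickly the descriptor length changes with cell size. The sketch below only does the arithmetic and assumes a dense layout where blocks overlap with a stride of one cell and partial cells at the border are dropped. The function name and its parameters are hypothetical, not any library's API.

```python
def hog_descriptor_length(width, height, cell=8, block=2, bins=9, stride_cells=1):
    """Length of a dense block-normalized HOG descriptor (arithmetic only)."""
    cells_x = width // cell   # partial cells at the border are dropped
    cells_y = height // cell
    blocks_x = (cells_x - block) // stride_cells + 1
    blocks_y = (cells_y - block) // stride_cells + 1
    if blocks_x < 1 or blocks_y < 1:
        return 0  # window too small for even one block
    return blocks_x * blocks_y * block * block * bins


# 64 px wide, 128 px tall window with 8x8 cells, 2x2 blocks, 9 bins
print(hog_descriptor_length(64, 128))                   # 3780
# A 50x100 image with 8 bins, at two different cell sizes
print(hog_descriptor_length(50, 100, cell=8, bins=8))   # 1760
print(hog_descriptor_length(50, 100, cell=4, bins=8))   # 8448
```

Halving the cell size on a 50x100 image makes the descriptor several times longer, so finer cells cost more features as well as picking up more noise.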