I am new to the VLFeat computer vision library and am having trouble with it. I am trying to use Histograms of Oriented Gradients (HOG) as feature vectors to classify images of different dimensions with LIBSVM.

The first issue is that vl_hog returns a HOG matrix, not a vector. This is not a real problem, because I can vectorize the matrix as follows:

    hog = vl_hog(image, cellSize);
    features = hog(:);

The second problem is the one that is really bothering me. Because the images have different dimensions, the feature vectors also have different dimensions, so it seems impossible to feed them to LIBSVM. Or am I wrong? Is there an easier way to solve this? Did I miss something?

mad
  • You have to take blocks of the same size. For example, for pedestrian detection (which is what the original HOG paper did), they used a constant block size of 128x64. It depends on what you want to detect. Obviously, more advanced methods exist now, but you have to describe the problem. – Autonomous Mar 03 '14 at 15:54
  • @Parag I want to describe some images with HOG, but what I do not understand is why vl_hog returns a matrix and not a vector. Thanks anyway. – mad Mar 03 '14 at 17:53
  • What is your goal? Do you want to identify the category of the image, or do you want to do object detection? If you want to describe images and visualize HOG, [this](http://web.mit.edu/vondrick/ihog/#code) is an excellent link. Just play with some images on that page and you will see what they are doing. – Autonomous Mar 03 '14 at 18:10
  • @Parag I want to identify the category of a given image. The images are letters in bounding boxes without a fixed size. I want a feature vector that describes a given letter, but these feature vectors must have the same dimensions regardless of the image size, so that I can use LIBSVM for the classification. What I can see in both VLFeat and your link is a way to see only the gradient angles, am I right? – mad Mar 03 '14 at 18:36
  • OK. You have two options. 1. Find a common size to which you can resize all the letters without changing the aspect ratio. This should do the job. 2. Part-based detection: this is somewhat more difficult and would be overkill for letter detection. Also, you would have to read some literature before starting with part-based detection. I would go for the first option. – Autonomous Mar 03 '14 at 18:47
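
The first option from the last comment could look roughly like the sketch below. It is only an illustration, not code from the thread: `targetSize` and `cellSize` are assumed values, `image` is the letter image from the question, the letter is assumed to sit on a dark background (the padding is zeros), and `imresize`/`rgb2gray` come from the Image Processing Toolbox.

```matlab
% Sketch: bring every letter image to one fixed size so that vl_hog
% always returns the same number of values.
targetSize = 64;   % assumed side length of the common canvas, in pixels
cellSize   = 8;    % assumed HOG cell size

im = im2single(image);            % vl_hog expects an image of class SINGLE
if size(im, 3) > 1
    im = rgb2gray(im);            % work on a single channel
end

% Scale so that the longer side equals targetSize (aspect ratio is kept).
[h, w] = size(im);
scale  = targetSize / max(h, w);
newH   = max(1, round(h * scale));
newW   = max(1, round(w * scale));
im     = imresize(im, [newH newW]);

% Center the scaled letter on a square canvas padded with zeros.
canvas = zeros(targetSize, targetSize, 'single');
r0 = floor((targetSize - newH) / 2);
c0 = floor((targetSize - newW) / 2);
canvas(r0 + (1:newH), c0 + (1:newW)) = im;

% Same canvas size for every image, so the vector length is constant.
hog      = vl_hog(canvas, cellSize);
features = double(hog(:))';       % row vector; LIBSVM expects double
```

Because every canvas has the same size, every `features` vector has the same length, and the vectors can be stacked row by row into the instance matrix passed to LIBSVM. If the letters are dark on a light background, padding with ones instead of zeros is probably the better choice.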

1 Answer

You need to build a global representation from your local features so that you can feed your data to an SVM. One of the most popular approaches for this task is bag-of-words (also called bag-of-features). VLFeat has an excellent demo for this; you can find the code on the VLFeat website.

For your particular case, you need to arrange your training/testing data in Caltech-101-like data directories:

  • Letter 1
    • Image 1
    • Image 2
    • Image 3
    • Image 4
    • ...
  • Letter 2
    • ...
  • Letter 3
    • ...

Then adjust the following configuration settings for your case:

    conf.numTrain = 15 ;
    conf.numTest = 15 ;
    conf.numClasses = 102 ;

This demo uses SIFT as local features, but you can change it to HOG afterwards.
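
To make the idea concrete, here is a rough sketch of a bag-of-words encoding that uses HOG cells as the local features. This is not the demo's code, only an illustration under assumptions: `trainImages` is a hypothetical cell array of grayscale images of class SINGLE, and `cellSize` and `numWords` are assumed values you would need to tune.

```matlab
% Sketch: bag-of-words over HOG cells.
cellSize = 8;      % assumed HOG cell size
numWords = 100;    % assumed vocabulary size

% 1. Treat every HOG cell of every training image as one local descriptor.
descrs = cell(1, numel(trainImages));
for i = 1:numel(trainImages)
    hog = vl_hog(trainImages{i}, cellSize);        % rows x cols x d array
    descrs{i} = reshape(hog, [], size(hog, 3))';   % d x (rows*cols)
end
allDescrs = cat(2, descrs{:});

% 2. Learn the visual vocabulary with k-means (one column per word).
vocab = vl_kmeans(allDescrs, numWords);

% 3. Encode each image as a normalized histogram of its nearest words.
hists = zeros(numel(trainImages), numWords);
for i = 1:numel(trainImages)
    dist = vl_alldist2(vocab, descrs{i});          % numWords x numCells
    [~, words] = min(dist, [], 1);                 % nearest word per cell
    h = accumarray(words(:), 1, [numWords 1])';
    hists(i, :) = h / sum(h);
end
```

Each row of `hists` has `numWords` entries no matter how large the image was, which is what makes it usable as LIBSVM input. Test images would be encoded with the same `vocab`. Note that an image smaller than one cell yields no descriptors, so its histogram would need special handling.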