Alternative to spatial histograms in Bag of Words approach using vlfeat

Question

The phow_caltech101 demo app in vlfeat creates a complete Bag of Words process for image classification on the Caltech101 dataset, roughly put:

Feature Extraction
Visual Vocabulary building
Spatial Histograms computation
SVM training
SVM testing and evaluation,

obtaining a model that can be used to later classify new, unclassified instances. The only problem is that the histograms computed are spatial histograms. With a visual vocabulary of size n, I would have expected the histogram matrix to have size n x (size_collection), containing the occurrences of each visual word in each training instance.
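
For reference, this is roughly the kind of histogram I expected, as a minimal sketch in plain MATLAB. The word assignments are made up and the variable names are mine, not the demo's:

```matlab
% Hypothetical toy data: the visual word index of each feature, one cell per image.
numWords = 5 ;
words = {[1 2 2 5], [3 3 3 1 4 4]} ;   % made-up assignments for two images

% One column per image, one row per visual word.
hists = zeros(numWords, numel(words)) ;
for k = 1:numel(words)
  % count how many times each visual word occurs in image k
  hists(:,k) = accumarray(words{k}(:), 1, [numWords 1]) ;
end
```

Here hists has size numWords x (number of images), which is the n x (size_collection) layout described above.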

The spatial histograms, however, are stored in a structure determined by the model. By default the model has two spatial arguments, spatialX and spatialY, which results in a structure of size spatialX * spatialY * (size_vocabulary). This structure is then normalized, and it is what the SVM is trained on.
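
To put numbers on that size, here is a small sketch of the arithmetic. It assumes two grids, 2x2 and 4x4, that one block per grid is concatenated into the final vector, and a vocabulary of 600 words (an example value, not necessarily what the demo uses):

```matlab
numSpatialX = [2 4] ;   % grid columns, one entry per grid
numSpatialY = [2 4] ;   % grid rows, one entry per grid
numWords = 600 ;        % hypothetical vocabulary size

% number of bins contributed by each grid: rows * columns * words
dimPerGrid = numSpatialY .* numSpatialX * numWords ;   % [2400 9600]

% length of the concatenated spatial histogram
totalDim = sum(dimPerGrid) ;                           % 12000
```

Under these assumptions the vector fed to the SVM is 20 times longer than a plain histogram over the same vocabulary.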

Now, what if I want to use the normal histogram, normalized or not, that gives me a one-to-one correspondence between visual words and their counts in each image? Can I compute it directly, or obtain this information from the spatial histogram? Also, how much more efficient is the spatial histogram than the classical one that I have in mind when I picture the Bag of Words process?

Any help appreciated.

UPDATE:

Here is part of the code where the histograms are computed. You can see that instead of ending with a histogram vector of size (number_visual_words), you end up with one of size (spatialX * spatialY * number_visual_words) for each grid. To clarify, in this case the model is defined with spatialX = [2 4] and spatialY = [2 4].

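% frames, width, height, numWords and binsa (the visual word index of each
% feature) are computed earlier in the demo and are not shown here.
for i = 1:length(model.numSpatialX)
  % spatial column and row of each feature for grid i
  binsx = vl_binsearch(linspace(1,width,model.numSpatialX(i)+1), frames(1,:)) ;
  binsy = vl_binsearch(linspace(1,height,model.numSpatialY(i)+1), frames(2,:)) ;

  % combined quantization
  bins = sub2ind([model.numSpatialY(i), model.numSpatialX(i), numWords], ...
             binsy,binsx,binsa) ;
  % accumulate counts per (row, column, word) bin, then normalize this grid
  hist = zeros(model.numSpatialY(i) * model.numSpatialX(i) * numWords, 1) ;
  hist = vl_binsum(hist, ones(size(bins)), bins) ;
  hists{i} = single(hist / sum(hist)) ;
end
% concatenate the histograms of all grids and normalize the result
hist = cat(1,hists{:}) ;
hist = hist / sum(hist) ;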
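
If I read the indexing correctly, the plain per-word counts should be recoverable by summing over the spatial cells of one grid. Below is a minimal sketch of that idea with made-up data. It uses the built-in accumarray in place of vl_binsum so that it runs without vlfeat, and ny, nx and the bin vectors are toy stand-ins, not values from the demo:

```matlab
% Toy stand-ins for the demo's variables (all values are made up).
numWords = 3 ; ny = 2 ; nx = 2 ;
binsy = [1 2 2 1 2] ;   % spatial row of each feature
binsx = [1 1 2 2 2] ;   % spatial column of each feature
binsa = [3 1 1 2 1] ;   % visual word of each feature

% Same layout as the demo: linear index over [ny, nx, numWords].
bins = sub2ind([ny, nx, numWords], binsy, binsx, binsa) ;
spatialHist = accumarray(bins(:), 1, [ny*nx*numWords 1]) ;

% Collapse the spatial cells: one row per cell, one column per word.
plainHist = sum(reshape(spatialHist, ny*nx, numWords), 1)' ;

% Reference: count the words directly, ignoring position.
directHist = accumarray(binsa(:), 1, [numWords 1]) ;

% Both vectors hold the same per-word counts.
same = isequal(plainHist, directHist) ;
```

This works on unnormalized counts. Since the demo normalizes each grid and then the concatenation, the collapsed vector would only be proportional to the plain histogram and would need renormalizing. In principle a single 1x1 grid (numSpatialX = 1, numSpatialY = 1) should also reduce to the plain histogram, but I have not tried that in the demo.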

Part of the problem is that I haven't worked with spatial histograms either, so I'm not sure how much better than "normal" histograms they are. Maybe someone who has worked with this kind of histogram before could give more helpful insight.

I haven't worked with spatial histograms before but I have a lot of experience with SVMs. Is it a preprocessing step for the SVM? You might want to post your SVM training code if you have an error with your output. — krisdestruction, Apr 22 '15 at 21:58
@krisdestruction, if you have experience with SVMs, maybe you could take a look at the phow_caltech101 demo app in the vlfeat package. It is on their website at http://www.vlfeat.org/applications/caltech-101-code.html, where you can see the SVM training and the spatial histogram computation. — Rolo Villa, Apr 23 '15 at 18:47
Unfortunately I'd rather not read through 1000 lines of code to understand your problem. Plus, I typically prefer to use Libsvm for training instead. It's much faster than any custom function/library I've seen. Perhaps you can clarify the actual issue for someone who doesn't know about spatial histograms? — krisdestruction, Apr 23 '15 at 18:49
