The phow_caltech101 demo app in VLFeat implements a complete bag-of-words pipeline for image classification on the Caltech-101 dataset, roughly:
- Feature Extraction
- Visual Vocabulary building
- Spatial Histograms computation
- SVM training
- SVM testing and evaluation
The result is a model that can later be used to classify new, unlabeled instances. My only problem is that the histograms computed are spatial histograms. With a visual vocabulary of size n, I would have expected one histogram of size n per image, that is, an n x (size_collection) matrix containing the occurrences of each visual word in each training instance.
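To make concrete what I mean by the classical histogram, here is a minimal sketch using only base MATLAB. The vocabulary size and the word assignments are made up for illustration and do not come from the demo.

```matlab
% Hypothetical toy data: visual-word index of each descriptor in each image.
numWords = 5 ;                       % assumed vocabulary size n
words = {[1 2 2 5], [3 3 3 1 4]} ;   % one cell per image (made-up assignments)

numImages = numel(words) ;
H = zeros(numWords, numImages) ;     % n x (size_collection)
for k = 1:numImages
  % Count how often each visual word occurs in image k.
  H(:,k) = accumarray(words{k}(:), 1, [numWords 1]) ;
end
```

Each column of H is the plain word-count histogram of one image, which is the structure I was expecting to find.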
The spatial histograms, however, are stored in a structure determined by the model. By default the model has two spatial parameters, spatialX and spatialY (model.numSpatialX and model.numSpatialY in the code), which results in a structure of size spatialX * spatialY * (size_vocabulary). This structure is then normalized and used to train the SVM.
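As far as I understand it, the layout for a single spatial level looks like the sketch below. The grid size, vocabulary size, and feature assignments are toy values I made up, and I use accumarray from base MATLAB in place of the VLFeat helpers so that it runs on its own.

```matlab
numSpatialX = 2 ; numSpatialY = 2 ; numWords = 3 ;   % assumed toy sizes
% Made-up features: spatial cell (row, column) and visual word of each one.
binsy = [1 1 2 2 2] ;
binsx = [1 2 1 2 2] ;
binsa = [3 1 1 2 2] ;

% One linear bin per (cell row, cell column, word) triple.
bins = sub2ind([numSpatialY, numSpatialX, numWords], binsy, binsx, binsa) ;

% Count features per bin, then L1-normalize.
hist = accumarray(bins(:), 1, [numSpatialY * numSpatialX * numWords, 1]) ;
hist = hist / sum(hist) ;
```

With these sizes the vector has 2 * 2 * 3 = 12 entries instead of 3, because every visual word gets a separate count in each spatial cell.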
Now, what if I want to use the plain histogram, normalized or not, that has exactly one entry per visual word for each image? Can I compute it directly, or obtain this information from the spatial histogram? Also, how much more efficient is the spatial histogram than the classical one I have in mind when I picture the bag-of-words process?
Any help appreciated.
UPDATE:
Here is the part of the code where the histograms are computed. Instead of ending up with a histogram vector of size (number_visual_words), each level produces a histogram of size (spatialX * spatialY * number_visual_words), and the levels are then concatenated. To clarify, in this case the model is defined with spatialX = [2 4] and spatialY = [2 4], so the final vector has (2*2 + 4*4) * number_visual_words entries.
    % binsa (computed earlier, not shown) holds the visual word index of each feature
    for i = 1:length(model.numSpatialX)
      % spatial cell of each feature along x and y at level i
      binsx = vl_binsearch(linspace(1,width,model.numSpatialX(i)+1), frames(1,:)) ;
      binsy = vl_binsearch(linspace(1,height,model.numSpatialY(i)+1), frames(2,:)) ;

      % combined quantization
      bins = sub2ind([model.numSpatialY(i), model.numSpatialX(i), numWords], ...
                     binsy,binsx,binsa) ;
      hist = zeros(model.numSpatialY(i) * model.numSpatialX(i) * numWords, 1) ;
      hist = vl_binsum(hist, ones(size(bins)), bins) ;
      hists{i} = single(hist / sum(hist)) ;
    end
    % concatenate all levels and renormalize
    hist = cat(1,hists{:}) ;
    hist = hist / sum(hist) ;
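If I read the code correctly, the plain histogram might be recoverable from one level of this vector by reshaping it back to (cells in y) x (cells in x) x (words) and summing over the two spatial dimensions. The sketch below tries this on stand-in values rather than real demo output; the sizes are assumptions, and it presumes every feature was assigned to a valid spatial cell.

```matlab
numSpatialX = [2 4] ; numSpatialY = [2 4] ; numWords = 3 ;  % assumed toy sizes

% Stand-in for the concatenated vector built by the loop above (not real data).
hist = (1:sum(numSpatialY .* numSpatialX) * numWords)' ;
hist = hist / sum(hist) ;

% The first level occupies the first numSpatialY(1)*numSpatialX(1)*numWords entries.
len1 = numSpatialY(1) * numSpatialX(1) * numWords ;
level1 = reshape(hist(1:len1), numSpatialY(1), numSpatialX(1), numWords) ;

% Sum over the two spatial dimensions, leaving one value per visual word.
wordHist = squeeze(sum(sum(level1, 1), 2)) ;
wordHist = wordHist / sum(wordHist) ;   % renormalize so the entries sum to 1
```

The result is a numWords x 1 vector, which is the size I expected from a classical bag-of-words histogram.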
Part of the problem is that I haven't worked with spatial histograms before, so I'm not sure how much better than "normal" histograms they are. Maybe someone who has worked with this kind of histogram could offer some insight.