bags of words object recognition for large datasets

Question

I'm implementing object recognition with bag-of-words histograms. The histograms are composed of 200 "words" per image, obtained by running k-means on the descriptors. The problem is that for a large dataset, say 5000 images, we suddenly have 200 x 5000 = 1,000,000 words in a histogram. This means that every object will be represented by a histogram of length 1,000,000.

This gets too big and cumbersome past some point. Is there some way around this?

score 0 · Answer 1 · answered Jul 28 '12 at 04:01

Generally, you choose a codebook size that is independent of the number of training images. You would build the codebook by running k-means (or some other dictionary learning method) over a set of descriptors extracted from all the training data.
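To make the point about histogram length concrete, here is a minimal sketch of the quantization step, assuming a codebook has already been learned. The function names, the toy 2-D descriptors and the 3-word codebook are made up for illustration; real descriptors such as SIFT have far more dimensions. Each image gets a histogram with exactly one bin per codebook word, no matter how many images are in the dataset.

```python
import math

def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(descriptor, codebook[i]))

def bow_histogram(descriptors, codebook):
    """Normalized histogram with one bin per codebook word."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]

# Hypothetical toy codebook: 3 "words" in a 2-D descriptor space.
codebook = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]

# Two hypothetical images with different numbers of descriptors.
image_a = [(0.5, 0.2), (9.0, 1.0), (9.5, -0.5), (1.0, 9.0)]
image_b = [(0.1, 0.1), (0.2, 8.0)]

hist_a = bow_histogram(image_a, codebook)
hist_b = bow_histogram(image_b, codebook)

# Both histograms have len(codebook) bins, regardless of image or descriptor count.
print(len(hist_a), len(hist_b))
```

With a 200-word codebook, every image in a 5000-image dataset would still be described by a 200-bin histogram.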

So, in your example, if you had 5000 training images, and approximately 1000 descriptors extracted from each image, that would give you 5,000,000 descriptors that you could cluster using k-means.

That could be very time-consuming, so you may choose to cluster using a random subset of the descriptors.
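A rough sketch of that subsampling idea follows. It uses only the standard library and a plain Lloyd's k-means, so it is slow and meant purely as an illustration. The names `kmeans` and `build_codebook`, the sample size, and the synthetic 8-D descriptors are all assumptions, not anything from a particular library. In practice you would probably use an optimized k-means implementation.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # New center = mean of the members; keep the old center if a cluster is empty.
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers

def build_codebook(all_descriptors, k, sample_size, seed=0):
    """Cluster a random subset of the descriptors instead of all of them."""
    rng = random.Random(seed)
    n = min(sample_size, len(all_descriptors))
    subset = rng.sample(all_descriptors, n)
    return kmeans(subset, k, seed=seed)

# Synthetic stand-in for descriptors pooled from all training images.
data_rng = random.Random(42)
all_descriptors = [tuple(data_rng.gauss(0, 1) for _ in range(8)) for _ in range(5000)]

# Codebook size k is chosen independently of the number of images.
codebook = build_codebook(all_descriptors, k=20, sample_size=500)
print(len(codebook), len(codebook[0]))
```

The codebook size `k` and the sample size are both free parameters here; neither depends on how many training images you have.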

It's a really nice answer, but I have two questions. First: let's say that I have created a training histogram vector of length 100,000 for 4 objects. Now I should create a histogram for the test set as well, say 20,000. When I want to classify them, how can I get the number of classified images (not the number of matched histograms)? Second: is it obligatory to use non-classes when I work on multi-class classification, or is that only used for one-class classification? – Mario Sep 26 '13 at 15:01
@Mario What is your 100,000-length vector? – Sep 26 '13 at 15:13
Actually, I have a 60,000-length vector for 3 objects, with 100 images per object. It is just an experiment that I'm working on now, but what puzzles me is the testing part. – Mario Sep 26 '13 at 16:36
It's a 60,000 x 1 histogram vector. Below I will put a sample of the data. – Mario Sep 26 '13 at 16:37
0.018867925, 0.0094339624, 0.028301887, 0.0094339624, 0.0094339624, 0, 0.03773585, 0, 0, 0, 0, 0.0094339624, 0, 0.0094339624, 0.0094339624, 0.0094339624, 0.028301887, 0, 0, 0, 0.018867925, 0, 0, 0, 0.0094339624, 0.0094339624, 0.028301887, 0.018867925, 0.018867925, 0.0094339624, 0, 0, 0.0094339624, 0, 0.018867925, 0.0094339624, 0.018867925, 0.0094339624, 0, 0.018867925, 0, 0, 0.0094339624, – Mario Sep 26 '13 at 16:38
What do the numbers in the vector represent? – Sep 26 '13 at 16:43
Each one of them represents a computed histogram from the clusters that have been created... – Mario Sep 26 '13 at 16:50
This is not very clear. Perhaps you could write up a description of the process you're doing and start a new question. If you leave the link here, I'll visit and try to answer. You need to describe the process you use to determine the cluster centers, and the process you use to populate the histogram bins. – Sep 26 '13 at 16:52
