I am working on a scene recognition problem using a bag of visual words. Below is code that I adapted from the Internet. The training dataset has 5 classes with 100 images each, and the random test dataset has 5000 images. I understand that I should build the vocabulary from the training set, but should I also build a vocabulary from the test dataset?

FEATURE = 'bag of sift';
CLASSIFIER = 'support vector machine';
categories = {'shopping', 'office', 'eating', 'chatting', 'biking'};
num_train_per_cat = 100;
vocab_size = 200;

% YOUR CODE FOR build_vocabulary.m
vocab = build_vocabulary(train_image_paths, vocab_size); 

% YOUR CODE FOR get_bags_of_sifts.m
fprintf('Computing training features\n');
train_image_feats = get_bags_of_sifts(train_image_paths,vocab); 
save('train_bag.mat', 'train_image_feats');
fprintf('Computing test features\n');
test_image_feats  = get_bags_of_sifts(test_image_paths,vocab);

% YOUR CODE FOR svm_classify.m 
test_image_feats_mat = cell2mat(test_image_feats);
test_image_feats = vl_svmdataset(test_image_feats_mat);
predicted_categories = svm_classify(train_image_feats, train_labels, test_image_feats);
hmofrad

1 Answer

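No, you should not build a vocabulary from the test dataset. Build the vocabulary from the training images only, then use the encode method to count the visual word occurrences in each test image against that same vocabulary. The encode method produces a histogram that becomes a new, reduced representation of the image. In your code, the call get_bags_of_sifts(test_image_paths, vocab) appears to play this role already, since it receives the vocabulary built from the training images.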

Example:

features = encode(vocabulary, img)

To summarize, encode both the training and the test datasets with the vocabulary built from the training set. The output of the encode method becomes the input to the classifier.
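
To make the encoding step concrete, here is a minimal sketch of what it does for one test image, using base MATLAB only. The toy vocabulary and descriptors are made up for illustration, and the variable names (vocab, test_desc, nearest, hist_counts, test_feat) are my own assumptions, not taken from your get_bags_of_sifts.m. In practice the descriptors would be SIFT vectors and vocab would hold the cluster centers learned from the training images.

```matlab
% Toy vocabulary: K = 3 visual words in a 2-D descriptor space.
% Assumption: in a real pipeline these are cluster centers from TRAINING images only.
vocab = [0 0; 10 0; 0 10];

% Toy local descriptors extracted from one TEST image (one descriptor per row).
test_desc = [1 1; 9 1; 0 9; 11 -1];

K = size(vocab, 1);

% Squared Euclidean distance from every descriptor to every visual word (N-by-K).
d2 = bsxfun(@plus, sum(test_desc.^2, 2), sum(vocab.^2, 2)') - 2 * (test_desc * vocab');

% Assign each descriptor to its nearest visual word.
[~, nearest] = min(d2, [], 2);

% Count how often each word occurs, then L1-normalize so that the number
% of descriptors in the image does not affect the feature.
hist_counts = accumarray(nearest, 1, [K 1])';
test_feat = hist_counts / sum(hist_counts);
```

Here test_feat is a 1-by-K histogram, and it is what gets passed to the classifier. The vocabulary is only read at this stage, never re-estimated, so training and test histograms share the same bins.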

Liviu Stefan