
I am working on a computer vision project and need some help. The objective is to extract the attributes of any object. For example, given an image of a Nike running shoe, I should be able to determine that it is a shoe in the first place, then that it is a Nike shoe and not an Adidas shoe (possibly because of the Nike tick), and then that it is a running shoe and not a pair of football studs.

I started by treating this as an image classification problem, using the following steps:

  1. I took around 60 training samples each of, say, shoes, heels and watches, and extracted their features using Dense SIFT.
  2. I created a vocabulary using k-means clustering (with an arbitrarily chosen vocabulary size of 600).
  3. I created a bag-of-words representation for each image.
  4. I trained an SVM classifier on these bag-of-words feature vectors, with one class per object type (shoe, heel, watch).
  5. For testing, I extracted the Dense SIFT features of the test image and computed its bag-of-words representation using the existing vocabulary.
  6. I compared the bag-of-words of the test image with that of each class and returned the closest-matching class.
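
Steps 2, 3 and 6 above can be sketched in plain Python to make the data flow concrete. This is a toy illustration under stated assumptions, not the original code: descriptors are 2-D tuples standing in for 128-D Dense SIFT vectors, the vocabulary has 2 words instead of 600, the training data is made up, and all function names are hypothetical. The comparison in step 6 is implemented literally as "closest mean class histogram"; a trained SVM would instead predict the class directly from the histogram (for example scikit-learn's `LinearSVC`), and in practice the vocabulary would come from a library k-means such as scikit-learn's `KMeans`.

```python
# Toy stand-ins (assumption): 2-D descriptors instead of 128-D Dense SIFT,
# and a 2-word vocabulary instead of 600.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_farthest(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the k centroids form the visual vocabulary (step 2)."""
    centroids = init_farthest(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if its cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def bag_of_words(descriptors, vocabulary):
    """Step 3: L1-normalised histogram of the nearest visual word per descriptor."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        word = min(range(len(vocabulary)), key=lambda i: sq_dist(d, vocabulary[i]))
        hist[word] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def mean_histogram(hists):
    """Average the bag-of-words histograms of one class."""
    return [sum(col) / len(hists) for col in zip(*hists)]


def nearest_class(hist, class_hists):
    """Step 6 as described: return the class whose mean histogram is closest."""
    return min(class_hists, key=lambda name: sq_dist(hist, class_hists[name]))


# Hypothetical training descriptors: two images per class.
train = {
    "shoe": [[(0.0, 0.1), (0.2, 0.0), (0.1, 0.2)], [(0.0, 0.0), (0.3, 0.1)]],
    "watch": [[(10.0, 10.1), (9.8, 10.0)], [(10.2, 9.9), (10.0, 10.0), (9.9, 10.2)]],
}
all_descriptors = [d for images in train.values() for img in images for d in img]
vocab = kmeans(all_descriptors, k=2)
class_hists = {
    name: mean_histogram([bag_of_words(img, vocab) for img in images])
    for name, images in train.items()
}
print(nearest_class(bag_of_words([(0.1, 0.1), (0.2, 0.2)], vocab), class_hists))  # prints: shoe
```

The deterministic farthest-point seeding is only there to keep the toy example reproducible; with real descriptors, k-means++ seeding and several restarts are the usual choice.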

How should I proceed from here? Will Dense SIFT features help me identify the attributes, given that they only represent the gradients around certain points?
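
To see what such descriptors do and do not capture, here is a deliberately simplified sketch of the idea behind Dense SIFT: descriptors are computed on a regular grid (no keypoint detection), and each one is a magnitude-weighted histogram of gradient orientations in a small patch. Real SIFT additionally splits each patch into 4×4 cells with 8 orientation bins each (128 values), applies Gaussian weighting and normalises the result; the function names and parameters below are hypothetical.

```python
import math


def orientation_histogram(img, cx, cy, radius=2, bins=8):
    """Magnitude-weighted histogram of gradient orientations in the square patch
    of half-width `radius` centred on (cx, cy). `img` is a grayscale image
    given as a list of rows."""
    hist = [0.0] * bins
    height, width = len(img), len(img[0])
    for y in range(max(1, cy - radius), min(height - 1, cy + radius + 1)):
        for x in range(max(1, cx - radius), min(width - 1, cx + radius + 1)):
            gx = img[y][x + 1] - img[y][x - 1]  # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region: no orientation to vote for
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    return hist


def dense_descriptors(img, step=4, radius=2):
    """Dense sampling: one descriptor per grid point, no keypoint detector."""
    height, width = len(img), len(img[0])
    return [
        orientation_histogram(img, x, y, radius)
        for y in range(radius, height - radius, step)
        for x in range(radius, width - radius, step)
    ]


# A vertical edge (dark left half, bright right half): every gradient points
# along +x, so all the weight lands in bin 0 (orientation 0 radians).
edge = [[0] * 4 + [1] * 4 for _ in range(8)]
print(orientation_histogram(edge, 4, 4))  # prints: [10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Because only intensity gradients enter the histogram, colour and the overall outline of the object are not encoded directly; a logo such as the Nike tick is represented only through the local edge orientations it produces.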

Also, my classification sometimes goes wrong. For example, if I train the classifier with images of left shoes and watches, a right shoe is classified as a watch. I understand that I have to include right shoes in my training set to solve this problem, but is there any other approach I should follow?
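
SIFT-style descriptors are not invariant to reflection, so a right shoe produces different gradient orientations, and therefore different visual words, than a left shoe. One common way to include right shoes without photographing them is to add horizontally mirrored copies of the training images (in OpenCV, `cv2.flip(img, 1)` performs a left-right flip). A minimal sketch, assuming images are lists of pixel rows and labels are plain strings; the helper names are hypothetical:

```python
def mirror(img):
    """Flip an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in img]


def augment_with_mirrors(samples):
    """Return the original (image, label) pairs plus a mirrored copy of each,
    so a left-shoe image also contributes a synthetic right-shoe example."""
    return samples + [(mirror(img), label) for img, label in samples]


left_shoe = [[0, 1, 2],
             [3, 4, 5]]
training = augment_with_mirrors([(left_shoe, "shoe")])
print(training[1])  # prints: ([[2, 1, 0], [5, 4, 3]], 'shoe')
```

Mirroring is only appropriate for classes whose mirror image is still a valid member of the class (shoes, watches); it would be wrong for classes defined partly by text, for example.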

Also, is there any way to understand shape? For example, if I have trained the classifier for watches and the training set contains watches with both circular and rectangular dials, can I identify the dial shape in a new test image? Or do I simply have to train separate classifiers for watches with circular and rectangular dials? Thanks
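
One alternative to training separate classifiers is to compute a shape attribute after the dial has been located. The sketch below assumes the dial outline has already been extracted as an ordered list of (x, y) contour points (for example with OpenCV's `cv2.findContours`; `cv2.contourArea` and `cv2.arcLength` compute the same area and perimeter). It uses circularity, 4πA/P², which is 1.0 for a perfect circle and π/4 ≈ 0.785 for a square. The 0.88 cut-off and the function names are hypothetical, not tuned values.

```python
import math


def polygon_area(points):
    """Shoelace formula for a closed polygon given as ordered (x, y) vertices."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


def polygon_perimeter(points):
    """Sum of edge lengths, including the closing edge back to the first vertex."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def circularity(points):
    """4*pi*A / P**2: 1.0 for a circle, pi/4 (about 0.785) for a square."""
    perimeter = polygon_perimeter(points)
    return 4 * math.pi * polygon_area(points) / perimeter ** 2


def dial_shape(points, threshold=0.88):
    """Hypothetical threshold between a square (0.785) and a circle (1.0)."""
    return "circular" if circularity(points) > threshold else "rectangular"


circle = [(math.cos(2 * math.pi * i / 64), math.sin(2 * math.pi * i / 64)) for i in range(64)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(dial_shape(circle), dial_shape(square))  # prints: circular rectangular
```

A round dial photographed at an angle appears as an ellipse, which lowers its circularity, so the threshold would need tuning on real images.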

user3705926
  • Have you tried training a Haar cascade using OpenCV? – wbest Jun 13 '14 at 15:01
  • Hi, I have heard of Haar cascades, but aren't the Haar features specific to face detection? – user3705926 Jun 14 '14 at 15:35
  • You should be able to train them to detect anything, given good data. Training can take a while, though. – wbest Jun 16 '14 at 14:24
  • OK, I did some further study on Haar cascades and found that they are primarily used for object detection, not classification. Is there anything else I can do? Thanks – user3705926 Jun 17 '14 at 05:12
  • I was mainly suggesting that to help with the second part of your question, where watches and shoes get confused, and with "figure out that it is a shoe in the first place". As for emblems, I guess SIFT and SURF should work fine. I don't think you should use CV and ML to distinguish between types of shoes. Just write something that looks at the bottom: if it's bumpy, it's a cleat. I'd keep it simple. – wbest Jun 17 '14 at 14:46
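
The "if it's bumpy, it's a cleat" suggestion can be sketched as a simple rule. This assumes a sole profile is already available: for each image column of a side view, how many pixels the sole protrudes below its flat line. Both the profile extraction and the thresholds (`stud_height`, `min_studs`) are hypothetical.

```python
def count_studs(profile, stud_height=2):
    """Count separate runs of columns that protrude at least `stud_height`
    pixels below the flat sole line."""
    studs, inside_stud = 0, False
    for depth in profile:
        if depth >= stud_height and not inside_stud:
            studs += 1  # entering a new stud
        inside_stud = depth >= stud_height
    return studs


def sole_type(profile, stud_height=2, min_studs=3):
    """Hypothetical rule: several distinct studs means a cleat."""
    return "cleat" if count_studs(profile, stud_height) >= min_studs else "running shoe"


flat_sole = [0, 1, 0, 0, 1, 0, 0, 1, 0]        # mild tread texture only
studded_sole = [0, 3, 3, 0, 0, 4, 0, 0, 3, 3]  # three distinct studs
print(sole_type(flat_sole), "/", sole_type(studded_sole))  # prints: running shoe / cleat
```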
