
I am working on an image classification problem in which I need to classify an image as, for example, a watch with a rectangular dial, a watch with a circular dial, a shoe, and so on.

I have looked into Content-Based Image Retrieval (using Dense SIFT for feature extraction and Bag of Words + SVM for classification) and am currently exploring Convolutional Neural Networks (Unsupervised Feature Learning).

My problem is that the test image is a photo taken with a camera and hence contains other elements that are not present in the training data. For example, my training data for watches with rectangular dials contains only the watch, whereas my test image shows the watch along with a portion of the hand. Similarly, my test image of a shoe may show the shoe in a different orientation from the shoes in the training data.

How do I address this issue? Is a CNN (Unsupervised Feature Learning) the correct approach, or should I stick to D-SIFT + BOW + SVM? How do I collect appropriate training data?

Thank you.

user3705926
  • If the object of interest is only a part of the image, and not necessarily a large part, you can use a sliding-window approach. Run your classifier while looping over the window's size and its location, collect candidates (windows where the classifier gave a high score), and then merge those candidates using, for example, the function groupRectangles from OpenCV. – GilLevi Jun 30 '14 at 22:54
  • Hi. In most cases the object of interest covers a major part of the image, but my main problem is that its orientation may be different. Can you suggest which features I should use? Should I even detect features, or should it be unsupervised? Thank you for your reply. – user3705926 Jul 01 '14 at 05:17
  • Can you modify the training set to contain more realistic images? – GilLevi Jul 01 '14 at 13:55
  • I will do that and let you know my results. Thank you for your time. – user3705926 Jul 01 '14 at 19:58
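
The sliding-window procedure from the first comment can be sketched roughly as follows, in plain Python. This is only an illustration under assumptions, not code from the thread: `score_fn` is a hypothetical stand-in for whatever classifier is used (it would crop the window from the image and return a confidence score), the window sizes, stride, and thresholds are arbitrary, and the merging step is a simple greedy non-maximum suppression written out by hand rather than OpenCV's groupRectangles, which clusters similar rectangles instead.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int,
                    sizes: Sequence[int], stride: int) -> Iterator[Box]:
    """Yield every square window of each size that fits inside the image."""
    for size in sizes:
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield (x, y, size, size)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 when they do not overlap)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    return inter / (aw * ah + bw * bh - inter)


def detect(img_w: int, img_h: int,
           score_fn: Callable[[Box], float],  # hypothetical classifier wrapper
           sizes: Sequence[int], stride: int,
           threshold: float, max_overlap: float = 0.3
           ) -> List[Tuple[float, Box]]:
    """Score every window, keep high scorers, and merge overlapping ones."""
    # Collect candidates: windows where the classifier gave a high score.
    candidates = []
    for box in sliding_windows(img_w, img_h, sizes, stride):
        score = score_fn(box)
        if score >= threshold:
            candidates.append((score, box))

    # Greedy non-maximum suppression: visit candidates from best to worst
    # and drop any that overlap an already kept window too much.
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept: List[Tuple[float, Box]] = []
    for score, box in candidates:
        if all(iou(box, kept_box) <= max_overlap for _, kept_box in kept):
            kept.append((score, box))
    return kept
```

With a real classifier, `score_fn` might look like `lambda box: classifier_score(crop(image, box))`, where `classifier_score` and `crop` are again hypothetical helpers. Note that this only addresses extra content around the object; it does nothing about the orientation issue raised in the follow-up comment.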

0 Answers