My objective is to classify images into one of a few predefined categories (sport shoes, shirts, heels, watches, ...) from my catalog, and later on to return similar images from the catalog.
I am using Dense-SIFT for feature extraction, representing each image as a Bag of Visual Words histogram, and an SVM for classification. All my training images are taken from the catalog.
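For reference, this is roughly the pipeline I have: (1) extract local descriptors per image, (2) cluster them into a visual vocabulary, (3) encode each image as a word histogram, (4) train an SVM on the histograms. The sketch below uses random arrays in place of real Dense-SIFT descriptors (`fake_descriptors` is just a stand-in; in the real pipeline the descriptors come from SIFT computed on a dense grid):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for Dense-SIFT: each "image" yields an (n_patches, 128) descriptor array.
# Real descriptors would come from SIFT computed at fixed grid locations.
def fake_descriptors(cls, n_patches=50):
    return rng.normal(loc=cls * 3.0, scale=1.0, size=(n_patches, 128))

train_imgs = [(fake_descriptors(c), c) for c in (0, 1) for _ in range(10)]

# 1) Build the visual vocabulary: k-means over all training descriptors.
k = 16
all_desc = np.vstack([d for d, _ in train_imgs])
vocab = KMeans(n_clusters=k, n_init=4, random_state=0).fit(all_desc)

# 2) Represent each image as a normalized histogram of visual words.
def bovw_histogram(desc):
    words = vocab.predict(desc)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

X = np.array([bovw_histogram(d) for d, _ in train_imgs])
y = np.array([c for _, c in train_imgs])

# 3) Train an SVM on the histograms.
clf = SVC(kernel="rbf").fit(X, y)

# Classify a new "image" (here: synthetic descriptors from class 1).
pred = clf.predict([bovw_histogram(fake_descriptors(1))])[0]
print(pred)
```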
The problem is that my query images are pictures taken with a camera, and these look very different from the catalog images. For example, every heel/sport-shoe image in my catalog shows only the right shoe, photographed at one particular angle, whereas a query image may contain the heel plus part of the foot, taken at an angle that deviates from the catalog shots.
Hence classification works only when my query (test) image is itself a catalog image (one I did NOT use for training), but fails on images taken with a camera.
How do I proceed? Is the problem with my feature vector or with my training data itself? If I cannot change the training data, is there anything else I can do? Should I use a completely different approach (not bag-of-words)?
Thanks