
I'd appreciate some help creating a feature vector for a simple object using keypoints. For now I'm using the ETH-80 dataset: the objects have an almost uniformly blue background and the pictures are taken from different viewpoints. Like this:

[Image: two different views of the same object]

After creating the feature vector, I want to train a neural network with it and use that network to recognize an input image of an object. I don't want to make it complex; the input images will be as simple as the training images. I asked similar questions before, and someone suggested using the average value of a 20x20 neighborhood around each keypoint. I tried it, but it doesn't seem to work with the ETH-80 images because of the different viewpoints. That's why I'm asking another question.
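For reference, the earlier suggestion (the mean intensity of a 20x20 neighborhood around each keypoint) can be sketched roughly as below. This is a minimal illustration with numpy only; the keypoint list is a hypothetical input that would come from whatever interest-point detector you use, and patches near the border are simply clipped.

```python
import numpy as np

def patch_mean_features(image, keypoints, size=20):
    """Mean intensity of a size x size neighborhood around each keypoint.

    `image` is a 2-D grayscale array; `keypoints` is a list of (row, col)
    tuples, e.g. from an interest-point detector. Patches that would fall
    outside the image are clipped to its borders.
    """
    half = size // 2
    h, w = image.shape
    feats = []
    for r, c in keypoints:
        r0, r1 = max(0, r - half), min(h, r + half)
        c0, c1 = max(0, c - half), min(w, c + half)
        feats.append(image[r0:r1, c0:c1].mean())
    return np.array(feats)

# toy example: on a constant image every patch mean equals that constant
img = np.full((64, 64), 0.5)
vec = patch_mean_features(img, [(10, 10), (32, 32), (60, 5)])
```

As the question notes, this raw-patch feature is not viewpoint-invariant, which is why it breaks down on ETH-80's multi-view images.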

Maysam

2 Answers


SURF or SIFT. Look for interest point detectors. A MATLAB SIFT implementation is freely available.

Update: Object Recognition from Local Scale-Invariant Features

Jacob
  • It's good for matching two images. I want to create a feature vector that can feed a neural network. – Maysam Sep 02 '11 at 07:05
  • @Maysam: So train your network on several feature descriptors belonging to the same class of images. – Jacob Sep 02 '11 at 12:27
  • That's my question: which features can I extract? – Maysam Sep 02 '11 at 14:10
  • @Maysam: SIFT/SURF descriptors! – Jacob Sep 02 '11 at 14:37
  • @Maysam: Yes. SIFT has some issues with illumination, but the descriptors are scale- and rotation-invariant, and invariant to affine distortions to some degree – Jacob Sep 02 '11 at 14:44
  • @Jacob, I mean viewpoint: for example, a car's side view and rear view are completely different, and SIFT cannot handle that. But it seems I have to limit the viewpoints to use these descriptors. Thank you. – Maysam Sep 02 '11 at 14:47

SIFT and SURF features consist of two parts: the detector and the descriptor. The detector finds points in some n-dimensional space (4D for SIFT); the descriptor is used to robustly describe the surroundings of those points. The latter is increasingly used for image categorization and identification in what is commonly known as the "bag of words" or "visual words" approach. In its simplest form, one collects the descriptors from all images and clusters them, for example using k-means. Each cluster centroid is a "visual word", and each image's descriptors are assigned to their nearest visual words; the resulting histogram of assignments can then be used as a new, fixed-length descriptor for the image. The VLFeat website contains a nice demo of this approach, classifying the Caltech-101 dataset:

http://www.vlfeat.org/applications/apps.html#apps.caltech-101
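The clustering-plus-histogram pipeline above can be sketched as follows. This is a minimal, self-contained illustration: random vectors stand in for real SIFT descriptors (which would be 128-D per keypoint), and k-means is implemented as a few plain Lloyd iterations rather than via a library, so treat it as a sketch of the idea, not a production implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(data, k, iters=20):
    """Plain Lloyd's k-means: returns (k, d) centroids, the 'visual words'."""
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest centroid (squared distance)
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids

def bow_histogram(descriptors, centroids):
    """Normalized histogram of visual-word assignments for one image."""
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(1)
    hist = np.bincount(labels, minlength=len(centroids)).astype(float)
    return hist / hist.sum()

# stand-in for per-image SIFT descriptors: 200 random 128-D vectors each
images = [rng.normal(size=(200, 128)) for _ in range(5)]
vocab = kmeans(np.vstack(images), k=8)
feature_vectors = [bow_histogram(d, vocab) for d in images]
```

Each image now yields one fixed-length vector (here 8-D, one bin per visual word) regardless of how many keypoints it had, which is exactly the form needed to feed a neural network.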

Maurits