I'm working on an automatic image annotation problem in which I'm trying to associate tags with images. For that, I'm trying to use SIFT features for learning. The problem is that SIFT gives a set of keypoints, each of which has a 2-D array, and the number of keypoints is also huge. How many keypoints should I use, and how do I give them to my learning algorithm, which typically accepts only 1-D features?
You may want to look at this answer: http://stackoverflow.com/a/36795946/1197132 – Halis Yılboğa Apr 22 '16 at 14:15
4 Answers
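You can represent a single SIFT descriptor as a "visual word", which is one number (the index of the cluster it falls into). Each image then becomes a fixed-length histogram of visual-word counts, which you can use as SVM input; I think that is what you need. The visual words are usually obtained by k-means clustering of the descriptors.
This method is called "bag-of-words" and is described in this paper.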
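To make the idea concrete, here is a minimal sketch of the bag-of-words step in plain Python. It is an illustration under stated assumptions, not the method from the paper: the descriptors are random stand-ins for real 128-value SIFT descriptors, the vocabulary size `k=8` is arbitrary, and the helper names (`kmeans`, `bow_histogram`, `fake_descriptors`) are made up for this example. In practice you would probably take descriptors from a SIFT implementation and use a library k-means instead of this hand-written one.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, v):
    """Index of the center closest to vector v (its visual word)."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], v))

def kmeans(vectors, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers (the vocabulary)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centers, v)].append(v)
        for i, g in enumerate(groups):
            if g:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def bow_histogram(descriptors, centers):
    """Normalized visual-word histogram; always len(centers) values."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(centers, d)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# Stand-in data (assumption): random vectors instead of real SIFT output.
rng = random.Random(42)

def fake_descriptors(n, dim=128):
    return [[rng.random() for _ in range(dim)] for _ in range(n)]

# Three "images" with different numbers of keypoints.
images = [fake_descriptors(n) for n in (30, 55, 12)]

# Build the vocabulary from all descriptors of all training images.
all_desc = [d for img in images for d in img]
vocab = kmeans(all_desc, k=8)

# Each image becomes one fixed-length 1-D feature vector.
features = [bow_histogram(img, vocab) for img in images]
```

Whatever the number of keypoints per image, every entry of `features` has exactly `k` values, so it can be passed to a classifier that expects 1-D input of fixed length.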

You should read the original SIFT paper; it explains what SIFT is and how to use it. Read chapter 7 and the rest carefully to understand how to use it in practice. Here is the link to the original paper.

My issue is not with SIFT per se; I understand that the output is a set of keypoints, each of which has 128 values. Even if I take only the top 10 keypoints, the number of values I have to deal with is 128 x 10, which is huge. Is there any way to apply dimensionality reduction or something similar to those keypoints? – sreeraag Nov 18 '13 at 08:41
You can use the Bag of Words approach, which you can read about in the following post:
http://gilscvblog.wordpress.com/2013/08/23/bag-of-words-models-for-visual-categorization/

SIFT and SURF are invariant feature extractors. Therefore, matching features helps solve many problems.
But there is a matching problem, since not all points will be the same in two different images (and the same holds for similarity problems). Therefore you should use only the features that are matched.
Another problem is that these algorithms extract a large number of features, which makes matching infeasible on large datasets.
There is a good solution to these problems, called "Bag of Visual Words".
A complete bag of visual words is fully implemented in https://github.com/dermotte/LIRE. Here is the Lire demo site.
The code is very simple; if you know the bag-of-visual-words approach, you can also modify it.
After getting the visual words, you should use the information retrieval approaches used in search engines. By the way, Lire also includes an information retrieval library called Lucene. You should follow the Lire way until you get the complete idea, and then implement your own.
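As a rough illustration of what "information retrieval approaches" can mean here, the sketch below weights visual-word histograms with TF-IDF and ranks database images by cosine similarity to a query. This is an assumption-laden toy example, not how Lire or Lucene implement it: the histograms are made-up counts over a 4-word vocabulary, the smoothed IDF formula is just one common variant, and all function names are hypothetical.

```python
import math

def idf_weights(histograms):
    """Smoothed inverse document frequency for each visual word."""
    n = len(histograms)
    k = len(histograms[0])
    weights = []
    for w in range(k):
        # Number of images in which visual word w occurs at least once.
        df = sum(1 for h in histograms if h[w] > 0)
        weights.append(math.log((1 + n) / (1 + df)) + 1.0)
    return weights

def tfidf(hist, idf):
    """Term frequency (normalized count) times IDF, per visual word."""
    total = sum(hist) or 1.0
    return [(c / total) * w for c, w in zip(hist, idf)]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_hist, database_hists):
    """Database indices sorted from most to least similar to the query."""
    idf = idf_weights(database_hists)
    q = tfidf(query_hist, idf)
    scored = [(cosine(q, tfidf(h, idf)), i)
              for i, h in enumerate(database_hists)]
    return [i for _, i in sorted(scored, reverse=True)]

# Made-up visual-word counts for three database images (assumption).
database = [
    [5, 0, 1, 0],
    [0, 4, 0, 2],
    [4, 1, 1, 0],
]
query = [6, 0, 1, 0]

order = rank(query, database)
```

With these toy counts, image 0 ranks first and image 1 last, because the query shares no visual words with image 1. A real search engine would use an inverted index rather than scoring every image, which is the part a library such as Lucene takes care of.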
