
I've been reading about Fisher Vectors (FV) and I'm still learning. They are a better representation than the classic BoF representation, built on a GMM (or on k-means, although that variant is usually referred to as VLAD).

However, I've seen that they are usually used for classification problems, for example with an SVM.

But what about image retrieval? I've seen that they have been used for image retrieval too (here), but I don't understand one point: given two FVs representing two images, how do we compute the distance between them, and so measure how similar the two images are?

Is it reasonable to use them in such a context?

marc_s
justHelloWorld
  • Note that access to the document requires registering -- a gentle requirement, but still a surprise. – Prune Jun 27 '16 at 18:13

1 Answer


As seen in the two papers below, Euclidean distance seems to be the popular choice. There are also references to using dot-product as a similarity measure; cosine similarity (closely related) is a generally popular metric for ML similarity.

http://link.springer.com/article/10.1007/s11263-013-0636-x

http://www.robots.ox.ac.uk/~vgg/publications/2013/Simonyan13/simonyan13.pdf
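To make the three measures concrete, here is a minimal sketch in plain Python. It is not taken from either paper: the function names, the toy 4-dimensional vectors, and the choice to L2-normalize before ranking are all assumptions for illustration (real FVs have thousands of dimensions, and you would normally use an array library). One thing worth knowing: if both vectors are L2-normalized, the squared Euclidean distance equals 2 - 2 * cosine similarity, so in that case the two measures produce the same ranking.

```python
import math


def l2_normalize(v):
    """Scale v to unit Euclidean length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def euclidean_distance(a, b):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a, b):
    """Inner product, usable directly as a similarity score."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the L2-normalized vectors."""
    return dot_product(l2_normalize(a), l2_normalize(b))


def rank_by_distance(query, database):
    """Brute-force ranking: database indices, most similar first,
    using Euclidean distance on L2-normalized vectors (an assumption)."""
    q = l2_normalize(query)
    dists = [euclidean_distance(q, l2_normalize(v)) for v in database]
    return sorted(range(len(database)), key=lambda i: dists[i])


# Hypothetical toy "Fisher Vectors"; the values are made up.
fv_query = [0.2, -0.1, 0.4, 0.0]
fv_db = [
    [0.1, -0.1, 0.5, 0.1],
    [-0.3, 0.2, 0.0, 0.4],
    [0.4, -0.2, 0.8, 0.0],  # same direction as the query, twice as long
]
ranking = rank_by_distance(fv_query, fv_db)
print(ranking)
```

A linear scan like `rank_by_distance` is only meant to show the comparison itself; an index structure would replace it once the collection grows.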

Is this enough to let you choose something that meets your needs?

Prune
  • Thanks so much, it's definitely something that meets my needs :) Since I'm implementing a general framework where approximately similar items are found through LSH, do you think that's an acceptable way to find similar FVs? I know that inverted indexes are usually used, but here I have no more than 50k, MAYBE 100k images, not 100M as in the papers where these solutions are described. – justHelloWorld Jun 27 '16 at 19:24
  • In particular, the state-of-the-art FALCONN, which is designed for cosine similarity and L2 distance, would be perfect according to your answer. – justHelloWorld Jun 27 '16 at 19:27
  • Yes. Since LSH specifically preserves a sense of similarity, any of the straightforward methods should serve you well. In fact, since you have so few images (by my standards :-) ), you might try both cosine and L2; compare the results and see which serves you better. – Prune Jun 27 '16 at 21:05