
I am having a problem choosing an adequate distance function to measure the similarity (or dissimilarity) between two relative frequency vectors.

More specifically, I am using shape feature vectors that contain data about the basic shapes (circle, triangle, square) present in an image. Thus the vectors are in the form

[% of circles, % of triangles, % of squares]

For example, if an image contains 4 circles, 2 triangles and 4 squares, then its shape feature vector should be:

[0.4, 0.2, 0.4]

My initial idea was simply to compute the squared Euclidean differences between the corresponding elements of the two feature vectors and add the results together. However, I am not convinced that this is the best approach. Can someone suggest a good way to measure the distance between two such vectors, or an algorithm suited to this situation? Are more sophisticated probabilistic distance functions, such as the Chi-Squared distance or the Kullback-Leibler divergence, required to obtain good results?
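For concreteness, here is a minimal sketch (not part of the original question) of the three candidate distance functions applied to relative frequency vectors. The second vector `q` is a made-up example image, and the `eps` smoothing constant is an assumption to keep KL divergence defined when a bin is zero:

```python
import math

def euclidean(p, q):
    # Straight-line distance between the two frequency vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def chi_squared(p, q):
    # Symmetric chi-squared distance; skips bins where both entries are zero.
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def kl_divergence(p, q, eps=1e-10):
    # Kullback-Leibler divergence D(p || q); eps avoids log(0) and division by zero.
    # Note KL is asymmetric and blows up when q has a zero where p does not.
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))

p = [0.4, 0.2, 0.4]  # 4 circles, 2 triangles, 4 squares
q = [0.5, 0.5, 0.0]  # a hypothetical second image with no squares

print(euclidean(p, q))    # ≈ 0.51
print(chi_squared(p, q))  # ≈ 0.27
print(kl_divergence(p, q))
```

Note how the zero entry in `q` makes the KL divergence very large, which is one practical reason smoothing (or a symmetric alternative such as Jensen-Shannon) is often preferred for frequency histograms.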

Thanks Peter

peterS

1 Answer


What distance function to use depends on your concrete task.

I guess cosine similarity may be what you want.
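As a sketch of the suggestion (this code is not from the original answer), cosine similarity measures the angle between the two vectors, ignoring their magnitude; the example vectors are taken from the question:

```python
import math

def cosine_similarity(p, q):
    # Dot product of the vectors divided by the product of their norms.
    # Ranges from 0 to 1 for non-negative frequency vectors; 1 means identical direction.
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

print(cosine_similarity([0.4, 0.2, 0.4], [0.4, 0.2, 0.4]))  # 1.0 for identical vectors
print(cosine_similarity([0.4, 0.2, 0.4], [0.5, 0.5, 0.0]))  # ≈ 0.707
```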

alper
  • Thanks for your suggestion alper. The task involves a simple image retrieval system that performs retrieval using image features. In simple terms, one of the options is for the user to choose the kind and number of shapes that they want the image to contain. If the user sets the system to retrieve images consisting of 4 circles, 2 triangles and 4 squares, the system automatically creates the feature vector `[0.4, 0.2, 0.4]` to use as a query. The closest results are then retrieved by measuring the similarity of this feature vector against the shape feature vectors of all the images in the collection. – peterS Mar 02 '15 at 18:36
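The retrieval step described in the comment can be sketched as a simple ranking loop (a minimal illustration, not part of the original thread; the image names and collection are made up, and any distance function discussed above could be substituted):

```python
import math

def euclidean(p, q):
    # Distance between two shape feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical collection: image id -> shape feature vector.
collection = {
    "img1": [0.4, 0.2, 0.4],
    "img2": [0.5, 0.5, 0.0],
    "img3": [0.3, 0.3, 0.4],
}

# Query built from the user's request: 4 circles, 2 triangles, 4 squares.
query = [0.4, 0.2, 0.4]

# Rank all images by distance to the query, closest first.
ranked = sorted(collection, key=lambda name: euclidean(query, collection[name]))
print(ranked)  # img1 first, since it matches the query exactly
```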