Say I have calculated the Euclidean distance between two images using colour as a feature, and also calculated the distance between the same two images using their edges. I want to test whether combining these two distance values gives a better representation of how similar the images are. Is combining them as simple as (colourDistance + edgeDistance) / 2? Or is there a more sophisticated way of combining distance values?

Bad Dub
  • It would help if you explained the application. One thing you probably want to do (though it does depend on the application) is normalize the two distance metrics before you combine them. – daphshez May 03 '15 at 11:32
  • It's a CBIR system: the user uploads an image and the top 12 most similar images are returned. Right now I use Euclidean distance to compare colour histograms and get a distance. I then have another histogram that captures edge values, and I get a distance for that too. All images are 200x200, so do I still need to normalize them? I quantise them for the colour histogram. I thought I could just add the colour and edge distances and divide by 2 to get a new distance value, but I suspect it could be a lot more complicated than that. – Bad Dub May 03 '15 at 11:36

1 Answer

Any function of colourDistance and edgeDistance could work. You could think of what you described as testing three possible functions:

f1(colourDistance, edgeDistance) = colourDistance
f2(colourDistance, edgeDistance) = edgeDistance
f3(colourDistance, edgeDistance) = (colourDistance + edgeDistance) / 2

You could, in theory, test any other function. One thing that comes immediately to mind is linear combinations:

g(colourDistance, edgeDistance) = w1 * colourDistance + w2 * edgeDistance

Here w1 and w2 can take various values, which lets you experiment with the visual importance of the two features. Your f3 is one case of this function, with w1 = w2 = 0.5.
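As a minimal sketch of how such an experiment might look in Python: the function name `combine_linear`, the helper `rank`, the file names, and the distance values below are all made up for illustration, not taken from the asker's system. The idea is simply to re-rank the same candidates under different weights and see how the order changes.

```python
def combine_linear(colour_distance, edge_distance, w1=0.5, w2=0.5):
    """Weighted sum of two distances; w1 = w2 = 0.5 reproduces f3."""
    return w1 * colour_distance + w2 * edge_distance


def rank(candidates, w1, w2):
    """Return candidate names sorted from most to least similar."""
    return sorted(
        candidates,
        key=lambda name: combine_linear(*candidates[name], w1, w2),
    )


# Hypothetical (colourDistance, edgeDistance) pairs for one query image.
candidates = {
    "a.jpg": (0.2, 0.9),
    "b.jpg": (0.5, 0.1),
    "c.jpg": (0.4, 0.4),
}

equal_weights = rank(candidates, 0.5, 0.5)   # colour and edges count equally
colour_heavy = rank(candidates, 0.9, 0.1)    # colour dominates
print(equal_weights)
print(colour_heavy)
```

With these invented numbers the top result changes between the two weightings, which is the kind of effect you would judge against what users consider similar.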

You might find that the contribution of a feature isn't linear: for example, a 1-point difference between very small values may be much more (or less) significant than a 1-point difference between large values. You could try functions like:

h(colourDistance, edgeDistance) = w1 * log(colourDistance) + w2 * log(edgeDistance)
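One practical wrinkle: a Euclidean distance of 0 (identical histograms) makes log(distance) undefined. The sketch below swaps in log(1 + d) via `math.log1p` to sidestep that. This substitution is an assumption of the example, not part of h as written above, and `combine_log` is a hypothetical name.

```python
import math


def combine_log(colour_distance, edge_distance, w1=0.5, w2=0.5):
    """Log-compressed weighted sum.

    Uses log(1 + d) rather than log(d) so that a distance of 0 maps to 0
    instead of raising an error (assumption made for this sketch).
    """
    return w1 * math.log1p(colour_distance) + w2 * math.log1p(edge_distance)


# A 1-point step near zero moves the score more than the same step far out.
small_step = combine_log(1, 0) - combine_log(0, 0)
large_step = combine_log(101, 0) - combine_log(100, 0)
print(small_step, large_step)
```

Whether compressing large distances this way helps retrieval is something only your experiments can tell you.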

A final piece of advice: it's not clear to me whether the distances you have are on the same scale. If one distance metric ranges from 0 to 10 and the other from 0 to 1000, you probably need to either normalize the values or compensate through the choice of w1 and w2.
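One possible way to normalize, sketched here under the assumption that you have the distances from the query to every candidate available at once, is min-max scaling of each distance list to the range 0 to 1 before combining. The name `min_max_normalise` and the sample values are hypothetical. Other schemes, such as dividing by a known maximum, would work too.

```python
def min_max_normalise(values):
    """Rescale a list of distances linearly so they span 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All distances are equal, so no candidate is closer than another.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical distances from one query to three candidates,
# on very different scales (0-10 versus 0-1000).
colour = [0.0, 5.0, 10.0]
edge = [0.0, 250.0, 1000.0]

combined = [
    0.5 * c + 0.5 * e
    for c, e in zip(min_max_normalise(colour), min_max_normalise(edge))
]
print(combined)
```

Without the rescaling, the edge distances in this example would swamp the colour distances in any equal-weight sum.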

TylerH
daphshez
  • Thank you for the reply! This was really helpful. I'm going to run a few experiments and use your weight suggestion. The way the Euclidean distance works is that if the two images' histograms are identical it returns 0, and the more the value increases, the less similar the histograms are. With that being the case, do I need to normalize them? – Bad Dub May 03 '15 at 11:59
  • It depends on the size of the range. Both features start at 0, but how high do they go? – daphshez May 03 '15 at 12:24
  • I don't actually know. The colour histograms have 256 bins and the edge histograms only 80, but each is only compared against its own kind, so I thought size wouldn't matter. – Bad Dub May 03 '15 at 12:44
  • I guess you can try playing with the weights first, to keep things simple. – daphshez May 03 '15 at 16:17