I use example code to compare HSV histograms using EMD.

I want to find similar images in people's (mobile) picture libraries. It's quite common for people to take several images of the same subject in a row with only slight changes: zooming in or out a bit, a different angle, a different exposure as a result of changing position, another pose, and so on.

I selected 4 sets of 4 similar images to test this algorithm. When comparing the images inside the sets, I get 22 EMD-L1 values between roughly 0.25 and 2.25 (average 1.47) and 2 outliers around 7.2.

When I cross-compare between sets, I get values between 2 and 15, with an average around 8.

Yes, there is a significant difference between the ranges of the two result sets. But I was disappointed that there is no gap between these ranges, and instead a small overlap of [2.0, 2.25]. I'm hoping to improve the algorithm.
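To make the "gap versus overlap" check concrete, here is a minimal sketch in plain Python. The function names and the distance values are hypothetical (the values are only shaped like the ranges described above, with the two outliers left out); they are not my measured results.

```python
def separation(within, across):
    """Return (gap, overlap) between two groups of distances.

    gap > 0 means every within-set distance lies below every cross-set
    distance. Otherwise overlap is the interval where the ranges collide.
    """
    hi_within = max(within)
    lo_across = min(across)
    gap = lo_across - hi_within
    overlap = None if gap > 0 else (lo_across, hi_within)
    return gap, overlap


def misclassified(within, across, threshold):
    """Count the pairs that a fixed threshold would label wrongly."""
    false_different = sum(d > threshold for d in within)
    false_similar = sum(d <= threshold for d in across)
    return false_different, false_similar


# Hypothetical distances, for illustration only.
within = [0.25, 0.9, 1.4, 1.8, 2.25]
across = [2.0, 5.5, 8.0, 11.0, 15.0]

gap, overlap = separation(within, across)
print("gap:", gap, "overlap:", overlap)
print("errors at threshold 2.1:", misclassified(within, across, threshold=2.1))
```

A negative gap means no single threshold separates the two groups cleanly, so any change to the histogram or the distance can be judged by whether it pushes this gap above zero.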

How can I optimise my comparison for my particular use-case? There are various histogram forms, various histogram comparison algorithms, and then each has various parameters.

Does OpenCV implement the fastest known EMD algorithm? I was surprised that comparing some histograms took up to a second, especially given the relatively small number of bins.

Also, some cross-set pairs get a low EMD value (suggesting similarity) even though their RGB histograms are totally different. Here are two such images:

[image 1]

[image 2]

My current EMD-L1 value for this pair is 1.95, but the RGB histograms are totally different.

meaning-matters

1 Answer

You have probably already refined your comparison method, but in case this isn't obvious: you could divide each image into overlapping subregions and then compute the EMD for each of the 4 parts, rather than once for the whole image.
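A rough sketch of that idea, in plain Python so it runs without any library. Everything here is an assumption made for illustration: the helper names are hypothetical, only the hue channel is used, hue is treated as a linear (not circular) axis, and the 1-D EMD is computed from cumulative differences, which is valid only for normalised 1-D histograms with unit bin spacing. With OpenCV you would presumably build the per-region histograms with `cv2.calcHist` and compare them with `cv2.EMD` instead.

```python
def hue_histogram(pixels, bins=8, max_value=180):
    """Normalised 1-D histogram of hue values in [0, max_value)."""
    hist = [0.0] * bins
    for value in pixels:
        hist[value * bins // max_value] += 1.0
    total = sum(hist)
    return [h / total for h in hist]


def emd_1d(hist_a, hist_b):
    """EMD of two normalised 1-D histograms (unit distance between bins).

    In this special case the EMD is the sum of absolute differences of
    the cumulative histograms.
    """
    distance = 0.0
    carried = 0.0  # mass that still has to move to the next bin
    for a, b in zip(hist_a, hist_b):
        carried += a - b
        distance += abs(carried)
    return distance


def overlapping_quadrants(image, overlap=0.25):
    """Split a 2-D list into 4 quadrants that each extend past the centre."""
    rows, cols = len(image), len(image[0])
    extra_r, extra_c = int(rows * overlap / 2), int(cols * overlap / 2)
    mid_r, mid_c = rows // 2, cols // 2
    row_spans = [(0, mid_r + extra_r), (mid_r - extra_r, rows)]
    col_spans = [(0, mid_c + extra_c), (mid_c - extra_c, cols)]
    regions = []
    for r0, r1 in row_spans:
        for c0, c1 in col_spans:
            regions.append([v for row in image[r0:r1] for v in row[c0:c1]])
    return regions


def regional_emd(image_a, image_b, bins=8):
    """One EMD value per quadrant for two hue images of any size."""
    pairs = zip(overlapping_quadrants(image_a), overlapping_quadrants(image_b))
    return [emd_1d(hue_histogram(a, bins), hue_histogram(b, bins)) for a, b in pairs]


# Two synthetic 8x8 hue images with the same colours in mirrored positions.
image_a = [[10] * 4 + [100] * 4 for _ in range(8)]
image_b = [[100] * 4 + [10] * 4 for _ in range(8)]

flat_a = [v for row in image_a for v in row]
flat_b = [v for row in image_b for v in row]
print("whole image:", emd_1d(hue_histogram(flat_a), hue_histogram(flat_b)))
print("per region: ", regional_emd(image_a, image_b))
```

In this synthetic case the whole-image histograms are identical, so the global EMD is zero, while every region reports a clearly non-zero distance. How to combine the four values (sum, maximum, or something else) is left open here and would need tuning on real photos.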

fireant