Perceptual hashing accuracy/precision

Question

I want to find identical and very similar images within a truckload of photos. To do this, I want to compare the Levenshtein (or Hamming, not decided yet) distances of their perceptual hashes. To calculate these, I want to use imghash (also not a final decision). imghash lets you select the output format and the number of bits. I assume that changing the number of bits changes accuracy/precision, but does it really? By default, the output is a 16-character hex string (2^64 combinations, roughly eighteen quintillion four hundred forty-six quadrillion). That seems like overkill. But is it? And if so, what is a reasonable length?

score 0 · Answer 1 · answered Sep 10 '22 at 10:14

When using imghash and hamming-distance to calculate the similarity of images, it goes like this:

imghash accepts bits as an optional argument, which is 8 by default. A longer hash does mean finer discrimination: for the 'very similar' images I tested, the hashes generated with bits = 4 were identical, but the ones generated with bits = 8 differed.
The maximum Hamming distance (when the images are completely different, e.g. a black vs. a white canvas) equals the bits value squared, i.e. 64 for the default of 8, which matches the 16-character hex output (16 x 4 bits). Accordingly, you need to scale your similarity threshold with the hash size.
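
To illustrate how the threshold scales with hash size, here is a minimal sketch in plain JavaScript. It does not call imghash or the hamming-distance package; it only assumes the hashes arrive as equal-length hex strings. The helper names hammingDistanceHex and normalizedDistance are made up for this example.

```javascript
// Number of set bits in each 4-bit value (0..15).
const NIBBLE_POPCOUNT = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

// Bit-level Hamming distance between two equal-length hex strings.
// (Hypothetical helper, not part of imghash.)
function hammingDistanceHex(a, b) {
  if (a.length !== b.length) {
    throw new Error("hashes must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    const x = parseInt(a[i], 16);
    const y = parseInt(b[i], 16);
    if (Number.isNaN(x) || Number.isNaN(y)) {
      throw new Error("hashes must be hex strings");
    }
    // XOR leaves a 1 wherever the two nibbles differ.
    distance += NIBBLE_POPCOUNT[x ^ y];
  }
  return distance;
}

// Distance as a fraction of the maximum possible distance
// (4 bits per hex character), so that one threshold can be
// reused across different hash sizes.
function normalizedDistance(a, b) {
  return hammingDistanceHex(a, b) / (a.length * 4);
}
```

With a normalized value, a threshold such as "at most 10% of the bits differ" means the same thing for a 16-character hash and a 64-character one. The 10% figure is only a placeholder; a suitable value has to be found by testing on your own photos.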

Also:

The selected bit length must be divisible by 4.
Perceptual hashes can only be compared with each other if they have the same length.
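
These two constraints can be checked up front. The sketch below is plain JavaScript with hypothetical helper names (expectedHexLength, areComparable). It assumes that a hex hash generated with a given bits value is (bits x bits) / 4 characters long, which matches the 16-character default mentioned in the question but is not verified here for other output formats.

```javascript
// Assumption: a hash generated with `bits` holds bits * bits bits,
// i.e. (bits * bits) / 4 hex characters (8 -> 16 characters).
function expectedHexLength(bits) {
  if (!Number.isInteger(bits) || bits <= 0 || bits % 4 !== 0) {
    throw new RangeError("bits must be a positive multiple of 4");
  }
  return (bits * bits) / 4;
}

// True only if both hashes have the length expected for `bits`,
// which also guarantees they are the same length as each other.
function areComparable(hashA, hashB, bits) {
  const expected = expectedHexLength(bits);
  return hashA.length === expected && hashB.length === expected;
}
```

Running such a check before computing distances catches hashes that were generated with different bits values, for example after changing the setting halfway through a batch.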
