
GOAL
Find a good way to check whether 2 images are similar by comparing their hash profiles. The hash is a simple array containing 0 and 1 values.

INTRO
I have 2 images. They are the same image with a few small differences: one has a different brightness, rotation and shot.
What I want to do is create a JavaScript method that compares the 2 images and calculates a percentage value that tells how similar they are.

WHAT I'VE DONE
After loading the 2 images into an HTML5 canvas to get their image data, I used the pHash algorithm (www.phash.org) to obtain their hash representation.
The hash is an array of 0 and 1 values that recreates the image in a "simplified" form.
I've also written a JS script that generates an HTML table with black cells wherever the array contains 1.
The result is the following screenshot (the image is a Van Gogh picture):

Screenshot

Now I need to compare the 2 arrays to obtain a percentage value that says "how much" they are similar.
Most of the JavaScript hash algorithms I found by googling already include a comparison algorithm: the Hamming distance. It's very simple and fast, but not very precise. In fact, the Hamming distance algorithm says that the 2 images in my screenshot are 67% similar.

THE QUESTION
Starting with 2 simple arrays, with the same length, filled with 0 and 1 values: what could be a good algorithm to determine similarity more precisely?

NOTES
- Pure JavaScript development, no third-party plugins or frameworks.
- No need for a complex algorithm that finds the right similarity when the 2 images are the same but look very different (strong rotation, totally different colors, etc.).

Thanx

PHASH CODE

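  // size is the image size in pixels (for example 128).
  // array_sum is assumed to be a helper that returns the sum of an array's values.
  var pixels = [];

  // imgData.data holds 4 values (R, G, B, A) per pixel.
  for (var i = 0; i < imgData.data.length; i += 4) {
    var j = i / 4;                  // pixel index
    var y = Math.floor(j / size);   // row
    var x = j - (y * size);         // column

    var pixelPos = x + (y * size);
    var r = imgData.data[i];
    var g = imgData.data[i + 1];
    var b = imgData.data[i + 2];

    // Weighted grayscale value of the pixel.
    var gs = Math.floor((r * 0.299) + (g * 0.587) + (b * 0.114));
    pixels[pixelPos] = gs;
  }

  // Each bit is 1 when the pixel is brighter than the average, otherwise 0.
  var avg = Math.floor(array_sum(pixels) / pixels.length);
  var hash = [];
  array.forEach(pixels, function (px, i) {
    hash[i] = (px > avg) ? 1 : 0;
  });

  return hash;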
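
For readers without the `array_sum` and `array.forEach` helpers, the same mean-threshold hashing can be sketched in plain JavaScript. The function name `averageHash` is made up for this illustration, and `imgData` is assumed to be an object with a flat RGBA `data` array, as returned by a canvas context's `getImageData`.

```javascript
// Sketch only: averageHash is a hypothetical name, not part of the original post.
// imgData.data is assumed to be a flat array of R, G, B, A values.
function averageHash(imgData) {
  var data = imgData.data;
  var pixels = [];
  var sum = 0;

  // Convert every pixel to a weighted grayscale value.
  for (var i = 0; i < data.length; i += 4) {
    var gs = Math.floor(data[i] * 0.299 + data[i + 1] * 0.587 + data[i + 2] * 0.114);
    pixels.push(gs);
    sum += gs;
  }

  // Threshold every grayscale value against the average.
  var avg = Math.floor(sum / pixels.length);
  var hash = [];
  for (var j = 0; j < pixels.length; j++) {
    hash.push(pixels[j] > avg ? 1 : 0);
  }
  return hash;
}
```

Because the pixel position always equals the running pixel index, this version does not need the image size at all.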

HAMMING DISTANCE CODE

  // hash1 and hash2 are the arrays of the "coded" images.
  
  var similarity = hash1.length;
  
  array.forEach(hash1, function(val,key){
    if(hash1[key] != hash2[key]){
      similarity--;
    }
  });

  var percentage = (similarity/hash1.length*100).toFixed(2);

NOTE: array.forEach is not pure JavaScript. Treat it as a replacement for: for (var i = 0; i < array.length; i++).
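
A minimal plain-JavaScript sketch of the same Hamming-based comparison, using an ordinary `for` loop. The name `hammingSimilarity` and the choice to throw on mismatched or empty input are assumptions made for this example; it returns a number rather than a formatted string.

```javascript
// Sketch only: hammingSimilarity is a hypothetical helper name.
// Returns the percentage (0 to 100) of positions where both hashes agree.
function hammingSimilarity(hash1, hash2) {
  if (hash1.length === 0 || hash1.length !== hash2.length) {
    throw new Error("Hashes must be non-empty and have the same length");
  }
  var matching = 0;
  for (var i = 0; i < hash1.length; i++) {
    if (hash1[i] === hash2[i]) {
      matching++;
    }
  }
  return (matching * 100) / hash1.length;
}
```

The Hamming distance itself is `hash1.length - matching`, so the percentage is just that distance rescaled to the hash length.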

  • please add the code you have. – Nina Scholz Jul 25 '17 at 08:20
  • What do you consider to be "imprecise" about the Hamming distance? It seems like the right tool for the job. Do you have an example of where the Hamming distance isn't working well for you? – user94559 Jul 25 '17 at 08:24
  • If your Hamming distance seems imprecise, it's more likely that your hashing is not precise enough... Hamming distance cannot be imprecise, it's perfectly binary. – Salketer Jul 25 '17 at 08:28
  • pHash.org is a library of perceptual Hash algorithms, which one are you using? And how? – Salketer Jul 25 '17 at 08:29
  • We need to see your hashing algorithm – Salketer Jul 25 '17 at 08:39
  • The Hamming distance algorithm works. The pHash algorithm I used is just one I found googling. In general, they are always the same implementation. What I was asking is if someone can suggest a compare algorithm a little bit better than Hamming. – Stevenworks Pictures Jul 25 '17 at 08:40
  • does this work? `array.forEach(hash1, function(val,key){ ...` is it plain javascript? – Nina Scholz Jul 25 '17 at 08:48

2 Answers

I'm using blockhash, and it seems pretty good so far. The only false positives I get are when half the pictures are of the same background color, which is to be expected =/

http://blockhash.io/

BlockHash may be slower than yours but it should be more accurate.

What you do is just calculate the greyscale value of each pixel and compare it to the average to create your hash.

What BlockHash does is split the picture into small rectangles of equal size, average the sum of the RGB values of the pixels inside them, and compare those averages to 4 horizontal medians.

So it is normal that it takes longer, but it is still pretty efficient and accurate.

I'm doing it with pictures of a good resolution, at minimum 1000x800, and use 16 bits. This gives a 64-character hexadecimal hash. When using the Hamming distance provided by the same library, I see good results with a similarity threshold of 10.
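
To illustrate how two hexadecimal hashes like these can be compared, here is a rough sketch that counts the differing bits. `hexHammingDistance` is a hypothetical helper written for this example, not the library's own function, and it assumes both strings contain only valid hex digits.

```javascript
// Sketch only: hexHammingDistance is a hypothetical helper, not the library's API.
// Counts the bits that differ between two hex strings of equal length.
function hexHammingDistance(hexA, hexB) {
  if (hexA.length !== hexB.length) {
    throw new Error("Hashes must have the same length");
  }
  var distance = 0;
  for (var i = 0; i < hexA.length; i++) {
    // XOR leaves a 1 in every bit position where the two hex digits differ.
    var diff = parseInt(hexA[i], 16) ^ parseInt(hexB[i], 16);
    while (diff) {
      distance += diff & 1;
      diff >>= 1;
    }
  }
  return distance;
}
```

Reading the threshold of 10 as a maximum distance (an assumption), two images would count as similar when `hexHammingDistance(a, b) <= 10`.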

Your idea of using greyscale isn't bad at all. But you should average out portions of the image instead of comparing each pixel. That way you can compare a thumbnail version to its original, and get pretty much the same phash!
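
A rough sketch of that idea: average the greyscale values over square blocks and threshold each block average against the mean of all blocks. This is a simplified illustration, not the BlockHash algorithm itself. The name `blockAverageHash` and its parameters are made up here, and `size` is assumed to be evenly divisible by `blocks`.

```javascript
// Sketch only: blockAverageHash is a hypothetical name.
// pixels: flat, row-major array of greyscale values for a size x size image.
// blocks: number of blocks per side (assumed to divide size evenly).
function blockAverageHash(pixels, size, blocks) {
  var blockSize = size / blocks;
  var means = [];
  var total = 0;

  for (var by = 0; by < blocks; by++) {
    for (var bx = 0; bx < blocks; bx++) {
      // Sum the greyscale values inside the current block.
      var sum = 0;
      for (var y = by * blockSize; y < (by + 1) * blockSize; y++) {
        for (var x = bx * blockSize; x < (bx + 1) * blockSize; x++) {
          sum += pixels[y * size + x];
        }
      }
      var mean = sum / (blockSize * blockSize);
      means.push(mean);
      total += mean;
    }
  }

  // One bit per block: 1 when the block is brighter than the average block.
  var avg = total / means.length;
  var hash = [];
  for (var i = 0; i < means.length; i++) {
    hash.push(means[i] > avg ? 1 : 0);
  }
  return hash;
}
```

With the same `blocks` value, a 4x4 image and its 2x2 thumbnail both produce a 4-bit hash, so the two can be compared directly.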

Salketer
  • Just out of curiosity I've implemented the BlockHash algorithm. The difference in the results compared to my script is minimal, just 3%. The notable thing is that my algorithm is a lot faster than BlockHash. – Stevenworks Pictures Jul 25 '17 at 13:08
  • Here, added some info to help you out... Can't do much more unfortunately. – Salketer Jul 25 '17 at 13:56

I don't know if this can do the trick, but you can just count the positions where the two arrays hold the same value:

const arr1 = [1,1,1,1,1,1,1,1,1,1],
      arr2 = [0,0,0,0,0,0,0,0,0,0],
      arr3 = [0,1,0,1,0,1,0,1,0,1],
      arr4 = [1,1,1,0,1,1,1,0,1,1]

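const howSimilar = (a1,a2) => {
    let similarity = 0
    // Count the positions where both arrays hold the same value.
    a1.forEach( (elem,index) => {
        if(a2[index]==elem) similarity++
    })
    // Divide by the length of the array passed in, not the global arr1.
    let percentage = Math.round(similarity/a1.length*100) + "%"
    console.log(percentage)
}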

howSimilar(arr1,arr2) // 0%
howSimilar(arr1,arr3) // 50%
howSimilar(arr1,arr4) // 80%
Jeremy Thille