GOAL
Finding a good way to check if 2 image are similar compairing their hash profiles. The hash is a simple array containing 0 and 1 values.
INTRO
I have 2 images. They are the same image but with some little differences: one has a different brightness, rotation and shot.
What I want to do is create a Javascript method to compare the 2 images and calculate a percentage value that tells how much they are similar.
WHAT I'VE DONE
After uploading the 2 images into a html5 canvas to get their image data, I've used the pHash algorithm (www.phash.org) to obtain their hash rapresentation.
The hash is an array containing 0 and 1 values that recreates the image in a "simplified" form.
I've also created a JS script that generates a html table with black cells where the array contains 1.
The result is the following screenshot (the image is a Van Gogh picture):
Now, what I should do is to compare the 2 arrays for obtaining a percentage value to know "how much" they are similar.
The most part of the hash Javascript algorithms I've found googling already have a compare algorithm: the hamming distance algorithm. It's very simple and fast, but not very precise. In fact, the hamming distance algorithm says that the 2 images in my screenshot have a 67% of similarity.
THE QUESTION
Starting with 2 simple arrays, with the same length, filled with 0 and 1 values: what could be a good algorithm to determine similarity more precisely?
NOTES
- Pure Javascript development, no third party plugins or framework.
- No need of a complex algorithm to find the right similarity when the 2 images are the same but they are very different (strong rotation, totaly different colors, etc.).
Thanx
PHASH CODE
// Size is the image size (for example 128px)
var pixels = [];
for (var i=0;i<imgData.data.length;i+=4){
var j = (i==0) ? 0 : i/4;
var y = Math.floor(j/size);
var x = j-(y*size);
var pixelPos = x + (y*size);
var r = imgData.data[i];
var g = imgData.data[i+1];
var b = imgData.data[i+2];
var gs = Math.floor((r*0.299)+(g*0.587)+(b*0.114));
pixels[pixelPos] = gs;
}
var avg = Math.floor( array_sum(pixels) / pixels.length );
var hash = [];
array.forEach(pixels, function(px,i){
if(px > avg){
hash[i] = 1;
} else{
hash[i] = 0;
}
});
return hash;
HAMMING DISTANCE CODE
// hash1 and hash2 are the arrays of the "coded" images.
var similarity = hash1.length;
array.forEach(hash1, function(val,key){
if(hash1[key] != hash2[key]){
similarity--;
}
});
var percentage = (similarity/hash1.length*100).toFixed(2);
NOTE: array.forEach is not pure javascript. Consider it as a replace of: for (var i = 0; i < array.length; i++).