4

My question concerns an idea I had: checking whether an image has already been uploaded by comparing the base64-encoded strings of the two images...

Example use-case would be to find duplicates in your database...

The operation would be pretty big, I guess: first converting each image to base64 and then using something like "strcmp()" to compare the strings.

Not sure if this would make a lot of sense but what do you think of the idea?

Would it be too big of an operation? How accurate would it be? Does the idea make any sense?

John Conde
der-lukas

2 Answers

2

Here's a function that can help you compare files faster.

Aside from checking an obvious thing like file size, you can go further and compare binary chunks.
For example, check the last n bytes as well as a chunk at a random offset.
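The tail and random-offset checks mentioned above could be sketched roughly as follows. This is only an illustration: `readChunk()` and `sampledChunksEqual()` are hypothetical names, not part of the function in this answer, and the sketch assumes both paths point to readable local files.

```php
<?php
// Hypothetical helper: reads up to $length bytes starting at byte $offset.
function readChunk(string $path, int $offset, int $length): string
{
    $fp = fopen($path, 'rb'); // binary mode, no newline translation
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fp, $offset);
    $data = fread($fp, $length);
    fclose($fp);
    return $data === false ? '' : $data;
}

// Hypothetical helper: compares the last $n bytes and one chunk at a
// random offset. A true result only means no difference was found in
// the sampled chunks; it does not prove the files are identical.
function sampledChunksEqual(string $firstPath, string $secondPath, int $n = 500): bool
{
    $size = filesize($firstPath);
    if ($size === false || $size !== filesize($secondPath)) {
        return false;
    }
    if ($size === 0) {
        return true; // two empty files
    }
    $n = max(1, min($n, $size));

    // Last $n bytes
    $tail = $size - $n;
    if (readChunk($firstPath, $tail, $n) !== readChunk($secondPath, $tail, $n)) {
        return false;
    }

    // One chunk at a random offset (same offset in both files)
    $offset = random_int(0, $size - $n);
    return readChunk($firstPath, $offset, $n) === readChunk($secondPath, $offset, $n);
}
```

Because these checks only sample the files, a full comparison or a checksum is still needed before treating two files as duplicates.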

I used checksum comparison as a last resort.

When optimizing check order, you can also take into account if you are generally expecting files to be different or not.

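function areEqual($firstPath, $secondPath, $chunkSize = 500){

    // Cheapest check first: files of different sizes cannot be equal
    if(filesize($firstPath) !== filesize($secondPath)){
        return false;
    }

    // Compare the first $chunkSize bytes
    // This is fast, and differing binary files will most likely differ here
    // 'rb' opens the files in binary mode
    $fp1 = fopen($firstPath, 'rb');
    $fp2 = fopen($secondPath, 'rb');
    if($fp1 === false || $fp2 === false){
        // At least one file could not be opened, so equality cannot be confirmed
        if($fp1 !== false){ fclose($fp1); }
        if($fp2 !== false){ fclose($fp2); }
        return false;
    }
    $chunksAreEqual = fread($fp1, $chunkSize) === fread($fp2, $chunkSize);
    fclose($fp1);
    fclose($fp2);

    if(!$chunksAreEqual){
        return false;
    }

    // Compare hashes as the last resort, since this reads both files in full
    // SHA1 was chosen here as a bit faster than MD5; relative speed varies by platform
    // Strict comparison avoids PHP treating hex strings like "0e123..." as numbers
    $firstChecksum = sha1_file($firstPath);
    $secondChecksum = sha1_file($secondPath);
    if($firstChecksum !== $secondChecksum){
        return false;
    }

    return true;
}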
Ivan Batić
1

If I were doing something like this, I would use an MD5 hash instead of base64_encode.

$equal = md5($image1) === md5($image2);
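For the duplicate-finding use case from the question, one possible approach is to hash each file once and group paths by hash. The sketch below is an assumption-laden illustration: `findDuplicateFiles()` is a hypothetical helper, and SHA-256 is swapped in as the default algorithm rather than MD5. It uses `hash_file()`, which hashes a file by path without loading the whole image into a PHP string first.

```php
<?php
// Hypothetical helper: groups file paths by content hash and returns
// only the groups that contain more than one path.
function findDuplicateFiles(array $paths, string $algo = 'sha256'): array
{
    $byHash = [];
    foreach ($paths as $path) {
        $hash = hash_file($algo, $path);
        if ($hash === false) {
            continue; // unreadable file, skipped in this sketch
        }
        $byHash[$hash][] = $path;
    }

    // Keep only hashes shared by two or more files
    return array_filter($byHash, fn(array $group) => count($group) > 1);
}
```

If the images live in a database, the same idea would be to store the hash in its own column at upload time and look up new uploads by that value, so no image data has to be compared directly. Identical hashes indicate byte-identical files; visually identical images saved with different encodings will not match.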
gotha
  • Hm, could you explain why? – der-lukas May 07 '15 at 17:09
  • Probably `md5_file()` – AbraCadaver May 07 '15 at 17:16
  • An encoded image of several megabytes will be quite a big string to compare. [md5_file](http://php.net/md5_file) will be faster. If the image is stored in a database, it's a little more complicated, but I think it will still be faster. I haven't checked, so the best solution is to try them both and see which one performs better. – gotha May 07 '15 at 17:16
  • This worked well for comparing byte arrays (not base64 string) – Nickson Yap Oct 08 '19 at 04:11