
I have around 500 files that I need to remove duplicates from. I am using a PowerShell script that computes the SHA-256 hash of each file for further processing. But it occurred to me that, given the size of some of the files, the number of files, and a relatively slow machine, execution time would be a big problem.
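For context, the script is roughly along these lines (a minimal sketch rather than the exact script; the folder path is a placeholder):

```powershell
# Hash every file and group by the hash value; any group with more than
# one member is a set of duplicates.
Get-ChildItem -Path 'C:\Target' -File -Recurse |
    Get-FileHash -Algorithm SHA256 |
    Group-Object -Property Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        # Keep the first file in each group and report the rest as duplicates.
        $_.Group | Select-Object -Skip 1 -ExpandProperty Path
    }
```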

Out of the supported hash functions (SHA1, SHA256, SHA384, SHA512, MD5), which would be the best choice for both lowering execution time and minimizing hash collision probability?

Edit: Since any false positives resulting from collisions can easily be remedied by looking at the images themselves, I have decided on SHA-1. Thanks for your help, all.

  • Don't ask us. Try them all and measure with a `Stopwatch`. You want long hashes that are fast. Measure the speed of computing different size hashes for each algorithm and put it on a graph. It should then be clear which is the winner. – Enigmativity Jul 21 '21 at 22:52
  • Remember that using file size as your first rejection criterion will save huge amounts of time (see the sketch after these comments). – Tim Roberts Jul 21 '21 at 22:56
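Following up on the comments above, a rough sketch of the size-first rejection: only files that share a size can be duplicates, so only those groups get hashed. The path and algorithm are placeholders:

```powershell
# Group by file size first; only files that share a size can be duplicates,
# so only those groups need to be hashed.
Get-ChildItem -Path 'C:\Target' -File -Recurse |
    Group-Object -Property Length |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group } |
    Get-FileHash -Algorithm SHA1 |
    Group-Object -Property Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group.Path }   # each group is one set of duplicates
```

And for the timing suggestion, a quick per-algorithm comparison on one representative file (using `Measure-Command` in place of a `Stopwatch`; the sample file is a placeholder):

```powershell
# Rough per-algorithm timing on a single representative file.
'MD5', 'SHA1', 'SHA256', 'SHA384', 'SHA512' | ForEach-Object {
    $elapsed = Measure-Command { Get-FileHash -Path 'C:\Target\sample.bin' -Algorithm $_ }
    '{0,-6} {1:N0} ms' -f $_, $elapsed.TotalMilliseconds
}
```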
