4

SHA-1 is considered safer than MD5 for at least two reasons: bigger hash (160 bits vs 128 bits) and better hash function.

I need to generate hashes on a few million strings. Generally the chance for a collision even for MD5 should be very low. I am aware that MD5 is pretty old and not considered secure in general, but in my case deliberate attacks are not a concern (no outside access, no incentive, etc.). I just need a reasonably safe hash function w/o wasting too many bits and 128 bits should be more than enough. So I was wondering - if I only got the first 128 bits of a SHA-1 hash, would that be better than the 128 bits of MD5? By "better" I mean "less likely to cause a collision".

at54321
  • 8,726
  • 26
  • 46
  • 2
    If MD5 is "good enough" for your particular use case - go for it! If you can afford to use SHA-1 (160 bits) or SHA-256: great! But truncating your SHA-1 hash? Probably not a good idea. Look here: https://stackoverflow.com/a/4784437/421195 or here: [Project HashClash](https://marc-stevens.nl/p/hashclash/) – paulsm4 Oct 25 '21 at 22:02
  • 1
    No, for non-security hash table/map applications SHA1 truncated to 128 bits is no better than MD5 or for that matter MD4. They are all nuclear overkill for that application. On the other hand, if hash flooding attacks are a possibility SipHash is something to look into. – President James K. Polk Oct 25 '21 at 23:58

1 Answers1

4

I ran a few "real-world" tests with 4,292,907 different strings. I used a 11-char long substring of the HEXed hash (in other words, a 44-bit portion). Example:

  HASH: 629a09633488e9b2aaf2f5a606706da3
Test 1: 629a0963348
Test 2:  29a09633488
Test 3:   9a09633488e
...

Theoretically, I calculated the probability of collision to be ~41% (based on the "birthday paradox probability" formula). But that was theory, which assumes real random distribution. So I wanted to empirically test both MD5 and SHA-1 and see the results. Here they are (the numbers on the right show the number of collisions):

[MD5]             [SHA-1]
Test No  1: 2     Test No  1: 0
Test No  2: 0     Test No  2: 0
Test No  3: 1     Test No  3: 0
Test No  4: 0     Test No  4: 1
Test No  5: 0     Test No  5: 0
Test No  6: 0     Test No  6: 1
Test No  7: 1     Test No  7: 0
Test No  8: 2     Test No  8: 0
Test No  9: 1     Test No  9: 0
Test No 10: 1     Test No 10: 0
Test No 11: 0     Test No 11: 1
Test No 12: 0     Test No 12: 1
Test No 13: 0     Test No 13: 0
Test No 14: 0     Test No 14: 1
Test No 15: 0     Test No 15: 1
Test No 16: 0     Test No 16: 1
Test No 17: 1     Test No 17: 1
Test No 18: 1     Test No 18: 1
Test No 19: 0     Test No 19: 0
Test No 20: 0     Test No 20: 1
TOTAL: 8          TOTAL: 10           // No of tests with at least 1 collision

Conclusion: Neither MD5 nor SHA-1 showed significantly worse probability of collision, compared to the "theoretical" one calculated via the "birthday paradox probability" formula. I am aware this test isn't perfect and should be taken with a grain of salt, but to me, at least, it shows that I can heavily rely on calculating the chance of collision via the "theoretical" formula w/o worrying that my calculations are too far from the truth.

at54321
  • 8,726
  • 26
  • 46