13

I'm looking to create a 32-bit hash of some data objects. Since I don't feel like writing my own hash function and md5 is available, my current approach is to use the first 32 bits (i.e. first 8 hex digits) from an md5 hash. Is this acceptable?

In other words, are the first 32 bits of an md5 hash just as "random" as any other substring? Or is there any reason I'd prefer, say, the last 32 bits? Or perhaps XORing the four 32-bit substrings together?

Some preemptive clarifications:

  • These hashes don't need to be cryptographically secure.
  • I'm not concerned with the performance of md5--it is more than fast enough for my needs.
  • These hashes just need to be "random" enough that collisions are rare.
  • In this system, the number of items shouldn't exceed 10,000 (realistically it's probably not going to get half that high). So in the worst case the probability of encountering any collisions at all should be about 1% (assuming a sufficiently "random" hash is found).
Kip
  • Do you already have an MD5 hash computed (e.g. as part of the metadata of a Subversion checkin), or do you have to compute the MD5 hash yourself? If the latter, I agree with @Johannes' comment: CRC32 would be much simpler. – Jason S May 13 '09 at 21:10
  • Apparently there is no way on SO to preemptively address the "your question is invalid because you should do it this way instead" comments... – Kip May 13 '09 at 21:14
  • Sorry, I didn't mean *don't* use an MD5 hash, I just mean a CRC32 is simpler. You or your customers are the only ones that can judge what algorithms meet your requirements. – Jason S May 13 '09 at 21:18
  • I don't know whether you knew about this already, but 1% chance of a collision with 10,000 entries is in fact pretty much exactly what you'd expect with a 32-bit hash--see http://en.wikipedia.org/wiki/Birthday_problem – mjs Oct 05 '09 at 10:29
  • @mjs: yes, i'm aware, that's where i got the number from. :) – Kip Oct 05 '09 at 13:01
  • You might find [this](https://stackoverflow.com/questions/69715151/are-128-bits-of-sha-1-hash-safer-than-an-md5-hash/69719059#69719059) helpful. It empirically shows that both MD5 and SHA-1 are "random enough", so for 10,000 entries with a 32-bit portion of the hash you could realistically expect roughly a 1-2% chance of a collision. – at54321 Oct 26 '21 at 07:39
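
For reference, the roughly 1% figure quoted in the question and in the comments above can be reproduced with the standard birthday-problem approximation. This is only a sketch: it assumes the hash values are uniformly distributed, and the function name `collision_probability` is made up for illustration.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_items
    values drawn uniformly at random from a hash_bits-bit space."""
    space = 2 ** hash_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the exponent is close to zero.
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


# 10,000 items in a 32-bit space: a little over 1%.
print(f"{collision_probability(10_000, 32):.2%}")
# Half as many items gives roughly a quarter of the risk.
print(f"{collision_probability(5_000, 32):.2%}")
```

Because the probability grows with the square of the item count, staying well below 10,000 items reduces the risk much faster than linearly.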

3 Answers

11

For any good hash function the individual bits should be approximately random. You should therefore be safe to use just the first 32 bits of an MD5 hash.
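
A minimal sketch of what that could look like in Python, using the standard `hashlib` module. The helper names (`md5_first32`, `md5_last32`, `md5_xor32`) are invented here for illustration; they cover the three variants mentioned in the question, and none of them should be expected to behave better than the others.

```python
import hashlib


def md5_digest(data: bytes) -> bytes:
    """Return the raw 16-byte MD5 digest of data."""
    return hashlib.md5(data).digest()


def md5_first32(data: bytes) -> int:
    """First 32 bits of the digest (same as the first 8 hex digits)."""
    return int.from_bytes(md5_digest(data)[:4], "big")


def md5_last32(data: bytes) -> int:
    """Last 32 bits of the digest."""
    return int.from_bytes(md5_digest(data)[-4:], "big")


def md5_xor32(data: bytes) -> int:
    """XOR of the four 32-bit words that make up the digest."""
    digest = md5_digest(data)
    result = 0
    for offset in range(0, 16, 4):
        result ^= int.from_bytes(digest[offset:offset + 4], "big")
    return result


print(f"{md5_first32(b'some data object'):08x}")
```

Taking the first four bytes is the simplest option and matches "the first 8 hex digits" from the question exactly.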

Alternatively, you could use CRC32, which should be much faster to compute (and the code is only about 20 lines).
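
To give an idea of the size, here is a sketch of a plain bitwise CRC32 (the reflected polynomial `0xEDB88320`, as used by zlib and PNG), checked against Python's built-in `zlib.crc32`. The function name `crc32_bitwise` is just for illustration; in practice you would call the library routine directly.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC32 using the reflected polynomial 0xEDB88320."""
    crc = 0xFFFFFFFF  # initial value: all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial if a 1 fell off.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final inversion


sample = b"123456789"
print(f"{crc32_bitwise(sample):08x}")
print(f"{zlib.crc32(sample):08x}")  # library version, same value
```

Keep in mind that CRC32 was designed for error detection, so it is trivial to construct collisions on purpose; that only matters if the inputs can be chosen by someone hostile.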

Joey
  • "I'm not concerned with the performance of md5--it is more than fast enough for my needs." – Kip May 13 '09 at 21:08
  • Kip: performance or not, CRC32 gives you a 32 bit hash, which is exactly what you want. – dwc May 13 '09 at 21:12
9

In other words, are the first 32 bits of an md5 hash just as "random" as any other substring?

Yes. If the answer were no, MD5 wouldn't be sufficiently secure. (Sure, it has known cryptographic weaknesses, notably collision attacks, but I'm not aware of any statistical ones.)
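
If you want to convince yourself rather than take this on faith, a rough empirical check is easy to write. The sketch below hashes a set of made-up inputs (`item-0`, `item-1`, ...) and measures how often each of the 128 output bits is set; for a well-behaved hash every position should come out close to 50%. It is a sanity check under those assumptions, not a rigorous statistical test.

```python
import hashlib


def bit_frequencies(n_inputs: int = 10_000) -> list[float]:
    """Fraction of inputs for which each of the 128 MD5 bits is 1."""
    counts = [0] * 128
    for i in range(n_inputs):
        digest = hashlib.md5(f"item-{i}".encode()).digest()
        value = int.from_bytes(digest, "big")
        for bit in range(128):
            counts[bit] += (value >> bit) & 1
    return [count / n_inputs for count in counts]


freqs = bit_frequencies()
# All 128 positions should sit near 0.5, with no obvious
# difference between the first 32 bits and the last 32 bits.
print(f"min={min(freqs):.3f} max={max(freqs):.3f}")
```

With 10,000 inputs the expected spread around 0.5 is on the order of half a percentage point, so a position far outside that range would be the sort of statistical weakness this answer says it is not aware of.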

Jason S
  • MD5 _isn't_ sufficiently secure as numerous attacks have shown :) – Joey May 13 '09 at 21:04
  • That statement is only true if qualifications are added. It is not sufficiently secure to make all collision attacks infeasible. It is (so far) sufficiently secure to make preimage attacks infeasible. See http://www.vpnc.org/hash.html – Jason S May 13 '09 at 21:07
  • also not to quibble, but my post didn't say MD5 was sufficiently secure. :-) – Jason S May 13 '09 at 21:08
-1

An old question here but it comes up often. The answer is most certainly NO, otherwise an MD5 string wouldn't need to be more than 32 bits long.

Regardless, an MD5 string isn't random at all - it's entirely and consistently reproducible given the same input (which is pretty much the anti-random ;-)).

Whether or not it is sufficiently unique for your purposes depends on your purpose.

Mitchell V