MD5 hash as artificial keys

Question

I'm seeing lots of applications using hashes as surrogate keys instead of plain integers. I can't see any good reason for this kind of design.

Given that most UUID implementations are just hashed timestamps, why do so many database designers choose them for application-wide surrogate keys?

Very well answered here: [Advantages and disadvantages of GUID / UUID database keys](http://stackoverflow.com/questions/45399/advantages-and-disadvantages-of-guid-uuid-database-keys) — Dirk Vollmar, Nov 16 '10 at 13:11
The link provided by 0xA3 talks about truly artificial surrogates (GUIDs). My interpretation is that this thread is actually about a hash of a meaningful value in the database using MD5 rather than a system-generated surrogate. That was my assumption in writing my answer. — nvogel, Nov 16 '10 at 14:32
@dportas: the case in your answer is indeed a good example of a situation where using a hash makes sense. Right now I'm looking at the database schema of a SugarCRM fork where every table has UUID-style keys, and I'm curious about the reasons for such a design. — Paulo Scardine, Nov 16 '10 at 16:27
@Paulo: What does a "UUID style" key have to do with an MD5 hash? — nvogel, Nov 17 '10 at 00:17
@Paulo: yes they are. Your question reads like you are asking about MD5 hashes in general though. The fact that hashing happens to be used to generate UUIDs isn't very relevant to the question. The use of hashes in databases is a very useful technique but apparently that isn't what you wanted to know. — nvogel, Nov 17 '10 at 19:18
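On the UUID/MD5 connection raised in these comments: only some UUID versions involve hashing at all. A minimal sketch using Python's standard `uuid` module (Python is chosen here only for illustration): version 1 is built from a timestamp and node ID, version 3 is an MD5 hash of a namespace plus a name, and version 4 is random. The name `"example.com"` is an arbitrary example input.

```python
import hashlib
import uuid

# Version 1: derived from a timestamp plus the host's node ID; no hashing.
time_based = uuid.uuid1()

# Version 3: an MD5 hash of a namespace UUID plus a name, so the same
# input always yields the same UUID.
name_based = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")

# Version 4: random bits.
random_based = uuid.uuid4()

# Rebuild the version-3 UUID by hand: MD5 the namespace bytes + name, then
# let UUID() overwrite the version and variant bits.
digest = hashlib.md5(uuid.NAMESPACE_DNS.bytes + "example.com".encode("utf-8")).digest()
print(uuid.UUID(bytes=digest, version=3) == name_based)  # True

for label, u in [("uuid1", time_based), ("uuid3", name_based), ("uuid4", random_based)]:
    print(label, u, "version", u.version)
```

A version-3 UUID is therefore a hash of a meaningful value (the scenario nvogel describes), while versions 1 and 4 are system-generated surrogates.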

score 4 · Answer 1 · answered Nov 16 '10 at 14:28

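A hash allows more efficient comparisons between potentially large data values, for example in joins: comparing HASH(LargeObjectA) = HASH(LargeObjectB) instead of the objects themselves. If the hashed values are documents in a table of a document management system, for example, it may be more efficient to compare hashes than documents. Because different values can occasionally produce the same hash, a match may need to be confirmed against the full values when correctness matters.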
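A minimal sketch of a join on HASH(a) = HASH(b), assuming two hypothetical in-memory "tables" of documents. The digest index stands in for what would be a stored, indexed hash column in a real DBMS; full bodies are compared only for candidate matches.

```python
import hashlib
from collections import defaultdict

def md5_digest(data: bytes) -> bytes:
    """Return the 16-byte MD5 digest of a stored value."""
    return hashlib.md5(data).digest()

# Hypothetical "tables": document id -> document contents.
table_a = {1: b"quarterly report", 2: b"meeting notes"}
table_b = {10: b"meeting notes", 11: b"draft contract"}

# Index table_b by digest (in a DBMS: an indexed hash column).
by_digest = defaultdict(list)
for doc_id, body in table_b.items():
    by_digest[md5_digest(body)].append(doc_id)

# Join on HASH(a) = HASH(b): the lookup compares only 16-byte digests;
# the full bodies are compared only for candidates, because distinct
# documents can (rarely) share an MD5 digest.
matches = [
    (a_id, b_id)
    for a_id, a_body in table_a.items()
    for b_id in by_digest.get(md5_digest(a_body), [])
    if table_b[b_id] == a_body
]
print(matches)  # [(2, 10)]
```

The saving comes from the lookup: each document is hashed once, and only digests that already match trigger the expensive full comparison.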

Most DBMSs limit the size of an index key, so indexing a fixed-size hash of a large value, rather than the value itself, is one workaround for implementing larger keys.
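A minimal sketch of that workaround, using a hypothetical `page` table in which a long URL is kept unique through a 16-byte MD5 column. SQLite is used only so the example runs anywhere; it is not one of the DBMSs with a tight key-size limit. Note that a hash collision would wrongly reject a distinct URL, a risk that may or may not be acceptable depending on the data.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the URL itself may be too long to index on some
# DBMSs, so a fixed-size MD5 digest of it carries the UNIQUE constraint.
conn.execute("""
    CREATE TABLE page (
        id       INTEGER PRIMARY KEY,
        url      TEXT NOT NULL,
        url_hash BLOB NOT NULL UNIQUE  -- always 16 bytes, whatever the URL length
    )
""")

def add_page(url: str) -> bool:
    """Insert a URL; return False if an identical URL is already stored."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO page (url, url_hash) VALUES (?, ?)", (url, digest)
            )
        return True
    except sqlite3.IntegrityError:
        return False

long_url = "https://example.com/?q=" + "x" * 5000
print(add_page(long_url))  # True
print(add_page(long_url))  # False: duplicate caught via the 16-byte key
```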

Hashes can also be used to optimise storage by splitting data into logical partitions (hash partitioning): because a good hash function spreads keys uniformly, the partitions end up evenly sized across the data set.
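A minimal sketch of hash partitioning, assuming four partitions and hypothetical keys of the form `customer-N`. Even keys that look sequential are spread roughly evenly once hashed.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count for the sketch

def partition_for(key: str, partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition using the first 8 bytes of its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions

# Sequential-looking keys still land roughly evenly across partitions.
counts = Counter(partition_for(f"customer-{i}") for i in range(10_000))
print(sorted(counts.items()))
```

Because the mapping depends only on the key, any node can compute which partition holds a row without consulting a lookup table.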

Accepted Answer · answered Nov 16 '10 at 13:09 · Florin Dumitrescu

If an application's data backend is made up of multiple distributed databases, using auto-incremented integer IDs can produce duplicate values. UUIDs are, for practical purposes, unique not only within the application but outside it as well (which can be helpful when joining with external data).
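A minimal sketch of the problem, with two hypothetical databases simulated as independent counters. Integer keys collide as soon as the data is merged, while random (version 4) UUIDs generated on each side need no coordination.

```python
import itertools
import uuid

# Two hypothetical databases, each with its own auto-increment counter.
node_a_ids = itertools.count(1)
node_b_ids = itertools.count(1)
int_keys_a = [next(node_a_ids) for _ in range(3)]
int_keys_b = [next(node_b_ids) for _ in range(3)]
print(set(int_keys_a) & set(int_keys_b))  # {1, 2, 3}: keys collide on merge

# The same databases generating version-4 UUIDs independently.
uuid_keys_a = [uuid.uuid4() for _ in range(3)]
uuid_keys_b = [uuid.uuid4() for _ in range(3)]
# A collision is possible in principle but vanishingly unlikely,
# so this intersection is almost certainly empty.
print(set(uuid_keys_a) & set(uuid_keys_b))
```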

It is true that giving each database in the system a different ID seed would solve the uniqueness problem for integers, but such an approach is harder to manage.
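A minimal sketch of the "different seeds" approach, assuming three databases. Each one starts at its own offset and all share the same step, which is the idea behind MySQL's `auto_increment_offset` and `auto_increment_increment` settings. The `id_generator` helper is hypothetical.

```python
import itertools

def id_generator(node_index: int, node_count: int):
    """Yield node_index, node_index + node_count, ... for one database.

    Hypothetical helper: each database gets a distinct starting offset
    and all share the same step, so their sequences never overlap.
    """
    return itertools.count(start=node_index, step=node_count)

NODE_COUNT = 3  # assumed number of databases
gens = [id_generator(i, NODE_COUNT) for i in range(1, NODE_COUNT + 1)]
issued = [[next(g) for _ in range(4)] for g in gens]
print(issued)  # [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
```

The management cost shows up when the topology changes: adding a fourth database means choosing a new step and reconfiguring every node without reissuing existing IDs, whereas UUIDs need no such coordination.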

score 1 · Answer 3 · answered Nov 16 '10 at 13:01


Uniqueness across servers? Using plain integers wouldn't work well in that situation.


paulbailey

