-1

I'm seeing lots of applications using hashes as surrogate keys instead of plain integers. I can't see any good reason for this kind of design.

Given most UUID implementations are just hashed timestamps, why so many database designers choose them for application-wide surrogate keys?

Paulo Scardine
  • 73,447
  • 11
  • 124
  • 153
  • 3
    Very well answered here: [Advantages and disadvantages of GUID / UUID database keys](http://stackoverflow.com/questions/45399/advantages-and-disadvantages-of-guid-uuid-database-keys) – Dirk Vollmar Nov 16 '10 at 13:11
  • The link provided by 0xA3 talks about truly artificial surrogates (GUIDs). My interpretation is that this thread is actually about a hash of a meaningful value in the database using MD5 rather than a system-generated surrogate. That was my assumption in writing my answer. – nvogel Nov 16 '10 at 14:32
  • @dportas: the case in your answer is indeed a good example for situations where using a hash makes sense; right now I'm looking at the database schema from a SugarCRM fork and every table has UUID style keys, the reasons for such design is driving me curious. – Paulo Scardine Nov 16 '10 at 16:27
  • @Paulo: What does a "UUID style" key have to do with a MD5 hash? – nvogel Nov 17 '10 at 00:17
  • @dportas: are UUID version 3 just MD5 hashes? – Paulo Scardine Nov 17 '10 at 14:25
  • 1
    @Paulo: yes they are. Your question reads like you are asking about MD5 hashes in general though. The fact that hashing happens to be used to generate UUIDs isn't very relevant to the question. The use of hashes in databases is a very useful technique but apparently that isn't what you wanted to know. – nvogel Nov 17 '10 at 19:18
  • @dportas: thanks anyway, your answer and comments are +1 – Paulo Scardine Nov 17 '10 at 22:38

3 Answers3

4

A hash allows more efficient comparisons between potentially large data values - in joins for example. i.e. the comparison of HASH(LargeObjectA)=HASH(LargeObjectB). If the hashed values are documents in a table of a document management system for example then it may be more efficient to compare hashes than documents.

Most DBMSs have limits on the storage size of a key, so a hash may be one alternative workaround for implementing larger keys.

Hashes can also be used to optimise storage by splitting data into logical partitions that are evenly distributed across a data set.

nvogel
  • 24,981
  • 1
  • 44
  • 82
2

If the data backend for an application is made out of multiple distributed databases, using incremented integer ids might lead to duplicated values. UUIDs are guaranteed to be unique not only inside the application but outside it as well (which might be helpful when joining with external data).

It is true that using different id seeds for the different databases in the system would solve the uniqueness problem for integers, but managing such an approach would be more difficult.

Florin Dumitrescu
  • 8,182
  • 4
  • 33
  • 29
1

Uniqueness across servers? Using plain integers wouldn't work well in that situation.

paulbailey
  • 5,328
  • 22
  • 35