53

We're designing a table for user entity. The only non-trivial requirement is that there should be a permanent URL to the user entity (for example their profile). There's a lot about int/long vs UUID on the web. But it is still unclear to me.

  1. Considering the fact that the profile contains private information, it's not a good idea to have a predictable ID embedded in the URL. Am I right?
  2. To satisfy the first I can have primary key as UUID and embed it in the URL. But there's two question. Should I be worried about the performance penalty of having UUID as primary key in anyway; indexing, inserting, selecting, joining?

Having that said, which one of the following is better (with respect to the above)?

CREATE TABLE users(
  pk UUID NOT NULL,
  .....
  PRIMARY KEY(pk)
);

or

CREATE TABLE users(
  pk INT NOT NULL AUTO_INCREMENT,
  id UUID NOT NULL,
  .....
  PRIMARY KEY(pk),
  UNIQUE(id)
);
Rad
  • 4,292
  • 8
  • 33
  • 71

6 Answers6

63

It's a matter of choice actually and this question can raise opinion based answers from my point of view. What I always do, even if it's redundant is I create primary key on auto increment column (I call it technical key) to keep it consistent within the database, allow for "primary key" to change in case something went wrong at design phase and also allow for less space to be consumed in case that key is being pointed to by foreign key constraint in any other table and also I make the candidate key unique and not null.

Technical key is something you don't normally show to end users, unless you decide to. This can be the same for other technical columns that you're keeping only at database level for any purpose you may need like modify date, create date, version, user who changed the record and more.

In this case I would go for your second option, but slightly modified:

CREATE TABLE users(
  pk INT NOT NULL AUTO_INCREMENT,
  id UUID NOT NULL,
  .....
  PRIMARY KEY(pk),
  UNIQUE(id)
);
Kamil Gosciminski
  • 16,547
  • 8
  • 49
  • 72
  • 2
    @Kamil, And when there is a relation, auto-inc should be used as the FK? But does not this mean there will extra join for simple queries? For example 1-to-m customer -payments relation, would mean to get payments for a customerKey, we would join payments on customers using the auto-inc where customerKey = key from req, instead of just querying the payment table where customerKey = key from request. – Mu-Majid Jul 17 '21 at 15:04
  • @Mu-Majid. Since this question is now a bit older, but I asked myself exactly the same as you, have you meanwhile found a way fitting with your needs or did you use this approach? – Vito Mar 10 '23 at 19:49
39

This question is quite opinion-based so here's mine.

My take is to use the second one, a separate UUID from the PK. The thing is:

  • The PK is unique and not exposed to the public.
  • The UUID is unique and may get exposed to the public.

If, for any reason, the UUID gets compromised, you'll need to change it. Changing a PK may be expensive and has a lot of side effects. If the UUID is separate from the PK, then its change (though not trivial) has far less consequences.

The Impaler
  • 45,731
  • 9
  • 39
  • 76
  • 4
    Nor numerical IDs or UUIDs should be secret. Security shouldn't be based on unguessable IDs, and access should be checked anyway. – ymajoros Apr 26 '21 at 09:03
  • 8
    @ymajoros Not all resources are fully private. An example is something accessible by "anyone with the link". Google does this with things like Docs and Sheets. In this case auto incrementing IDs should be keep secret to prevent fusking style attacks to discover documents. UUIDs would be great in this case since there's no real pattern to guess at, thus they're time consuming to find. They therefore offer an acceptable layer of protection while remaining easily accessible. Security isn't simply on/off. Security is always levels of compromise that include various risks and usability tradeoffs. – Joel Mellon Sep 05 '21 at 01:43
  • 1
    @JoelMellon the url isn't where the security is. What makes it secure is the combination of google authentication + whatever access you gave. Of course, if you make documents public, you just get that, but nobody is forcing you to. Using incremental IDs wouldn't decrease security, but would make the fact that public documents are public just clearer. – ymajoros Sep 06 '21 at 06:56
  • 3
    I think what @JoelMellon was trying to say is that for some reason, you might not want external users to somehow determine the amount of say transaction records you have in your system, because they are publicly accessible by sequenced numerical IDs. Fine they are public, but no one can know for certain how many of that resource you have. – francis94c Sep 26 '21 at 03:37
  • @ymajoros using incremental IDs would make it harder to mass-scan for documents no? – El Mac Apr 20 '22 at 17:24
12

Using UUID as pk: The first problem is, UUID takes 9x storage than int. 2nd problem is, if you need sorting by pk more frequently, don't even think about UUID. UUID as pk doesn't affect the time complexity for where condition or others except sort.

Using int as pk: Easily guessable. Brute force attacker will love this. this is the only problem but biggest one.

Using int as pk but, keeping UUID as well: If the UUID is not pk then the time complexity will be increased for searching by UUID. even though, all the relations will be maintained by int, but, when you will search by UUID, it will take time. As the relations are on int, the 9x storage issue is solved here.

Amimul Ehshan
  • 192
  • 1
  • 12
  • 7
    No one will *ever* sort by UUID. The only filter will ever be `where user.uuid = some_uuid`. Regarding index selection (where), I'm not convinced UUID will be slower, since all values will have perfect distribution across the table. autoincrements will have poor distribution -- all your most recent records will be bunched together, degrading index performance. For storage size, a UUID is only twice the size of a bigint. – David Hempy May 23 '22 at 16:40
  • UUID is 128-bit long, which is double the size of a BIGINT and four times the size of the INTEGER types. Also having UUID as primary key may result in performance issues, especially with a million of rows joins across multiple tables. – Rodolfo Maayos Jul 20 '23 at 01:05
11

I came across a nice article that explains both pros and cons of using UUID as a primary key. In the end, it suggests using both but Incremental integer for PK and UUIDs for the outside world. Never expose your PK to the outside.

One solution used in several different contexts that has worked for me is, in short, to use both. (Please note: not a good solution — see note about response to original post below). Internally, let the database manage data relationships with small, efficient, numeric sequential keys, whether int or bigint. Then add a column populated with a UUID (perhaps as a trigger on insert). Within the scope of the database itself, relationships can be managed using the usual PKs and FKs.

But when a reference to the data needs to be exposed to the outside world, even when “outside” means another internal system, they must rely only on the UUID. This way, if you ever do have to change your internal primary keys, you can be sure it’s scoped only to one database. (Note: this is just plain wrong, as Chris observed)

We used this strategy at a different company for customer data, just to avoid the “guessable” problem. (Note: avoid is different than prevent, see below).

In another case, we would generate a “slug” of text (e.g. in blog posts like this one) that would make the URL a little more human friendly. If we had a duplicate, we would just append a hashed value.

Even as a “secondary primary key”, using a naive use of UUIDs in string form is wrong: use the built-in database mechanisms as values are stored as 8-byte integers, I would expect.

Use integers because they are efficient. Use the database implementation of UUIDs in addition for any external reference to obfuscate.

https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439

Sarvar Nishonboyev
  • 12,262
  • 10
  • 69
  • 70
7

Don’t make it your database primary key: this will cause problems in the future of you want to change your database technology. And if you make it an increasing number, your competitors will know how many users you have and how fast you are adding new ones.

vy32
  • 28,461
  • 37
  • 122
  • 246
4

The rule of thumb is to keep a clear separation between:

  • a business value (even some UUID being a representation)

and

  • a technical value (like a primary key)

For example, if you'd like to use a mapping to some record by its id, such mapping is a business value so to keep the above separation, you will need a dedicated field (like UUID) instead of a technical primary key.

Sławomir Lenart
  • 7,543
  • 4
  • 45
  • 61