Use of an ID in databases

Question

When creating a new table in a database, what is the importance of using id. For our purposes, we are using unique username and email in each field to match the info with that unique username or email. So what is the use of id?

Also, what is the length/value field for? New to this.

thanks a bunch!

Mark Byers · Answer 1 · 2010-12-17T08:46:21.680

8

The id field is an example of a surrogate key. It is a good idea to use a surrogate key as a primary key in a database because it is totally unrelated to and therefore unaffected by external events in the real world.

Using a natural key such as the email address could cause problems because if a user changes their email address your key will have to change. This can create difficulties as it will break foreign key contraints. It will also make querying for events relating to a specific user over time more difficult as you have no guaranteed single key that is consistent for that user's entire history.

If you have more than one database in your company that needs the keys, or you export data from your database to other applications or systems then when you change a key in your database you may also need to change the keys in those systems too, something which cannot be done automatically by using ON CASCADE UPDATE.

edited Dec 17 '10 at 08:46

answered Dec 17 '10 at 00:01

Mark Byers

811,555
193
1,581
1,452

+1: This has happened to me once, we used the email as the key for a user database, which meant that users couldn't change their username, unless we did a lot of work of tracing all dependencies in code. Management decided it wasn't worth it and they stuck to the policy that you couldn't change usernames, you'd have to create a new account (or just use the old email address) – Ruan Mendes Dec 17 '10 at 00:05
So in that case, i can also use a new field and make it all unique numbers and i can use that as a unique marker for each user. – AAA Dec 17 '10 at 00:06
@Juan yes. The code i am using right now is something i wrote a while back. I was in the same situation, except i wrote code to update each database if a username or email changed. Which i know realize was time consuming and stupid. Well just learning. You guys are helpful. – AAA Dec 17 '10 at 00:07
All well-designed database systems implement 'ON UPDATE CASCADE' for primary key fields, which will cascade the update of a modified primary key (in the context of a transaction) to all fields which reference it. Remember, DB servers provide data integrity so that our applications do not have to worry about solving that problem. – Michael Trausch Dec 17 '10 at 00:39

score 5 · Accepted Answer · answered Dec 17 '10 at 00:33

As others have pointed out, there are two types of keys for records: natural keys and surrogate (artificial) keys. The two major questions, then, are: do you need to use a surrogate key, and if so, what should that surrogate key be?

As to the first question: You only need to use a surrogate key if you have no valid natural key for use as a primary key on the table. All sane database systems support the 'ON UPDATE CASCADE' clause, which means that if you are using a natural key which happens to change, the change will be propagated to everything which is declared to reference it. Of course, if your database system does not support foreign keys, then your best bet is to use a surrogate key, if only to work around the lack of functionality in the database system (and surrogate keys will make your database easier to consistency check in light of that fact). That said, if you are designing an application that has requirements for high uptime and high robustness, select a database implementation that gets foreign keys correct, or you will most likely find that data integrity bugs will be found late in development (or even in maintenance) and you will have to write utilities that will check your data for consistency in various modes of failure.

For the second question: If you use a surrogate key, especially if you are working around a deficiency of a database system, you should always treat it as if it were immutable and globally unique. ALWAYS. This will aid in many situations later on: companies can merge (and split), databases can be merged (and split), and about a million other situations can happen that aren't anticipated when the database is designed that are capable of causing problems if the surrogate keys are not globally unique. Since surrogate keys are not at all related to the data they hold (they have no relation to the other fields in the table other than the artificial one that you have bestowed upon it) it's just best that way. For these reasons, when I must use a surrogate key, I use a UUID (which is essentially a 128-bit integer, but not incremental). Now you don't have to worry about renumbering record numbers and references when unexpected events occur. (Yes, it does slow things down, particularly if your server is running on a 32-bit platform. But if you need to handle more load, distribute the load better---do not sacrifice integrity for speed, ever, when you're working with important data!)

+1 for the excellent explanation and because I did not even know about the `ON UPDATE CASCADE`. Anyway this is the typical thing I would be really scared to rely on. Guess what happens if on many tables the `ON UPDATE CASCADE` fails on some of them because of a DB bug or whatever else and the DB is not able to revert the transaction, I bet you would forever regreat not to have added a small immutable surrogate key from the begining of your DB design that would have costed you nothing in terms of space. — Marco Demaio, Mar 30 '11 at 16:40
True, though this would be the sort of thing that you would want to test *extensively* to be sure that it works. A database engine that is inherently transactional, such as PostgreSQL, shouldn't have an issue. I'm not sure that I'd trust MySQL with such a complex situation, though, being that not every table in a database _has_ to be transactional. — Michael Trausch, Apr 17 '11 at 01:23
As my personal opinion, I would also add that, if you really need such a surrogate key, you won't call it `id`. Rather it would be `id_user`, for example; and every FK column should have this name whenever possible. — ShinTakezou, Feb 19 '22 at 12:01

score 3 · Answer 3 · answered Dec 17 '10 at 00:01

3

Relations between tables.

Is uneffective have relation to username or email address because this is a string and comparing this values takes much more time, and indexes are bigger, optimal solution is add ID like a primary key for relations to other tables as userid.

answered Dec 17 '10 at 00:01

Svisstack

16,203
6
66
100

Strings can be indexed and hashed; for == comparisons this is nearly equal to the speed of integer comparisons. That said, remember that database design isn't about being aware of the internal properties of the database server implementation so much as it is about modeling your data accurately and robustly. – Michael Trausch Dec 17 '10 at 00:35
All tables with similar necessities will have a column `id`. I don't think this is a good idea. The user table should have `userid` as well. The fact that in one table it's PK and in the others it is a FK, clarifies the role/relationship. – ShinTakezou Feb 19 '22 at 12:03

score 0 · Answer 4 · answered Dec 17 '10 at 00:04

0

Although you want usernames to be unique you should not rely on your database to control uniqueness. It is best practice for your code to test if the username and e-mail already exists in the database.

The purpose of length is for limiting the input of data. For instance varchar with a length of 10 will only allow a 10 character length input. Value is for default purposes. If you insert a new row without declaring this field it will automatically be filled with the value, if set.

answered Dec 17 '10 at 00:04

Trevor

11,269
2
33
40

So, for the unique id, since they have to be unique, can i use int value as 100? – AAA Dec 17 '10 at 00:14
2

Actually, you should never rely upon your application code to control either uniqueness or data integrity. If you are going to do that, then there is not much point in using a database system that has those features available for you. We use database servers because they are a generic solution to a frequent problem, and are much better tested than the new code we just wrote to run with that system... – Michael Trausch Dec 17 '10 at 00:36
@Michael: Fair enough. However, the usage of a unique ID is useful for relating to usernames/emails or other tables. :) – Trevor Dec 20 '10 at 18:56

score 0 · Answer 5 · answered Dec 17 '10 at 00:06

The point is to lighten the index. If you don't use an identity field and you choose to use (username, email) as primary key, the search for a user will take some more time looking for the username firt and then the email. Furthermore, these fields are strings, you can't compare the size of an integer and the size of strings like usernames and emails.

And using an identity field will allow you to do things like Comments (id, user_id) and not Comments (id, username, email)...

Use of an ID in databases

5 Answers5

Linked