
In my web application, the user can define documents and give them a unique name that identifies the document and a friendly name that a human will use to refer to it. Take the following table schema as an example:

|   id   |    name        |   friendly_name   |
-----------------------------------------------
|    2   |    invoice-2   |   Invoice 2       |

In this example I've used the id column as the primary key, which is an auto-incrementing number. Since there's already a natural ID for documents (name), I could also do this:

|    name        |   friendly_name   |
--------------------------------------
|    invoice-2   |   Invoice 2       |

In this example, name is the primary key of the document. We've eliminated the id field as it's essentially just a duplicate of name, since every document in the table must have a unique name anyway.

This would also mean that when I refer to a document from a foreign key relationship I'd have to call it document_name rather than document_id.
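
For concreteness, the two layouts would look roughly like this in SQL (the column lengths are illustrative guesses, and the two table names are only there so both definitions can coexist):

-- Variant 1: surrogate key, with the natural key still enforced as unique
CREATE TABLE document_with_id (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL UNIQUE,
  friendly_name VARCHAR(255) NOT NULL
);

-- Variant 2: the natural key is the primary key; referencing tables would
-- carry a document_name column instead of document_id
CREATE TABLE document_with_name (
  name VARCHAR(100) NOT NULL PRIMARY KEY,
  friendly_name VARCHAR(255) NOT NULL
);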

What's the best practice regarding this? Theoretically it's entirely possible for me to use a VARCHAR for the primary key, but does it come with any downsides such as performance overhead?

John Dorean
  • What would the advantages be? – Mihai Sep 28 '14 at 19:31
  • The advantage would be that I have one less column in the table which would otherwise just be duplicate data. – John Dorean Sep 28 '14 at 19:33
  • How would you increment that? Triggers? Problems might arise. JOINs on varchar are slower than on int. Larger table size. Larger index. Just a few things to think about. Also this might interest you: http://en.wikipedia.org/wiki/Natural_key – Mihai Sep 28 '14 at 19:34
  • It won't be incremented, it'll be defined by the user. Thanks for the info on JOINs though. – John Dorean Sep 28 '14 at 20:19

2 Answers


There are two schools of thought on this topic.

There are some who hold strongly to the belief that using a "natural key" as the primary key for an entity table is desirable, because it has significant advantages over a surrogate key.

There are others who believe that a "surrogate" key can provide some desirable properties which a "natural" key may not.

Let's summarize some of the most important and desirable properties of a primary key:

  • minimal - fewest possible number of attributes
  • simple - native datatypes, ideally a single column
  • available - the value will always be available when the entity is created
  • unique - absolutely no duplicates, no two rows will ever have the same value
  • anonymous - carries no hidden "meaningful" information
  • immutable - once assigned, it will never be modified

(There are some other properties that could be listed, but some of those can be derived from the properties above: not null, can be indexed, etc.)
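
As a hypothetical sketch (the table and column definitions below are my own, not from the question), here is how those properties line up against the two candidate keys, with the surrogate elected as the primary key and the natural key still enforced:

CREATE TABLE document (
  -- id: minimal, simple, available, unique, anonymous, immutable
  id   INT NOT NULL AUTO_INCREMENT,
  -- name: minimal, simple, available, unique; but not anonymous, and only
  -- immutable if a document can never be renamed
  name VARCHAR(100) NOT NULL,
  friendly_name VARCHAR(255) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uq_document_name (name)  -- the natural key is enforced either way
);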


I break the two schools of thought regarding "natural" and "surrogate" keys as the "best" primary keys into two camps:

1) Those who have been badly burned by an earlier decision to elect a natural key as the primary key, and

2) Those who have not yet been burned by that decision.

spencer7593
  • You forgot the third "camp": those who successfully use natural keys when they make sense and use generated keys when needed. –  Sep 28 '14 at 22:11
  • @a_horse_with_no_name: What "makes sense" given today's specification can often turn out to be a problem when the specification changes next year, when we've already integrated with another system that now has a dependency on the immutability of a primary key in our system. A natural key rarely satisfies all three of these desirable properties simultaneously: "anonymity", "availability", and "immutability". (If it's available, we frequently find that it's not immutable. If it's available and immutable, we frequently find that it's not anonymous.) – spencer7593 Sep 28 '14 at 22:25
  • @a_horse_with_no_name: several decades ago, I worked on a re-write of a system that had used a "natural key" as a primary key; in doing the entity relationship modeling, it was discovered that the natural key selected by the developers years ago wasn't actually unique. It was "usually" unique. The users adopted a mechanism for dealing with that, assigning a "dummy" natural key, and then when the printed report came out of the system they made notations on it, writing in the real natural key. When an offer was made to fix that, the users were elated. – spencer7593 Sep 28 '14 at 22:30
  • @a_horse_with_no_name: The primary key selected by the developers "made sense" at the time the system was developed. The selected natural key was not "anonymous", it had meaning to the users, but it wasn't unique. The natural key worked as the primary key for almost all of the records in the system; it just didn't work for all of them, so it turned out not to "make sense". (The purpose of the re-write wasn't to fix this problem; the users were just elated that it was possible that this issue could be fixed as part of the re-write.) – spencer7593 Sep 28 '14 at 22:38
  • I have seen too many systems where people created auto-generated ID columns without thinking, assuming that would be enough. Very often the unique index/constraint on the _real_ key is then forgotten. Not to mention the many cases where a generated ID is added to m:n tables where the two FKs make a perfect PK. Generated IDs start to have "meaning" as well. I have seen users filing bug reports as "blocker" because generated numbers contained gaps. I do understand your point though. I just want to make clear that the generated PK is not *always* a good choice - nor is the "natural" PK. –  Sep 29 '14 at 06:06
  • @a_horse_with_no_name: I fully concur with you, both approaches can be used to develop successful database systems. And neither approach prevents missed requirements, bad design and bad implementation. And I make extensive use of natural keys, virtually every table has at least one natural key. The addition of a surrogate key is a *supplement* to the natural key, not a suitable replacement. (If someone is suggesting that, they are plain wrong.) The question at hand, the two camps, is on the "election" of which candidate key is most suitable as the PRIMARY KEY. – spencer7593 Sep 29 '14 at 06:25
  • @a_horse_with_no_name: I make a distinction between "entity" and "non-entity" tables in the model; the "generation" of values for a surrogate key is a whole other bailiwick; some of the mechanisms available in some RDBMS are less than ideal; but we aren't limited to just those as options. With a requirement that a "generated" key have no gaps, that's not an ideal PK, since it doesn't satisfy the desirable "anonymous" property. (A surrogate key doesn't need to be integer type; it just happens that we can conveniently ensure uniqueness. A GUID is also nearly guaranteed to be unique.) – spencer7593 Sep 29 '14 at 06:37
  • @a_horse_with_no_name: I, too, have had experience with bad design, with "generated" keys as PRIMARY KEY. (I omitted "generated" from the list of desirable properties for a primary key; it just happens that the properties "simple", "unique" and "available" combine, and lend themselves to this notion that unique values can be "generated".) The election of a "generated" key as the PRIMARY KEY does not absolve the developer from doing the work required to identify the natural keys, include those in the model, and properly implement them. (Auto-generated keys are NOT a substitute for design.) – spencer7593 Sep 29 '14 at 06:54
  • @spencer, I don't see the relevance of your story about developers picking the wrong natural key. Presumably the developers could/would have made the same error and the users would have experienced the same problem even if the table had a surrogate key as well. – nvogel Jan 18 '15 at 10:52
  • @sqlvogel: I apologize that I didn't make the relevance more clear. Yes, you are right, the users would have encountered the same limitations. When the application is changed to accommodate the revised requirements, there may be a significant difference in the changes required, due to the *foreign keys* referencing the tables. If the table has a surrogate primary key, the *foreign key* references won't need to be changed. With the natural key as the primary key, when the primary key is changed, the foreign key references will need to be changed as well, which results in more schema, data and code changes. – spencer7593 Jan 18 '15 at 17:03
  • So your example is actually about foreign keys, not primary keys? A preferred identifier may be designated as primary key but that doesn't mean it must be used for referential integrity purposes, especially if there is an alternative key more suitable for use as a foreign key reference. I certainly don't agree that a database which invariably uses a surrogate for every foreign key reference makes systems easier to maintain. Surrogates add extra complexity in lots of ways. Horse's suggestion is the prudent approach: choose appropriate keys based on an evaluation of each particular situation. – nvogel Jan 19 '15 at 09:40

Of course you can.

create table sometbl (
  -- the natural key doubles as the primary key
  `name` varchar(250) NOT NULL PRIMARY KEY,
  `friendly_name` varchar(400)
);
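
If you take this route, a referencing table carries the name itself as the foreign key, much as the question anticipates with document_name. A rough sketch, assuming the sometbl definition above and the InnoDB engine (the revision table is purely illustrative):

CREATE TABLE revision (
  document_name VARCHAR(250) NOT NULL,
  revision_no   INT NOT NULL,
  PRIMARY KEY (document_name, revision_no),
  FOREIGN KEY (document_name) REFERENCES sometbl (`name`)
    ON UPDATE CASCADE  -- if a document is ever renamed, the change ripples into this table
);

-- Joins use the name directly; no lookup of a separate id column is needed
SELECT r.revision_no, d.friendly_name
FROM revision r
JOIN sometbl d ON d.name = r.document_name;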

There is no real difference in access time between an integer and a varchar key (unless it's very long). Even if there were, it wouldn't be your main bottleneck. As long as a column is declared as a key, MySQL can access it very fast.
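
One way to reassure yourself is to check the query plan. For a point lookup on the varchar primary key, something like the following should report a direct primary key lookup (the exact output varies by MySQL version):

EXPLAIN SELECT friendly_name FROM sometbl WHERE `name` = 'invoice-2';
-- expected: type=const, key=PRIMARY, i.e. the lookup goes straight through the primary key index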

An auto-incrementing integer shouldn't be the primary key; it's just a serial number for the row. When you look at the real object, you'll see it doesn't have any serial number, so the primary key should be based on those real properties.

Shiplu Mokaddim
  • I know I *can*, I was wondering if this is best practice with a natural ID and if there's any downsides. :) – John Dorean Sep 28 '14 at 19:29
  • `id` should never be the PK. The PK should be derived from the real entity. – Shiplu Mokaddim Sep 28 '14 at 19:34
  • Oh dear, this boring debate will run and run – Strawberry Sep 28 '14 at 20:02
  • @ChrisWhite: It's entirely possible to build database systems using natural keys as primary keys. But to answer your question, yes, there are some drawbacks to using a natural key. We often find that a natural key does not have the desirable properties of being "anonymous" and "immutable"; this isn't a problem during development, but it can become a problem later, when changes to the system need to be made and when the database system has to be integrated with other systems. – spencer7593 Sep 28 '14 at 21:30