Questions tagged [surrogate-key]

A key in a database table, that has no intrinsic logical meaning, and was introduced for better physical organization of the database or other technical reasons.

Terminology

  • A "superkey" is any set of attributes that, when taken together, uniquely identify rows in the table.
  • A minimal1 superkey is called "candidate key", or just "key".
  • A "natural key" is a key that is comprised from attributes that have intrinsic meaning and would be present in the data model even if there was no natural key on top of them.
  • A "surrogate key", on the other hand, has no intrinsic meaning and is introduced purely for technical reasons, as discussed below.

Properties of Surrogate Keys

If there is a natural key in the model, it cannot be replaced with a surrogate key. A surrogate key can only be created in addition to any existing natural keys.

So the engineering decision is not on surrogate versus natural key, but on surrogate + natural key versus the natural key alone.

Having a surrogate key + natural key:

  • May make the FOREIGN KEYs in child tables slimmer.2
  • The surrogate never needs to change, and therefore never incurs ON UPDATE CASCADE referential action.
  • May be more friendly to object-relational mapping (ORM) tools.

Having only a natural key:

  • Makes the parent table slimmer.3
  • May play better with clustering.4
  • May make JOINs unnecessary in some situations.5
  • May be needed for correctly modeling certain kinds of diamond-shaped dependencies. For example, the following model guarantees that if B and C are connected to the same D, they must also be connected to the same A:

    enter image description here

    Note how A_ID gets propagated from the "diamond" top, down both sides and then is merged at the bottom.

Typical Implementations of Surrogate Keys

Most commonly, a surrogate key is implemented as an auto-incremented integer. Examples:

  • Oracle supports the SEQUENCE object, that can be used either directly in INSERT statement, or through an ON INSERT trigger.
  • MS SQL Server has the IDENTITY data type, and from the 2012 version, the explicit SEQUENCE object as well.
  • PostgreSQL supports the explicit SEQUENCE object, as well as the serial types that use sequences implicitly.
  • MySQL has the AUTO_INCREMENT attribute.

GUIDs or UUIDs are sometimes used when uniqueness must be guaranteed without a central "generator" for surrogate key values, such as in certain "disconnected" or replication scenarios.


1 That is, a superkey that would stop being unique (and therefore, being a superkey) if any of the attributes were removed from it.

2 Surrogates tend to use "slimmer" data types such as integers, versus "fatter" types such as strings that are often used in natural keys. Also, while it is not unusual for a natural key to be composite, there is almost never a reason to make a composite surrogate key. As a consequence, a FOREIGN KEY referencing surrogate key tends to be slimmer than a FK referencing natural key.

3 There is no need for the additional index "underneath" the surrogate key. Each new index incurs a maintenance cost for INSERT/UPDATE/DELETE operations and may be especially costly in clustered tables, where secondary indexes must typically contain the copy of the clustering key (which is often same as primary key) and may incur a double-lookup during querying.

4 It is often necessary to query the data on a range that is "aligned" with the natural key. Using the natural key as a clustering key physically orders the data in the table, which can dramatically diminish the I/O under some circumstances. On the other hand, surrogate keys typically do not poses an ordering that would be useful for clustering.

5 We can fetch the migrated natural key directly from the child table, without JOINing with the parent.

110 questions
3
votes
2 answers

Are Varnish Hashtwo/Xkey and Fastly's "Surrogate Keys" the same?

I'm currently deciding whether to manage my own Varnish servers or use a hosted service like Fastly. One of the most important decision factors here is efficient tag-based cache invalidation, since I plan to put Varnish in front of our API and we'll…
jdotjdot
  • 16,134
  • 13
  • 66
  • 118
3
votes
2 answers

join between two many-to-many tables

I have some trouble defining appropriate database design for my java based web application. I have 3 tables(Tag, DT and Property) and my scenario is as follow. Each Tag can contain many Dts and each DT can be assigned to different Tags, each DT has…
gabi
  • 1,324
  • 4
  • 22
  • 47
3
votes
3 answers

Surrogate key 'preference' explanation

As I understand there is a war going on between purists of natural key and purists of surrogate key. In likes to this this post (there are more) people say 'natural key is bad for you, always use surrogate... However, either I am stupid or blind but…
RandomWhiteTrash
  • 3,974
  • 5
  • 29
  • 42
3
votes
3 answers

Natural vs surrogate keys on support tables

I have read many articles about the battle between natural versus surrogate primary keys. I agree in the use of surrogate keys to identify records of tables whose contents are created by the user. But in the case of supporting tables what should I…
Marco Staffoli
  • 2,475
  • 2
  • 27
  • 29
2
votes
1 answer

Why would one consider using Surrogate keys vs Natural with ON UPDATE CASCADE?

Disclaimer: This is not the same question as the other topics. One of the cons we face when using Natural Keys, is that, if the business logic changes and we need to change one key, we need to propagate this change throughout all linked…
PedroD
  • 5,670
  • 12
  • 46
  • 84
2
votes
1 answer

Surrogate Key Mapping for large (50 Million) keysets in Apache Flink

I have a use case where the apache flink process must integrate near real-time data streams (events) from multiple sources but due to lack of uniform keys in the different systems I need to use a surrogate key (SK) lookup from an existing data base.…
newTricks
  • 21
  • 2
2
votes
1 answer

Linq2SQL database design: mapping composite/surrogate keys

Image that I have a huge database which stores threads and posts from different datasources like two different forums. For each datasource, the Id of the entity (e.g. ThreadId, PostId...) is unique but it may collide with the Id of an entity…
Shackles
  • 1,264
  • 1
  • 19
  • 40
2
votes
2 answers

How surrogate keys are handles in hive

I know that hive cannot create surrogate keys or is rather difficult. I want to understand how companies have implemented dimensional modeling in their warehouse. One way I can think of is leaving the dimension details as is in fact. Then move the…
Ravi
  • 1,811
  • 1
  • 18
  • 31
2
votes
3 answers

Lookup Table -- Natural or Surrogate key as primary key?

I have a table for recording license usage. Each license usage needs to be associated with a user and a host machine. The table definition looks like this. create table if not exists per_user_fact ( per_user_fact_id int unsigned …
Pankaj Dwivedi
  • 379
  • 1
  • 6
  • 16
2
votes
1 answer

Surrogate Keys complicate insertion?

I often see people using artifical / surrogate keys in relational databases. Thinking about it, it seems to me that while this simplifies join-Queries, it complicates the insertion of new tuples. Take the following example: R1(a, b, c) R2(c, d, e) c…
2
votes
2 answers

Mixed surrogate composite key insert in JPA 2.0, PostgreSQL and Hibernate 3.5

First off, we are using JPA 2.0 and Hibernate 3.5 as persistence provider on a PostgreSQL database. We successfully use the sequence of the database via the JPA 2.0 annotations as an auto-generated value for single-field-surrogate-keys and all works…
Gerald
  • 41
  • 6
2
votes
2 answers

Separate table in case of surrogate key?

I have been reading the heated debates on composite vs surrogate keys on Stack Overflow and other websites, and even though puzzled about some of the arguments given, I feel I am aware of the pros and cons of each. I feel tempted to go for surrogate…
littlegreen
  • 7,290
  • 9
  • 45
  • 51
2
votes
5 answers

Thoughts on using email addresses as primary key

What are practices about using email addresses as the primary key? Should I avoid it and use an auto incremented ID number instead or is the engine able to handle it just as well? MySQL database but i'm interested in how other engines might handle…
d00dle
  • 1,276
  • 1
  • 20
  • 33
2
votes
3 answers

Surrogate key as a foreign key over composite keys

I realise there might be similar questions but I couldn't find one that was close enough for guidance. Given this spec, Site --------------------------- SiteID int identity Name varchar(50) Series --------------------- SiteID …
J F
  • 631
  • 7
  • 15
2
votes
2 answers

How to create a fact table using natural keys

We've got a data warehouse design with four dimension tables and one fact table: dimUser id, email, firstName, lastName dimAddress id, city dimLanguage id, language dimDate id, startDate, endDate factStatistic id, dimUserId, dimAddressId,…
s.froehlich
  • 863
  • 1
  • 11
  • 19