Questions tagged [surrogate-key]

A key in a database table, that has no intrinsic logical meaning, and was introduced for better physical organization of the database or other technical reasons.

Terminology

  • A "superkey" is any set of attributes that, when taken together, uniquely identify rows in the table.
  • A minimal1 superkey is called "candidate key", or just "key".
  • A "natural key" is a key that is comprised from attributes that have intrinsic meaning and would be present in the data model even if there was no natural key on top of them.
  • A "surrogate key", on the other hand, has no intrinsic meaning and is introduced purely for technical reasons, as discussed below.

Properties of Surrogate Keys

If there is a natural key in the model, it cannot be replaced with a surrogate key. A surrogate key can only be created in addition to any existing natural keys.

So the engineering decision is not on surrogate versus natural key, but on surrogate + natural key versus the natural key alone.

Having a surrogate key + natural key:

  • May make the FOREIGN KEYs in child tables slimmer.2
  • The surrogate never needs to change, and therefore never incurs ON UPDATE CASCADE referential action.
  • May be more friendly to object-relational mapping (ORM) tools.

Having only a natural key:

  • Makes the parent table slimmer.3
  • May play better with clustering.4
  • May make JOINs unnecessary in some situations.5
  • May be needed for correctly modeling certain kinds of diamond-shaped dependencies. For example, the following model guarantees that if B and C are connected to the same D, they must also be connected to the same A:

    enter image description here

    Note how A_ID gets propagated from the "diamond" top, down both sides and then is merged at the bottom.

Typical Implementations of Surrogate Keys

Most commonly, a surrogate key is implemented as an auto-incremented integer. Examples:

  • Oracle supports the SEQUENCE object, that can be used either directly in INSERT statement, or through an ON INSERT trigger.
  • MS SQL Server has the IDENTITY data type, and from the 2012 version, the explicit SEQUENCE object as well.
  • PostgreSQL supports the explicit SEQUENCE object, as well as the serial types that use sequences implicitly.
  • MySQL has the AUTO_INCREMENT attribute.

GUIDs or UUIDs are sometimes used when uniqueness must be guaranteed without a central "generator" for surrogate key values, such as in certain "disconnected" or replication scenarios.


1 That is, a superkey that would stop being unique (and therefore, being a superkey) if any of the attributes were removed from it.

2 Surrogates tend to use "slimmer" data types such as integers, versus "fatter" types such as strings that are often used in natural keys. Also, while it is not unusual for a natural key to be composite, there is almost never a reason to make a composite surrogate key. As a consequence, a FOREIGN KEY referencing surrogate key tends to be slimmer than a FK referencing natural key.

3 There is no need for the additional index "underneath" the surrogate key. Each new index incurs a maintenance cost for INSERT/UPDATE/DELETE operations and may be especially costly in clustered tables, where secondary indexes must typically contain the copy of the clustering key (which is often same as primary key) and may incur a double-lookup during querying.

4 It is often necessary to query the data on a range that is "aligned" with the natural key. Using the natural key as a clustering key physically orders the data in the table, which can dramatically diminish the I/O under some circumstances. On the other hand, surrogate keys typically do not poses an ordering that would be useful for clustering.

5 We can fetch the migrated natural key directly from the child table, without JOINing with the parent.

110 questions
-1
votes
1 answer

is the key we use in phpmyadmin is a primary key or a surrogate key?

What's the difference between a primary key and surrogate key if both are unique keys. The primary key acts same as a surrogate key in phpmyadmin. I have heard in a video that a surrogate key is independent of the other columns of the table. Also…
-1
votes
1 answer

How to replace primary key with surrogate keys during ETL?

Have a question that is haunting me for some time. How in practice looks replacing primary keys with surrogate keys during the ETL process? Like what is the workflow - is it just assigning new IDENTITY? If so, how about previous values, how to…
-1
votes
1 answer

datastage scd stage is updating old records instead of inserting no match business keys

we have dim build with 1.8 million records and scd is updating the old records instead of inserting the record when business key is not found. need immediate help as this is a production issue.... we had identity on the destination table and we are…
-1
votes
3 answers

MD5 hash as artificial keys

I'm seeing lots of applications using hashes as surrogate keys instead of plain integers. I can't see any good reason for this kind of design. Given most UUID implementations are just hashed timestamps, why so many database designers choose them for…
Paulo Scardine
  • 73,447
  • 11
  • 124
  • 153
-2
votes
1 answer

How to sort relational table without surrogate PK by insertion order?

Say I have a JOB and COMPANY table and a third JOB_COMPANY table that relates the two in a many-to-many cardinality. I've learnt that it's an anti-pattern for the JOB_COMPANY to have a surrogate id PK. The PK should probably be a composite…
ijverig
  • 2,795
  • 3
  • 18
  • 26
1 2 3 4 5 6 7
8