Questions tagged [surrogate-key]

A key in a database table, that has no intrinsic logical meaning, and was introduced for better physical organization of the database or other technical reasons.

Terminology

  • A "superkey" is any set of attributes that, when taken together, uniquely identify rows in the table.
  • A minimal1 superkey is called "candidate key", or just "key".
  • A "natural key" is a key that is comprised from attributes that have intrinsic meaning and would be present in the data model even if there was no natural key on top of them.
  • A "surrogate key", on the other hand, has no intrinsic meaning and is introduced purely for technical reasons, as discussed below.

Properties of Surrogate Keys

If there is a natural key in the model, it cannot be replaced with a surrogate key. A surrogate key can only be created in addition to any existing natural keys.

So the engineering decision is not on surrogate versus natural key, but on surrogate + natural key versus the natural key alone.

Having a surrogate key + natural key:

  • May make the FOREIGN KEYs in child tables slimmer.2
  • The surrogate never needs to change, and therefore never incurs ON UPDATE CASCADE referential action.
  • May be more friendly to object-relational mapping (ORM) tools.

Having only a natural key:

  • Makes the parent table slimmer.3
  • May play better with clustering.4
  • May make JOINs unnecessary in some situations.5
  • May be needed for correctly modeling certain kinds of diamond-shaped dependencies. For example, the following model guarantees that if B and C are connected to the same D, they must also be connected to the same A:

    enter image description here

    Note how A_ID gets propagated from the "diamond" top, down both sides and then is merged at the bottom.

Typical Implementations of Surrogate Keys

Most commonly, a surrogate key is implemented as an auto-incremented integer. Examples:

  • Oracle supports the SEQUENCE object, that can be used either directly in INSERT statement, or through an ON INSERT trigger.
  • MS SQL Server has the IDENTITY data type, and from the 2012 version, the explicit SEQUENCE object as well.
  • PostgreSQL supports the explicit SEQUENCE object, as well as the serial types that use sequences implicitly.
  • MySQL has the AUTO_INCREMENT attribute.

GUIDs or UUIDs are sometimes used when uniqueness must be guaranteed without a central "generator" for surrogate key values, such as in certain "disconnected" or replication scenarios.


1 That is, a superkey that would stop being unique (and therefore, being a superkey) if any of the attributes were removed from it.

2 Surrogates tend to use "slimmer" data types such as integers, versus "fatter" types such as strings that are often used in natural keys. Also, while it is not unusual for a natural key to be composite, there is almost never a reason to make a composite surrogate key. As a consequence, a FOREIGN KEY referencing surrogate key tends to be slimmer than a FK referencing natural key.

3 There is no need for the additional index "underneath" the surrogate key. Each new index incurs a maintenance cost for INSERT/UPDATE/DELETE operations and may be especially costly in clustered tables, where secondary indexes must typically contain the copy of the clustering key (which is often same as primary key) and may incur a double-lookup during querying.

4 It is often necessary to query the data on a range that is "aligned" with the natural key. Using the natural key as a clustering key physically orders the data in the table, which can dramatically diminish the I/O under some circumstances. On the other hand, surrogate keys typically do not poses an ordering that would be useful for clustering.

5 We can fetch the migrated natural key directly from the child table, without JOINing with the parent.

110 questions
0
votes
0 answers

Surrogate key part of business object?

I have a database table (mysql) representing the entity Employee in an enterprise like application. Surrogate keys are used throughout the database, so every Employee record has an auto incremented ID column identifying the Employee. I have modelled…
Michbeckable
  • 1,851
  • 1
  • 28
  • 41
0
votes
2 answers

Loading Fact Table + Lookup / UnionAll for SK lookups

I got to populate FactTable with 12 lookups to dimension table to get SK's, of which 6 are to different Dim Tables and rest 6 are lookup to same DimTable (type II) doing lookup to same natural key. Ex: PrimeObjectID => lookup to DimObject.ObjectID…
Sreedhar
  • 29,307
  • 34
  • 118
  • 188
0
votes
3 answers

LEFT OUTER JOIN causing violation of unique key constraint

I have two databases for two companies running on the same software, thus the DB structures are the same. Windows Server 2003, MS SQL Sever 2005. I'm attempting to copy the items list and associated tables from SOURCE to DESTINATION. There are a…
0
votes
1 answer

Appropriate use of surrogate key?

Say a particular country has many sights (monuments, parks, museums), and of course a particular sight can exist in many states. In terms of a logical ERD model, a junction table should be used between the M:N relationship. However, would it suffice…
0
votes
2 answers

Surrogate Keys in a Data Warehouse

I understand the general concept of a surrogate key in a DWH environment. But there are two aspects I don't understand and couldn't find information about: Is it common practice that a surrogate key is unique in the whole DWH or unique in one…
user3364656
  • 279
  • 2
  • 4
  • 9
0
votes
2 answers

How to Page data for Table with GUID key

i have a table similar to below, which has GUID as the key. i am trying to display the content of tis using paging which has GUID as key, but running into issue of how do i do that? CREATE TABLE `planetgeni`.`PostComment` ( `PostCommentId`…
Justin Homes
  • 3,739
  • 9
  • 49
  • 78
0
votes
1 answer

Generate surrogate key in PIG using custom rank

I will be doing PIG transformation daily (new data everyday). And I need to generate Unique key for data pulled everyday. what would be best approach ? If I perform does rank for tomarow will overwrite today rank ?
Srikanth
  • 15
  • 1
  • 3
0
votes
1 answer

DWH import with surrogate keys (and SCD)

I have a Data Warehouse which uses internal surrogate keys and type 2 slowly changing dimensions. In the clearing we just have the business keys from the erp-system, like this: In the Data Warehouse we want to use the surrogate keys instead (Note:…
Josh Alvo
  • 96
  • 1
  • 6
0
votes
2 answers

Look up surrogate keys and Populate fact table

I have 8 surrogate keys that I need to lookup on the dimension tables and populate a fact table. I created 8 lookups to the dimension tables, and then use the merge join and sort components to merge the data into one result set for insertion into…
John W. Mnisi
  • 845
  • 2
  • 11
  • 16
0
votes
0 answers

Surrogate key in a datamart

I have 32 dimensions for three fact tables, and i'm currently working on SQL Server 2012 ! As SQL Server's foreign keys are limited to 16 per index, does it mean that i have to use a surrogate key to index my data in the fact tables ?! Or is there…
0
votes
4 answers

Should a primary key necessarily be auto-incrementing when I'm sure it is and will always be unique?

I've looked for a satisfying answer a tad more specific to my particular problem for a while now, but to avail. Whether I'm just not looking at the right places or not, I don't know, but here goes: I'm pulling data from an application that…
Thevet
  • 154
  • 9
0
votes
1 answer

How to change primary from one attribute to another. mysql

I have a database in which I have declared a primary key. Later on in implementing the database, I realized that I will have to create an auto-incrementing surrogate key and switch my current primary key to that, as my current primary key will…
-1
votes
2 answers

Surrogate keys in star schema hierarchy dimension

Is it necessary to have surrogate keys for each hierarchy level above the lowest level in a dimension table? Row City_Key City_Name State 1 1234 Chicago Illinois 2 3245 Dallas Texas 3 4563 Huston Texas 4 …
-1
votes
1 answer

is it bad to add surrogate key to each table in addiction to the natural primary key?

Is it considered bad to add surrogate key (ID column) to each table in addition to the natural primary key? If it is not is the model still considered relational? I ask because stand alone surrogate keys remove the contextual meaning of the columns.…
-1
votes
1 answer

Is it possible to have a natural and surrogate multicolumn primary key?

I have a temporal table that uses a natural primary key. This table stores a CSV file that was imported, it's a history table to keep track of every import. I starting to see some duplicates (in the CSVs I import) of the primary key field and would…