Questions tagged [surrogate-key]

A key in a database table that has no intrinsic logical meaning and was introduced for better physical organization of the database or for other technical reasons.

Terminology

  • A "superkey" is any set of attributes that, when taken together, uniquely identify rows in the table.
  • A minimal[1] superkey is called a "candidate key", or just a "key".
  • A "natural key" is a key composed of attributes that have intrinsic meaning and would be present in the data model even if there was no key on top of them.
  • A "surrogate key", on the other hand, has no intrinsic meaning and is introduced purely for technical reasons, as discussed below.
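
The definitions above can be sketched in a few lines of Python. This is only an illustration: the `EMPLOYEES` sample and its column names are hypothetical, and a real key is a constraint declared on the schema, not something inferred from whatever rows happen to be in the table.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if the combined values of `attrs` are distinct across all rows."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, all_attrs):
    """Return every minimal superkey of the sample data, smallest first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # A superset of an already-found key is a superkey, but not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical sample table: ssn and email are each unique, dept is not.
EMPLOYEES = [
    {"ssn": "111", "email": "a@x.org", "dept": "ops"},
    {"ssn": "222", "email": "b@x.org", "dept": "ops"},
    {"ssn": "333", "email": "c@x.org", "dept": "dev"},
]

keys = candidate_keys(EMPLOYEES, ("ssn", "email", "dept"))
```

Here `("ssn", "dept")` is a superkey but not a candidate key, because removing `dept` still leaves a unique set of attributes.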

Properties of Surrogate Keys

If there is a natural key in the model, it cannot be replaced with a surrogate key. A surrogate key can only be created in addition to any existing natural keys.

So the engineering decision is not surrogate key versus natural key, but surrogate key plus natural key versus the natural key alone.

Having a surrogate key + natural key:

  • May make the FOREIGN KEYs in child tables slimmer.[2]
  • Means the referenced key never needs to change, and therefore never incurs the ON UPDATE CASCADE referential action.
  • May be more friendly to object-relational mapping (ORM) tools.
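
As a rough sketch of the surrogate-plus-natural-key arrangement, the following uses SQLite through Python's `sqlite3` module. The `country` and `city` tables and their columns are made up for illustration. The natural key stays enforced through a UNIQUE constraint, while child rows reference only the surrogate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE country (
        country_id INTEGER PRIMARY KEY,   -- surrogate key
        iso_code   TEXT NOT NULL UNIQUE   -- natural key, still enforced
    );
    CREATE TABLE city (
        city_id    INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES country (country_id)
    );
""")

conn.execute("INSERT INTO country (iso_code) VALUES ('UK')")
country_id = conn.execute(
    "SELECT country_id FROM country WHERE iso_code = 'UK'"
).fetchone()[0]
conn.execute(
    "INSERT INTO city (name, country_id) VALUES ('London', ?)", (country_id,)
)

# The natural key value changes; no child row has to be updated.
conn.execute("UPDATE country SET iso_code = 'GB' WHERE iso_code = 'UK'")

row = conn.execute(
    "SELECT city.name, country.iso_code FROM city JOIN country USING (country_id)"
).fetchone()
```

After the update, `row` is `('London', 'GB')` even though the `city` row was never touched, which is the situation where ON UPDATE CASCADE would otherwise be needed.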

Having only a natural key:

  • Makes the parent table slimmer.[3]
  • May play better with clustering.[4]
  • May make JOINs unnecessary in some situations.[5]
  • May be needed for correctly modeling certain kinds of diamond-shaped dependencies. For example, the following model guarantees that if B and C are connected to the same D, they must also be connected to the same A:

    [Figure: diamond-shaped dependency among tables A, B, C and D]

    Note how A_ID gets propagated from the "diamond" top, down both sides and then is merged at the bottom.
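
Since the diagram is not reproduced here, the following is one possible reading of it, written for SQLite via Python's `sqlite3`. The lower-case table names and the `b_no` / `c_no` columns are assumptions. Because `d` has a single `a_id` column shared by both composite foreign keys, a row in `d` can only connect a `b` and a `c` that belong to the same `a`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY);
    CREATE TABLE b (
        a_id INTEGER NOT NULL REFERENCES a (a_id),
        b_no INTEGER NOT NULL,
        PRIMARY KEY (a_id, b_no)
    );
    CREATE TABLE c (
        a_id INTEGER NOT NULL REFERENCES a (a_id),
        c_no INTEGER NOT NULL,
        PRIMARY KEY (a_id, c_no)
    );
    CREATE TABLE d (
        a_id INTEGER NOT NULL,            -- merged: one column feeds both FKs
        b_no INTEGER NOT NULL,
        c_no INTEGER NOT NULL,
        PRIMARY KEY (a_id, b_no, c_no),
        FOREIGN KEY (a_id, b_no) REFERENCES b (a_id, b_no),
        FOREIGN KEY (a_id, c_no) REFERENCES c (a_id, c_no)
    );
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 10);
    INSERT INTO c VALUES (1, 20), (2, 30);
""")

def try_insert_d(a_id, b_no, c_no):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO d VALUES (?, ?, ?)", (a_id, b_no, c_no))
        return True
    except sqlite3.IntegrityError:
        return False

same_a = try_insert_d(1, 10, 20)       # b and c both belong to a = 1
different_a = try_insert_d(1, 10, 30)  # c number 30 belongs to a = 2
```

With surrogate keys on `b` and `c`, `d` would hold two independent single-column references, and nothing in the declarative constraints would stop the second insert.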

Typical Implementations of Surrogate Keys

Most commonly, a surrogate key is implemented as an auto-incremented integer. Examples:

  • Oracle supports the SEQUENCE object, which can be used either directly in an INSERT statement or through a BEFORE INSERT trigger.
  • MS SQL Server has the IDENTITY column property and, from the 2012 version onward, the explicit SEQUENCE object as well.
  • PostgreSQL supports the explicit SEQUENCE object, as well as the serial types, which use sequences implicitly.
  • MySQL has the AUTO_INCREMENT attribute.
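
The same idea can be tried out with SQLite, which is not in the list above but ships with Python and is used here only because it makes the example runnable. The `customer` table is hypothetical. The database assigns the surrogate value, and the application reads it back after the insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        email       TEXT NOT NULL UNIQUE                -- natural key
    )
""")

generated_ids = []
for email in ("ann@example.org", "bob@example.org", "eve@example.org"):
    cur = conn.execute("INSERT INTO customer (email) VALUES (?)", (email,))
    # lastrowid holds the key the database generated for this row.
    generated_ids.append(cur.lastrowid)
```

The other systems expose the generated value in their own ways, so this shows the pattern and not a portable API.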

GUIDs or UUIDs are sometimes used when uniqueness must be guaranteed without a central "generator" for surrogate key values, such as in certain "disconnected" or replication scenarios.
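
A minimal sketch of the "no central generator" case, using Python's standard `uuid` module. The two lists stand in for two disconnected nodes (an assumption made for illustration); each produces keys without talking to the other, and the results are merged afterwards.

```python
import uuid

def new_surrogate_key():
    """Generate a random (version 4) UUID as a string key."""
    return str(uuid.uuid4())

# Two hypothetical disconnected nodes generate keys independently.
node_a_keys = [new_surrogate_key() for _ in range(1000)]
node_b_keys = [new_surrogate_key() for _ in range(1000)]

# Merging the two sets later does not require renumbering anything.
merged = set(node_a_keys) | set(node_b_keys)
```

Uniqueness here is probabilistic, not enforced, so a UNIQUE or PRIMARY KEY constraint on the column is still worth keeping.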


[1] That is, a superkey that would stop being unique (and therefore stop being a superkey) if any of its attributes were removed.

[2] Surrogates tend to use "slimmer" data types such as integers, versus "fatter" types such as strings that are often used in natural keys. Also, while it is not unusual for a natural key to be composite, there is almost never a reason to make a composite surrogate key. As a consequence, a FOREIGN KEY referencing a surrogate key tends to be slimmer than one referencing a natural key.

[3] There is no need for the additional index "underneath" the surrogate key. Each new index incurs a maintenance cost for INSERT/UPDATE/DELETE operations and may be especially costly in clustered tables, where secondary indexes must typically contain a copy of the clustering key (which is often the same as the primary key) and may incur a double lookup during querying.

[4] It is often necessary to query the data on a range that is "aligned" with the natural key. Using the natural key as a clustering key physically orders the data in the table, which can dramatically reduce I/O under some circumstances. On the other hand, surrogate keys typically do not possess an ordering that would be useful for clustering.

[5] The migrated natural key can be fetched directly from the child table, without JOINing with the parent.
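
The point about migrated natural keys can be sketched as follows, again with SQLite through Python's `sqlite3` and made-up `country` / `city` tables. The child table carries the parent's natural key, so filtering by it needs no JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE country (
        iso_code TEXT PRIMARY KEY,        -- natural key only, no surrogate
        name     TEXT NOT NULL
    );
    CREATE TABLE city (
        iso_code TEXT NOT NULL REFERENCES country (iso_code),
        name     TEXT NOT NULL,
        PRIMARY KEY (iso_code, name)
    );
    INSERT INTO country VALUES ('FR', 'France'), ('JP', 'Japan');
    INSERT INTO city VALUES ('FR', 'Paris'), ('FR', 'Lyon'), ('JP', 'Tokyo');
""")

# The country code is already present in the child table: no JOIN required.
french_cities = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM city WHERE iso_code = 'FR' ORDER BY name"
    )
]
```

With a surrogate `country_id` in `city` instead, the same question would need a JOIN to `country` (or a prior lookup) to translate `'FR'` into the surrogate value.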

110 questions
2
votes
5 answers

How to generate a real unique char only string in java

Is there a way to generate a unique surrogate string like UUID.randomUUID() but containing characters only (means no numbers)? The string is stored in multiple databases on different hosts and has to be system wide unique (even if I generate two…
KIC
  • 5,887
  • 7
  • 58
  • 98
1
vote
3 answers

Is there a function to easily create a surrogate key in Snowflake?

Is there an easy/efficient way to create surrogate keys in Snowflake? Imagine this data set is going to be selected into a table, during the insertion a battery_id column is added, which is the battery_uuid column mapped to a surrogate key. In…
1
vote
1 answer

DBT 'dbt snapshot' command resulting in error: "Database Error in snapshot snapshot_name Unrecognized name: id at [53:13]"

As the question says, I am running the dbt snapshot command and a few of my snapshots are not working because DBT is not recognizing the surrogate key id that I created. My snapshots are all built the same way and so are the base views that they are…
Gus
  • 63
  • 1
  • 3
  • 5
1
vote
1 answer

How to get the correct table as a Fact Table with relevant keys?(Star Schema)

I have a problem to select the suitable table for the fact table. I have problem with following two tables OrderData Table…
1
vote
1 answer

Bigquery - create surrogate keys on migrated data

We are doing a migration from AWS Redshift to GCP BigQuery. Problem statement: We have a Redshift table that uses the IDENTITY column functionality to issue an internal EDW surrogate key (PK) for natural/business keys. These natural keys are…
beatbox
  • 21
  • 1
  • 5
1
vote
3 answers

Guaranteeing Unique Surrogate Key Assignment - Maximum Matching for Non-Bipartite Graph

I am maintaining a data warehouse with multiple sources of data about a class of entities that have to be merged. Each source has a natural key, and what is supposed to happen is that one and only one surrogate key is created for each natural key…
jennykwan
  • 2,631
  • 1
  • 22
  • 33
1
vote
2 answers

Many-to-many relationship with surrogate key in Entity Framework

Entity Framework magically interprets the following table structure as a many-to-many relationship. table foo (int id) table foo_bar (int foo_id, int bar_id) table bar (int id) But if the join table has any additional fields it will instead be…
Erik Öjebo
  • 10,821
  • 4
  • 54
  • 75
1
vote
1 answer

Retrofit surrogate key in table with natural key in MySql?

Assume a table that uses a natural key and has a number of existing rows. What would be the easiest way to retrofit a surrogate key column and populate it with unique values using MySql? I.e. transform table employees ( social_security_no…
Erik Öjebo
  • 10,821
  • 4
  • 54
  • 75
1
vote
1 answer

What are the type of data problems that relational databases have when they are designed using natural keys (and not surrogate keys)?

I saw this comment: [applications] with the most data-related problems were those using natural keys. Source: Surrogate vs. natural/business keys I want more supporting evidence of this, as the comment left much to imagination. It suggests that…
1
vote
1 answer

Using single column table

I'm creating a database to store the events of mobile apps recovered from multiple sources. Problem is that rows from the event table don't have much meaning to the user as it's mostly a succession of integers. Forcing them to make multiple joins or…
azekirel555
  • 577
  • 2
  • 8
  • 25
1
vote
1 answer

Dimension with a surrogate key into itself (Data Warehouse)

I have an Employee dimension that I am using SCDs and Surrogate keys to track changes over time. Employee's business system key: EmployeeID Employee Surrogate key: EmployeeSCDKey I would like to have Manager information tracked over time as well.…
1
vote
1 answer

Handling Database History for Banking Accounts

I am creating a banking database and I have the following problem here. Suppose CustomerA has an Account Number 4444 and through this I access all details related to the customer I have a separate primary key but I generally query on this…
matta118
  • 17
  • 4
1
vote
1 answer

Can I use an assigned, natural-key identifier while still allowing NHibernate to identify transient instances?

Object A has a one-to-many association: many object B. When I look in the database -- TableB -- I'd like to see the unique, readable string A.Name instead of having always to join or subselect on a surrogate integer identifier to see the name. I can…
Jay
  • 56,361
  • 10
  • 99
  • 123
1
vote
1 answer

How should I define a composite foreign key for domain constraints in the presence of surrogate keys?

I am writing a new app with Rails so I have an id column on every table. What is the best practice for enforcing domain constraints using foreign keys? I'll outline my thoughts and frustration. Here's what I would imagine as "The Rails Way". It's…
1
vote
1 answer

Security of SQL sequences/auto-increment integer values

A colleague once mentioned to me that when developing web applications, explicitly using a sequence/auto-increment integer (typically a primary key) to uniquely identify values within a database is a security risk, and that since such keys are often…
magnus
  • 4,031
  • 7
  • 26
  • 48