
I am aware of the problem with using identity columns as primary keys. I think I first read about it on Brent Ozar's blog. Basically, the most frequently accessed data is the most recently entered data, and with an incrementing identity column this tends to concentrate all the database activity on the last page of the table, resulting in lots of contention and locking. It is recommended that your primary key be based on some meaningful column (or columns) so that data is spread more evenly through the pages of the database.

Given that, here is the primary key I am trying to optimise:

Request_ID int --1234, 2326, etc
Department nchar(2) -- 'MP', 'SS', 'FR', etc
Condition_ID int identity

Condition_ID is necessary because (Request_ID + Department) is not unique. There can be multiple conditions within (Request_ID + Department).

Condition_ID is unique (being an identity column), but making it the primary key alone leads to the problem I mentioned above. Request_ID, while not an identity column, is still a sequential number, so it leads to the same problem.
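
For reference, this is roughly the layout I am trying to avoid, i.e. simply clustering on the identity column (the table name here is just illustrative):

CREATE TABLE dbo.RequestConditions (
    Request_ID   int      NOT NULL,            -- 1234, 2326, etc
    Department   nchar(2) NOT NULL,            -- 'MP', 'SS', 'FR', etc
    Condition_ID int IDENTITY(1,1) NOT NULL,
    CONSTRAINT PK_RequestConditions PRIMARY KEY CLUSTERED (Condition_ID)
);
-- Every insert gets the next Condition_ID, so new rows always land on the
-- last page of the clustered index.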

Department is at least varied (FR, SS, MP, CS, etc)

So, the question I have is this: Given what I have to work with, what would be a good primary key combination that avoids the problem I mentioned?

Would there be any benefit of making the Department the first column in the PK, followed by Request_ID, and then Condition_ID?

KWallace
  • Why do you think this is, or could be, a problem for your table? Did you test and find [high] contention on the last page? – Bogdan Sahlean Aug 01 '15 at 19:08
  • What rate of inserts are you expecting? SQL Server generally takes row level locks so there is no problem except at very high levels of insert activity where you may experience latch contention. See also [Kimberley Tripp](http://www.sqlskills.com/blogs/kimberly/the-clustered-index-debate-continues/) and [Thomas Kejser](http://kejser.org/clustered-indexes-vs-heaps/) – Martin Smith Aug 01 '15 at 19:08
  • On the flip side, an incremental key minimizes disk IO, which is especially important with large tables on spinning media. Solid state storage can reduce latency to the point where page latch contention rather than I/O is the bottleneck and this is where the natural key can provide better performance. Note that a fairly heavy load is needed to cause noticeable contention. – Dan Guzman Aug 01 '15 at 19:15
  • Is an incremental key the same thing as an aggregate key? – KWallace Aug 01 '15 at 22:05
  • You may want to see this SO question: http://stackoverflow.com/questions/170346/what-are-the-performance-improvement-of-sequential-guid-over-standard-guid – nullforce Aug 02 '15 at 01:43
  • No, an "incremental key" by which you probably mean "auto-incremented key" or "surrogate key" is not an "aggregate key" by which you probably mean "natural key" or maybe "composite natural key". Find out what these terms mean. Use technical terms with their technical meanings, not words that happen to have everyday meanings similar to the everyday meaning of a technical term. – philipxy Aug 02 '15 at 02:25

2 Answers


In Oracle, you can reduce the contention by hash partitioning the index. A hash size of, say, 64 offers a nice middle-ground between (A) all new rows in the same blocks (too much contention) and (B) new rows all over the place (too much physical I/O).
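
As a rough sketch (Oracle syntax; the table, index, and constraint names are illustrative, not from the question), the primary key can be backed by a hash-partitioned index like this:

-- Scatter consecutive condition_id values across 64 index partitions so that
-- concurrent inserts do not all compete for the same index blocks.
CREATE TABLE conditions (
    request_id   INT      NOT NULL,
    department   NCHAR(2) NOT NULL,
    condition_id INT      NOT NULL
);

CREATE UNIQUE INDEX conditions_pk_ix
    ON conditions (condition_id)
    GLOBAL PARTITION BY HASH (condition_id) PARTITIONS 64;

ALTER TABLE conditions
    ADD CONSTRAINT conditions_pk PRIMARY KEY (condition_id)
    USING INDEX conditions_pk_ix;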

You could probably do something similar by using two incremental numbers. That is, compute mod(sequence1,64) and prepend it to sequence2, either by using two columns for the PK or by combining them through multiplication:

mod(sequence1,64)*100000000000+sequence2
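
In SQL Server terms, a rough equivalent of the two-column approach (the table name dbo.Conditions is an assumption, not from the question) would be a persisted computed bucket column used as the leading key of the clustered primary key:

-- Sketch only: Bucket spreads sequential Condition_ID values across 64 ranges,
-- so concurrent inserts no longer all target the last page of the index.
CREATE TABLE dbo.Conditions (
    Request_ID   int      NOT NULL,
    Department   nchar(2) NOT NULL,
    Condition_ID int IDENTITY(1,1) NOT NULL,
    Bucket AS (Condition_ID % 64) PERSISTED NOT NULL,
    CONSTRAINT PK_Conditions_Bucket
        PRIMARY KEY CLUSTERED (Bucket, Condition_ID)
);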

Also, unless your table has a lot of inserts going on, this contention, while real, may not be an issue you have to worry about.

John Hoerr
Matthew McPeak

so that data is spread more evenly through the pages of the database

So you recognize the primary key's properties that will affect performance.

what would be a good primary key combination...?

Based on your data size and the combination of keys you mention, did you see any performance differences? If so, how significant were they? Ultimately you have to find the right combination based on your app's needs for size, speed, load, and the other characteristics that Bogdan Sahlean, Martin Smith, and Dan Guzman have raised in the comments above.

Would there be any benefit of making the Department the first column in the PK, followed by Request_ID, and then Condition_ID?

That depends on the queries and on the size of the data. The PK choice also affects how the data is stored, which can obviate the need for separate indexes, because you get a clustered index (for free) based on the PK.
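
As a rough sketch (SQL Server syntax; the table name dbo.Conditions is an assumption), the Department-first key from the question would look like this, and a query that filters on Department can then be answered by a clustered index seek with no extra index:

-- Illustrative only: with Department leading, the clustered index orders rows
-- by Department, then Request_ID, then Condition_ID.
CREATE TABLE dbo.Conditions (
    Request_ID   int      NOT NULL,
    Department   nchar(2) NOT NULL,
    Condition_ID int IDENTITY(1,1) NOT NULL,
    CONSTRAINT PK_Conditions_Dept
        PRIMARY KEY CLUSTERED (Department, Request_ID, Condition_ID)
);

SELECT Request_ID, Condition_ID
FROM dbo.Conditions
WHERE Department = N'MP';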

Is an incremental key the same thing as an aggregate key?

No. If you are speaking of SQL Server, "incremental" refers to the auto-increment (identity) value that the server manages for that column as rows are added. An "aggregate key" refers to a combination of column values that together identify a row (often used to enforce uniqueness). In your example, all of these combinations are unique (and therefore candidates for the PK/clustering key):

Condition_ID

Condition_ID + Department

Condition_ID + Department + Request_ID

Emacs User
  • This database is relatively small. No, in fact it is just small. However the server that it is on is substantially busy. All I am really trying to do is learn, and apply best practices, even if the performance is not going to be an issue. – KWallace Aug 03 '15 at 00:07