Pros & Cons of Date Column as Part of Primary Key

Question

I am currently working on a database where a log is required to track a number of different data changes, such as price changes and project status changes. To accomplish this I've made different 'log' tables that will store the data that needs to be kept.

To give a solid example, in order to track the changing prices for parts which need to be ordered, I've created a table called Part_Price_Log. The primary key is a composite made up of the date on which the part price is modified and a foreign key to the part's unique ID in the Parts table.

My logic here is that if you need to look up the current price for a part, you just need to find the most recent entry for that Part ID. However, I am being told not to implement it this way because using a date as part of a primary key is an easy way to get errors in your data.

So my question is this:

What are the pros/cons of using a Date column as part of a composite primary key? What are some better alternatives?

Datetime is stored in its binary representation, much the same as an integer. Hence, I don't quite see what could be wrong in using that as part of a primary key. As for its use (get current price): since (I guess) these prices are not changing many times a day, for any product you would have, at most, a few tens of records per year. If my assumption is correct, I would create a non-unique index on the product ID and fetch the `MAX(Registration_Date)` record for that product. Just a thought for simplification purposes... – FDavidov, Apr 26 '18 at 13:11
@GordonLinoff: assume it is the 'Parts prices' database because it probably doesn't make a difference. – onedaywhen, Apr 26 '18 at 13:18

score 6 · Answer 1 · answered Apr 26 '18 at 12:34

In general, I think the best primary keys are synthetic auto-incremented keys. These have certain advantages:

The key value records the insertion order.
The keys are fixed length (typically 4 bytes).
Single keys are much simpler for foreign key references.
In databases (such as SQL Server by default) that cluster the data based on the primary key, inserts go "at the end".
They are relatively easy to type and compare (my eyes just don't work well for comparing UUIDs).

The fourth of these is a really big concern in a database that has lots of inserts, as suggested by your data.

There is nothing a priori wrong with composite primary keys. They are sometimes useful. But that is not a direction I would go in.

Gordon Linoff

So you're saying instead of using a Composite Key, to instead just create a new auto-increment ID column and use that as the PK instead? – Skitzafreak Apr 26 '18 at 12:37
@Skitzafreak . . . Yes. Most tables I create have identity columns as primary keys. – Gordon Linoff Apr 26 '18 at 12:41
And what about putting a candidate key on the price + interval 'natural' key? – onedaywhen Apr 26 '18 at 13:39

score 2 · Answer 2 · answered Apr 26 '18 at 13:18

I agree that it is better to keep an identity column or uniqueidentifier as the primary key in this scenario. Also, if you make PartID and the date a composite primary key, the insert is going to fail when two concurrent users try to update the part price at the same time, because the second row violates the primary key. So the better approach is to have an identity column as the primary key and keep appending the changes to the log table. If you hit performance barriers later on, you can partition the table by year to overcome them.

ErGaurav

"if you make partid and date as composite primary key, it is going to fail in a case when two concurrent users try to update the part price at same time" - I think this is a weak argument, and your proposed solution appears to permit a race condition or real-world duplicates (different prices for overlapping intervals)! – onedaywhen Apr 26 '18 at 13:37
I made that argument because our system should be robust enough to handle every scenario. It really is possible for the same values to be inserted at the same time; I have already encountered that problem in one of my projects. – ErGaurav Apr 27 '18 at 01:55

EzLo · Answer 3 · 2018-04-26T14:27:39.407

Pros and cons will vary depending on the performance requirements and how often you will query this table.

As a first example, consider the following:

CREATE TABLE Part_Price_Log (
    ModifiedDate DATE,
    PartID INT,
    PRIMARY KEY (ModifiedDate, PartID))

If ModifiedDate is first and this is a logging table with insert-only rows, then every new row will be placed at the end, which is good (it reduces fragmentation). This approach is also good when you want to filter directly by ModifiedDate, or by ModifiedDate + PartID, as ModifiedDate is the first column in the primary key. A con here would be searching by PartID alone, as the clustered index of the primary key won't be able to seek directly on PartID.

A second example would be the same but inverted primary key ordering:

CREATE TABLE Part_Price_Log (
    ModifiedDate DATE,
    PartID INT,
    PRIMARY KEY (PartID, ModifiedDate))

This is good for queries by PartID, but not so good for queries directly by ModifiedDate. Also, having PartID first would make inserts displace data pages whenever the inserted PartID is lower than the max PartID (which increases fragmentation).

The last example would be using a surrogate primary key like an IDENTITY.

CREATE TABLE Part_Price_Log (
    LogID BIGINT IDENTITY PRIMARY KEY,
    ModifiedDate DATE,
    PartID INT)

This will make all inserts go last and reduce fragmentation but you will need an additional index to query your data, such as:

CREATE NONCLUSTERED INDEX NCI_Part_Price_Log_Date_PartID ON Part_Price_Log (ModifiedDate, PartID)
CREATE NONCLUSTERED INDEX NCI_Part_Price_Log_PartID_Date ON Part_Price_Log (PartID, ModifiedDate)

The con of this last one is that insert operations will take longer (as the indexes also have to be updated) and the size of the table will increase due to the indexes.
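As a rough illustration of the surrogate-key layout, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the DDL syntax differs from the statements above (AUTOINCREMENT instead of IDENTITY, dates stored as ISO-8601 text). The Price column, the index name, and the current_price helper are hypothetical additions for the example.

```python
import sqlite3

# SQLite stands in for SQL Server here; this is a sketch, not the answer's exact DDL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Part_Price_Log (
        LogID        INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        ModifiedDate TEXT    NOT NULL,                   -- ISO-8601 date string
        PartID       INTEGER NOT NULL,
        Price        REAL    NOT NULL                    -- hypothetical column
    );
    -- Secondary index that supports lookups by part, newest first.
    CREATE INDEX IX_Part_Price_Log_PartID_Date
        ON Part_Price_Log (PartID, ModifiedDate);
""")

conn.executemany(
    "INSERT INTO Part_Price_Log (ModifiedDate, PartID, Price) VALUES (?, ?, ?)",
    [
        ("2018-01-10", 1, 9.50),
        ("2018-03-02", 1, 10.25),
        ("2018-02-15", 2, 4.00),
        ("2018-04-20", 1, 11.00),
    ],
)

def current_price(conn, part_id):
    """Return (ModifiedDate, Price) of the most recent entry for a part, or None."""
    return conn.execute(
        """
        SELECT ModifiedDate, Price
        FROM Part_Price_Log
        WHERE PartID = ?
        ORDER BY ModifiedDate DESC, LogID DESC  -- LogID breaks same-day ties
        LIMIT 1
        """,
        (part_id,),
    ).fetchone()

print(current_price(conn, 1))
print(current_price(conn, 2))
```

Because the surrogate key no longer enforces one row per part per day, the query orders by LogID as a tie-breaker so that the last same-day insert wins.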

Also keep in mind that if your data allows for multiple updates of the same part on the same day, then using a compound PRIMARY KEY would make the second update fail. Your choices here are to use a surrogate key, use a DATETIME instead of DATE (which will give you more margin for updates), or use a CLUSTERED INDEX with no PRIMARY KEY or UNIQUE constraint.
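The failure mode described above can be reproduced in a few lines. This is a sketch using Python's built-in sqlite3 module rather than SQL Server, and the Price column and log_price helper are hypothetical, but the key violation behaves the same way in principle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Part_Price_Log (
        ModifiedDate TEXT    NOT NULL,
        PartID       INTEGER NOT NULL,
        Price        REAL    NOT NULL,  -- hypothetical column
        PRIMARY KEY (PartID, ModifiedDate)
    )
""")

def log_price(conn, modified, part_id, price):
    """Insert one log row; return True on success, False on a key violation."""
    try:
        conn.execute(
            "INSERT INTO Part_Price_Log (ModifiedDate, PartID, Price) VALUES (?, ?, ?)",
            (modified, part_id, price),
        )
        return True
    except sqlite3.IntegrityError:
        return False

first = log_price(conn, "2018-04-26", 1, 10.00)
# Same part, same date: rejected by the composite primary key.
second_same_day = log_price(conn, "2018-04-26", 1, 10.50)
# A finer-grained timestamp is a different key value, so it is accepted.
with_time = log_price(conn, "2018-04-26 13:18:00", 1, 10.50)

print(first, second_same_day, with_time)
```

This prints True False True. A finer timestamp only narrows the window for a clash; two writes with an identical timestamp would still collide.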

I would suggest doing the following. Apart from the narrow nonclustered index that backs the primary key, you only keep one index (the actual table, as it is clustered), new rows always go at the end, you don't need to worry about a repeated ModifiedDate with the same PartID, and your queries by date will be fast.

CREATE TABLE Part_Price_Log (
    LogID INT IDENTITY PRIMARY KEY NONCLUSTERED,
    ModifiedDate DATE,
    PartID INT)

CREATE CLUSTERED INDEX CI_Part_Price_Log_Date_PartID ON Part_Price_Log (ModifiedDate, PartID)

score 1 · Answer 4 · answered Apr 26 '18 at 13:36

Without knowing your domain, it's really hard to advise. How do you identify a part in the real world? Let's assume you use EAN. This is your 'natural key'. Now, does a part get a new EAN each time the price changes? Probably not, in which case the real-world identifier for a part price is a composite of its EAN and the period of time during which that price was effective.

I think the comment about "an easy way to get errors in your data" is referring to the fact that temporal databases are not only more complex by nature (they have an additional dimension: time), but support for temporal functionality is also lacking in most SQL DBMSs.

For example, does your SQL product of choice have an interval data type, or do you need to roll your own using a pair of start_date and end_date columns? Does your SQL product of choice have the capability to enforce intra-table constraints, e.g. to prevent overlapping or non-concurrent intervals for the same part? Does your SQL product have temporal functions to query temporal data easily?
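To make the "roll your own" option concrete, here is a minimal sketch of a start_date/end_date pair with a query that finds overlapping intervals for the same part. It runs on Python's built-in sqlite3 module as a stand-in for whatever DBMS is in use. The Part_Price table, its Price column, and the half-open interval convention (start inclusive, end exclusive) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Part_Price (
        PartID     INTEGER NOT NULL,
        start_date TEXT    NOT NULL,  -- inclusive, ISO-8601
        end_date   TEXT    NOT NULL,  -- exclusive, ISO-8601
        Price      REAL    NOT NULL,  -- hypothetical column
        PRIMARY KEY (PartID, start_date),
        CHECK (start_date < end_date)
    );
""")
conn.executemany(
    "INSERT INTO Part_Price VALUES (?, ?, ?, ?)",
    [
        (1, "2018-01-01", "2018-03-01", 9.50),
        (1, "2018-03-01", "2018-06-01", 10.25),  # adjacent, no overlap
        (1, "2018-05-15", "2018-09-01", 11.00),  # overlaps the previous row
        (2, "2018-01-01", "2018-12-31", 4.00),
    ],
)

# Self-join: for a pair (a, b) with a starting first, the intervals overlap
# exactly when b starts before a ends.
overlaps = conn.execute("""
    SELECT a.PartID, a.start_date, b.start_date
    FROM Part_Price AS a
    JOIN Part_Price AS b
      ON a.PartID = b.PartID
     AND a.start_date < b.start_date   -- report each pair once
     AND b.start_date < a.end_date
    ORDER BY a.PartID, a.start_date
""").fetchall()

print(overlaps)
```

The primary key and CHECK constraint only guard single rows; the overlap rule spans rows, which is why it has to be expressed here as a detection query (or, in a real system, a trigger or a DBMS-specific constraint) rather than a plain column constraint.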
