2

There is a table in our environment. Recently it was discovered that performance was greatly improved by sorting on a datetime column, which the DBA then wanted to make the primary key. Since he can't guarantee uniqueness with the datetime alone, he added the id that was once the primary key into his new composite key.

So there is now a table whose primary key is datetime / id, and the clustered index is defined the same way. All the PK / FK relationships are still set up properly and exist on the id-to-id basis one would expect.

What are the possible problems with a lopsided primary key like this?

And performance is considerably improved with this change.

However, in the schema, the actual "primary key" is two columns. What could possibly go wrong?

discosammy
  • 323
  • 1
  • 11
  • Possible duplicate of [Why use multiple columns as primary keys (composite primary key)](http://stackoverflow.com/questions/2626158/why-use-multiple-columns-as-primary-keys-composite-primary-key) – Whencesoever Aug 09 '16 at 17:56
  • 1
    So that did not break the existing FK relationships. Why did he not just put a non-clustered index on the datetime? – paparazzo Aug 09 '16 at 17:59
  • 1
    Keep the existing unique column as the primary key. Then create a new INDEX on the datetime column. You do not mention the database platform, BUT if this were Sql Server 2008 (*I believe*) or later, you can then choose INCLUDE columns, which will be persisted with the indexed columns. This should still give you close to the speed of having the date be/be included with the clustered index. If you want more options you should include the database platform and version. – Igor Aug 09 '16 at 18:02
  • Also, a primary key should be chosen based on uniqueness, not on query execution speed. Assuming the former (schema) is correct, the latter (queries) can always be tuned using indexes or by ensuring the query structure itself is correct. – Igor Aug 09 '16 at 18:04
  • In some cases, sorting the data in something other than PK-ascending order makes sense, depending on how the data is being used/accessed. That's the case here. I'm wondering about the fallout from having the primary key malformed like this. The proper relationships exist everywhere in the database... except for the schematic definition on the table, which has the datetime included in the definition. – discosammy Aug 09 '16 at 18:35
  • @paparazzo there was already a non-clustered index. Lots of data. The clustered index had a 120-second impact. – discosammy Aug 09 '16 at 18:37

3 Answers

2

Do not do that! Set up a unique index with the two fields instead. It does not have to be the primary key. In fact, if you want the original key to remain unique, then this is a terrible idea.
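As a rough sketch in Sql Server syntax (the table and column names here are made up for illustration), that separation looks like:

```sql
-- Keep the original single-column primary key, so id stays unique
-- and existing FK relationships continue to reference id alone.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY (id);

-- A separate unique index covers the (datetime, id) combination
-- without changing what the primary key means.
CREATE UNIQUE INDEX UX_Orders_CreatedAt_Id
    ON dbo.Orders (created_at, id);
```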

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • I don't see how adding a date to a unique id would jeopardize uniqueness. – Joe C Aug 09 '16 at 18:07
  • 1
    @Joe: If the old PK was only based on the `id` column. But now the new PK is defined as `id` + `date` columns. The `id` column now allows duplicate values. This is probably not desirable. – sstan Aug 09 '16 at 18:13
  • Ah, without a unique index on ID then ID could be duplicated. I thought it was implied that ID + date would not be unique. I misunderstood. – Joe C Aug 09 '16 at 19:15
1

EDIT: This answer was assuming Sql Server. If it turns out that it is not then I will delete my answer.

You do not list details, so I will have to give a very general answer. In my research I have found that most sources recommend a short primary key / clustered index.

The real question here, though, is what you mean by increased performance. Is it just one query? In other words, does this change have a beneficial, or at least insignificant, performance impact on all operations against this data: user interfaces, all reports, and so on? Or is this robbing Peter to pay Paul?

If this were a reporting database or data warehouse where the majority of reports are based on date, I could see why people might recommend having the clustered index setup in such a way that it would benefit all reports, or the most important ones.

In any other situation I can think of, a non-clustered index would provide almost the same level of performance increase without increasing the size of the PK, which is used in all lookups (more bytes read = slower performance), and without taking up more space on your data pages.
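A sketch of that non-clustered alternative, including the INCLUDE columns mentioned in the comments (Sql Server syntax; table and column names are hypothetical):

```sql
-- The primary key / clustered index stays short: just the id.
-- A non-clustered index on the datetime handles the date-ordered reads;
-- INCLUDE persists extra columns at the leaf level so common queries
-- can be answered from the index alone, without a key lookup.
CREATE NONCLUSTERED INDEX IX_Orders_CreatedAt
    ON dbo.Orders (created_at)
    INCLUDE (status, amount);
```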

EDIT: This article explains this topic better than I could.

https://www.simple-talk.com/sql/learn-sql-server/effective-clustered-indexes/

Joe C
  • 3,925
  • 2
  • 11
  • 31
1

The performance advantage you are currently seeing (if genuine) is due to the clustered index associated with the primary key, not the primary key itself. If you are happy with the current index but are concerned about uniqueness, you should keep the unique datetime / id as your clustered index but revert to your old unique id as the primary key.

This also avoids the problem that other tables referencing this primary key would have required the creation of a likely inappropriate datetime column just to create the foreign key relationship.
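In Sql Server terms, the clustered index and the primary key are declared independently, so this split can be sketched as follows (object names are hypothetical):

```sql
-- The clustered index carries the datetime ordering that produced
-- the performance gain; including id makes it unique.
CREATE UNIQUE CLUSTERED INDEX CX_Orders_CreatedAt_Id
    ON dbo.Orders (created_at, id);

-- The primary key reverts to the single id column, non-clustered,
-- so foreign keys keep pointing at id alone.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (id);
```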

Derrick Moeller
  • 4,808
  • 2
  • 22
  • 48
  • Exactly... the clustered index part I'm not questioning. That's clearly beneficial in this instance. I don't know why he didn't just create the clustered index and leave the primary key alone. He sees a performance increase; it's measurable by what he did. The joins are there because all the ids are the same. So are the FK constraints (just pointing to the id). The only difference is the date column being included in the table definition. – discosammy Aug 09 '16 at 20:22