0

I am trying really hard to understand the concept of having natural keys in date Dimension table.

I have always seen a random surrogate keys being created in the dimension tables. But I have recently read that using a natural key in date dimension something like 20150806 for Aug-06-2015 works much better and has considerable performance gain when it comes to lookup and reverse lookup from fact tables when compared with a natural surrogate key which is int by nature.

I can't understand how it will give any performance gain. We still would require join between facts and dimension even if we use this fancy key for date dimension.

If anyone has any insight on this, would you please mind sharing the knowledge. I would appreciate if you could follow up with an example.

Felix Pamittan
  • 31,544
  • 7
  • 41
  • 67
imba22
  • 651
  • 1
  • 13
  • 25
  • It makes it easy to partition a big fact table by date. This would increase performance. Also can avoid surrogate key value lookups during ETL – Neil McGuigan Aug 10 '15 at 22:09

1 Answers1

0

There's no performance gain -- the key is just an arbitrary identifier for the date record. In business apps, I generally use the day offset from January 1 2000, as a smallint. There are slight advantages to using a smaller integer, because the date key is present in so many records, it helps reduce the record size (therefore increasing the number of cacheable records).

The only real advantage to using a "natural" key value for a date is that the table can be use or browsed without necessarily joining to the date dimension table, and has some visible "lexical" meaning.

But it doesn't help in the context of a data warehouse, at all.

Curt
  • 5,518
  • 1
  • 21
  • 35
  • The second point which you made is precisely which I cannot wrap my head around. I mean sure if there is an entry in Fact with date foreign key as 20150806, I would know the date without going back to date dimension, I totally get that. But what if I am interested in something like which week was it. I will have to go back to date dimension if I wish to slice and dice more. You get what I mean?? – imba22 Aug 07 '15 at 00:38
  • In Curt's reply, he said "...without NECESSARILY joining to the date dimension..." In other words, if you only filter by a single date or a date range. If you filter or sort or aggregate by week, then it will need to join. Even if you filter by a single date, it's not necessarily guaranteed that the query plan won't join. – NaturalData Aug 08 '15 at 16:53