0

I am rebuilding a big warehouse database which originally used natural keys, and now I want to switch to surrogate keys.

Therefore, I am considering to split the database into a physical layer and a logical layer.

In the physical layer, every table gets basically two fields like this: (simplified case)

tblProducts:

ProductKey bigint identity (1,1) not null primary key,
ProductID nvarchar(100) not null unique

tblSales:

RowKey bigint identity (1,1) not null primary key,
ProductKey bigint not null references tblProducts(ProductKey),
DateSold date,
UnitsSold decimal,
UNIQUE (ProductKey, DateSold)
  • The "Key" field is always the primary key which is also used in all foreign key relationships.
  • The "ID" field is a unique nvarchar. The users will always work with the "ID" and never see the "key".

In the logical layer, the idea is to create views for every table which will "hide" the surrogate key and will allow me to work with the IDs. Even the database admins should finally work only with the views! For example:

View vwSales:

select 
    prd.ProductID,
    sls.DateSold,
    sls.UnitsSold
from tblSales sls inner join tblProducts prd on sls.ProductKey = prd.ProductKey

Now if I want to work with this view, I would need to create triggers which are handling all update/delete/insert and "translate" it to the physical layer. To create all these views is a lot of work. So:

  • Is there a way to "automatically" create such views?
  • It is a good approach to split the database into physical and logical layer at all?
askolotl
  • 964
  • 1
  • 13
  • 27
  • 1
    This sounds like a complicated solution looking for a problem. A datawarehouse with a _star schema_ should be simple enough that extracting ID's uses just simple obvious joins. – Nick.Mc Jan 23 '20 at 14:44
  • @Nick.McDermaid You mean we should stay with natural keys ? – askolotl Jan 23 '20 at 17:57
  • 1
    Sorry I was referring to the logical view part being complicated. Using surrogate keys instead of (or as well as) natural keys is a sound design decision. – Nick.Mc Jan 23 '20 at 23:26

1 Answers1

1

While there is a way to automate the creation of DDL, I can tell you from experience this is not the way you want to go.

Yes, creating surrogate keys is best practice in a Data Warehouse (which we do). And yes, you can hide the surrogate keys from views or dimensional layers (which we also do), the automatic creation of views which are essentially the same as the underlying data (except for the surrogate key) unnecessarily burdens you with complexity that I believe you will eventually regret.

Einstein famously said "Everything should be made as simple as possible, but no simpler."

The vendor that originally built our data warehouse put in a process that creates DDL that creates a view for every physical table. Well intentioned, these views provided little value, became a maintenance problem and unnecessarily cluttered up a very well organized DW. We eventually removed all these views.

You can still get the benefits of having a different logical layer. I would recommend using views to construct your Dimensional Layer. We only publicize the "logical" dimensional layer views to business users who wish to consume DW data.

I highly recommend you research the benefits of ELT over the traditional ETL processes. It has revolutionized our DW approach, and achieved the benefits you may be seeking in a logical layer.

TK Bruin
  • 472
  • 4
  • 15
  • 1
    Thank you so much, this is a very constructive answer! I've meanwhile also came away from the idea of creating a full "logical layer" for everything. We keep surrogate keys and hide them away, but will handle everything with procedures. And I will research ELT vs ETL. – askolotl Feb 13 '20 at 09:42
  • PS: In my 25 years experience with warehouses, I created databases with and without surrogate keys. Both have their pros and cons. Biggest "con" is that you cannot "humanly" read any data, and for every action, you need to construct complex queries. Biggest "pro" is total flexibility with renaming and you can create a straight logical layer which relies on existence of one single key in every table. – askolotl Feb 13 '20 at 09:46
  • @TK Bruin great answer. Do you have/recommend any good resources, regarding your latest comment on ELT vs. ETL? I'm currently struggling to find a 'modern' approach on the whole ETL pratice. – TJ_ Jun 26 '20 at 13:23