I agree with the comment above - the 'source system' key is a composite key from the source system + another string or int identifying the actual source system. This is seperate to the surrogate key mentioned in the previous answers. You really have two keys in your dimension. One is the standard IDENTITY
surrogate key - no suprises there.
The other is a composite key consisting of the key from the source system and an identifier (I actually usually just use a string) that tells you which system it comes from.
So your dimension looks like this:
Customer_SK SRC_Key SRC_System Customer Name
1 5 SAP Jim
2 5 MYOB Joe
- Your ETL code from MYOB > DW knows to only look at MYOB data
- Your ETL code from SAP > DW knows to only look at SAP data
- Your data warehouse only uses the Customer_SK surrogate key
As your DW develops and new source systems are introduced you just keep adding SRC_Systems
You can put these in different columns as suggested in another answer but then you end up with this:
Customer_SK SRC_Key_SAP SRC_Key_MYOB Customer Name
1 5 NULL Jim
2 NULL 5 Joe
Which seems a bit wasteful and requires you to add a column everytime a new system comes online.
The important question is: does the same customer exist in both source systems? This design actually allows for merging across rows if they do.
Also make absolutely sure you put a unique constraint on SRC_Key
, SRC_System
as this aids in performance, ensures integrity, and self documents the key.