0

Thanks in advance for looking at this.

I have prepared a data warehouse and am populating a customer table with customer information from 2 systems. Customers from one system have the same Business Key as Customers in the other system.

What would be the best way to distinguish between the two so I don't update records don't want to and maintain good data integrity.

I wondered about a system flag but I am unsure.

All suggestions/questions welcome.

Will
  • 228
  • 1
  • 2
  • 15
  • 2
    Add an additional field to your primary key or use a prefix representing the old systems – sqlab Jun 06 '16 at 11:36

3 Answers3

0

I personally recommend an IDENTITY column that will serve as a surrogate key.

Tab Alleman
  • 31,483
  • 7
  • 36
  • 52
0

As Tab Alleman said, use an identity column for your primary key. Then, when I've had this situation before, I keep my two business keys in separate columns, and update based on whichever key is relevant in each situation.

This way, you can also match up users who use multiple systems and see where there are overlaps.

Jeff
  • 12,555
  • 5
  • 33
  • 60
0

I agree with the comment above - the 'source system' key is a composite key from the source system + another string or int identifying the actual source system. This is seperate to the surrogate key mentioned in the previous answers. You really have two keys in your dimension. One is the standard IDENTITY surrogate key - no suprises there.

The other is a composite key consisting of the key from the source system and an identifier (I actually usually just use a string) that tells you which system it comes from.

So your dimension looks like this:

Customer_SK    SRC_Key    SRC_System    Customer Name
1              5          SAP           Jim       
2              5          MYOB          Joe
  • Your ETL code from MYOB > DW knows to only look at MYOB data
  • Your ETL code from SAP > DW knows to only look at SAP data
  • Your data warehouse only uses the Customer_SK surrogate key

As your DW develops and new source systems are introduced you just keep adding SRC_Systems

You can put these in different columns as suggested in another answer but then you end up with this:

Customer_SK    SRC_Key_SAP     SRC_Key_MYOB    Customer Name
1              5               NULL            Jim       
2              NULL            5               Joe

Which seems a bit wasteful and requires you to add a column everytime a new system comes online.

The important question is: does the same customer exist in both source systems? This design actually allows for merging across rows if they do.

Also make absolutely sure you put a unique constraint on SRC_Key, SRC_System as this aids in performance, ensures integrity, and self documents the key.

Nick.Mc
  • 18,304
  • 6
  • 61
  • 91