0

I am developing a new data warehouse and my source tables for the employee dimension gets truncated every day and reloaded with all history and updates,deletes and new inserts.

The columns which tracks these changes are effective date & effective sequence.We also have a audit table which helps us determine which records are updated,inserted and deleted every day by comparing table from today & previous day.

My question is to how can I do a incremental load on the table in my staging layer so the surrogate key which is a identity columns remains same.If I do a truncate on my final dimension then I get new surrogate key each time I truncate and hence it mess up my fact table.

XOXO
  • 1
  • 3
  • Please help!!!.I found a way to keep track of the identity column DBCC CHECKIDENT ('.dimEmployee' , RESEED, 1) .Does this approach have any loop holes or scenarios which it does not captures.I will also truncate and reload the fact table which will use the employee key. – XOXO Apr 09 '19 at 18:16
  • Having an identity column as a PK all the time should not be a general approach and at design phase we must think it through. These are the type of issues where it would cause problems. This is a general problem when you want the abiluty to reload. You should keep PK of source or if it is a matter or rolling dimension you can make it a multi part key – Saad Ahmad Apr 09 '19 at 18:18

2 Answers2

0

Truncating a dimension is never a good idea. You will lost the ability to keep track of the primary keys, which will be referenced by the fact table.

If you must truncate the dimension everyday, then you shouldn't have auto-increment keys. Instead, you should compare the previous state of the dimension with the new state, and lookup the key values so that they can be kept.

Example: your dim has 2 entries, employee A and employee B with keys 1 and 2 resp. Next day, employee A is updated to AA and employee C is added. You should be able to compare this new data set with the old one, so that AA still has key 1, B is kept with key 2 and C is added with key 3. Of course you can't rely on auto-increment keys, and must set them from what was there previously

Also, beware of deletes: just because an employee is deleted that doesn't mean the facts pertaining to that employee also disappear. Don't delete the record from the fact table, instead add a "deleted" flag and set it to Y for deleted records. In your reporting, filter out those deleted employees, so you report only on non deleted ones.

But, the best scenario is always to not truncate the table, and instead perform the necessary updates in the dimension, keeping the primary keys (which should be synthetic and not coming from the source system anyway) and any attributes that didn't change, marking as deleted those that were deleted from the source system, and updating the version numbers, validity dates, etc. accordingly.

Your problem seems to be very close to what Kimball describes as a Type II Slowly Changing Dimension and your ETL should be able to handle that.

nsousa
  • 4,448
  • 1
  • 10
  • 15
0

Table truncation on the source wouldn't represent a real issue as long as you have a business key to uniquely identify one employee. If so, the best way to address your requirement, is that to handle your employee dimension as a type 2 SCD.

Typically ETL software provide components to manage SCD. Nevertheless, a way to handle SCD may consist in defining a hash based on the attributes you want to track. Then if for a given business key you notice that the new hash calculated on the source differs from the hash you stored in your dimension, you will update all the attributes for that record.

Hope this helps.