1

I have 2 tables in mssql server.I can perform scd through custom insert/update/delete and also through Merge statement.

I want to know that is there any generic procedure that could server the purpose. we just pass it 2 tables and it should porform the SCD. any option in SQL server 2008? Thanks

1 Answers1

1

No, there isn't and there can't be a generic one suitable for no matter what tables you pass to it. For several reasons:

  • How do you know which SCD type? (Okay, could be another parameter, but...)
  • How do you know which column should be historicized and which should be overwritten?
  • How do you determine which column is the business key, the surrogate key, the expiration column and so on?
  • To specify the columns in an update statement you must write dynamic sql, which is possible, but the above point comes into play

Not a reason why it's not possible but also consider: For a proper UPSERT one usually works with temporary tables, the MERGE statement sucks for SCDs except in special cases. That is because you can't use a MERGE statement together with an INSERT/UPDATE and you would have to disable foreign keys for that, since an UPDATE is implemented as DELETE THEN INSERT (or something like that, don't remember clearly, but I had those problems when I tried).

I prefer doing it this way (SCD type 2 and SQL Server that is):

Step 1:

IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimSource')
DROP TABLE tmpDimSource;
SELECT
*
INTO tmpDimSource
FROM
(
SELECT whatever
FROM yourTable
);

Step 2:

IF EXISTS (
SELECT * FROM sys.objects
WHERE name = 'tmpDimYourDimensionName')
DROP TABLE tmpDimYourDimensionName;

SELECT * INTO tmpDimYourDimensionName FROM D_yourDimensionName WHERE 1 = 0;
INSERT INTO tmpDimYourDimensionName 
(
sid, /*a surrogate id column*/
theColumnsYouNeedInYourDimension,
validFrom
)
SELECT 
ISNULL(d.sid, 0),
ds.theColumnsYouNeedInYourDimension,
DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0) /*the current date*/
FROM
tmpDimSource ds 
LEFT JOIN D_yourDimensionName d ON ds.whateverId = c.whateverId
;

The ISNULL(d.sid, 0) in step 2 is important. It returns the surrogate id of your dimension, if an entry already exists, otherwise 0.

Step 3:

UPDATE D_yourDimensionName SET 
validTo = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 1, 0) /*yesterday*/
FROM 
D_yourDimensionName d
INNER JOIN tmpDimYourDimensionName t ON d.sid = t.sid
WHERE t.sid <> 0 AND
(
d.theColumnWhichHasChangedAndIsImportant <> t.theColumnWhichHasChangedAndIsImportant OR
d.anotherColumn <> t.anotherColumn 
)
;

In Step 3 you mark the existing entry as not valid anymore and keep a history of it. The valid entry you get with WHERE validTo IS NULL.

You can also add another UPDATE to overwrite any other column with the new value if needed.

Step 4:

INSERT INTO D_yourDimensionName 
SELECT * FROM tmpDimYourDimensionName 
WHERE sid = 0;

And that's it.

fancyPants
  • 50,732
  • 33
  • 89
  • 96
  • yes sir through insert update i already know that.. I was wondering if can do something through merge statement.. –  Jan 29 '13 at 16:04
  • 1
    Yes, you can, like I said, but you have to disable foreign keys (usually from your facts table if it's a star schema) and it depends on what type of SCD you want to implement. And there is no reasonable way you can write a procedure that can be applied to no matter what tables you throw at it. You can of course write a procedure for every dimension. Feel free. – fancyPants Jan 29 '13 at 16:15