2

I've implemented four SCDs for the warehouse I'm working on and they're working a treat. Hopefully I've understood SCD methods correctly and assuming I have, I now have a bit of an issue I'd like help with.

The issue is that there are fields that can be updated without needing to make a new dimension as they're not time-relevant. So in the MERGE statement I'd like to have two statements:

WHEN MATCHED AND tgt.HashBytes_Value <> src.HashBytes_Value
THEN 
   UPDATE Row_Expiry_Details
WHEN MATCHED AND tgt.HashBytes_Value = src.HashBytes_Value
THEN
   UPDATE Columns_That_Are_Not_SCD_Relevant
etc..

I'm not sure how to add a second WHEN MATCHED to the query.

Here is an example of the code that runs fine now, but doesn't do what I want.

-- Created temp tables for example
DROP TABLE IF EXISTS #StagingTable
CREATE TABLE #StagingTable
(
    id INT,   
    name VARCHAR(20),
    team VARCHAR(20),
    preferences VARCHAR(50),
    hashbytes_Value VARBINARY(20)
)

DROP TABLE IF EXISTS #PresTable

CREATE TABLE #PresTable
(
    dim_id INT IDENTITY(1,1) PRIMARY KEY,
    id INT,
    name VARCHAR(20),
    team VARCHAR(20),   
    preferences VARCHAR(50),
    hashbytes_Value VARBINARY(20),
    row_Current_ind BIT DEFAULT(1),
    row_effective_date DATETIME2 DEFAULT(SYSDATETIME()),
    row_expiry_date DATETIME2 DEFAULT(SYSDATETIME())
)

-- Load Staging
insert into #StagingTable (id, name, team, preferences, hashbytes_Value)
select id = 1, name = 'archibald', team = 'team 1', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'archibald' for xml raw))
union select id = 2, name = 'dave',  team = 'team 1', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'dave' for xml raw))
union select id = 3, name = 'peter', team = 'team 2', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 2', name = 'peter' for xml raw))
union select id = 4, name = 'roger', team = 'team 2', preferences = 'Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 2', name = 'roger' for xml raw))

-- scd merge 
insert into #PresTable (id, name, team, preferences, hashbytes_Value)
select id, name, team, preferences, hashbytes_Value 
from
(
merge into #PresTable as tgt
using
(
select
    id
    ,name
    ,team
    ,preferences
    ,hashbytes_value
from
    #StagingTable
) as src
-- alias
(id, name, team, preferences, hashbytes_Value)
on src.id = tgt.id
-- must be most recent row
and tgt.row_current_ind = 1
when matched
and tgt.hashbytes_value <> src.hashbytes_value
then
update 
set 
    tgt.row_current_ind = 0
    ,row_expiry_date = sysdatetime()
when not matched
then 
insert
(
    id
    ,name
    ,team
    ,preferences
    ,hashbytes_Value
)
values
(
    src.id
    ,src.name
    ,src.team
    ,src.preferences
    ,src.hashbytes_value
)
output
    $action
    ,src.id
    ,src.name
    ,src.team
    ,src.preferences
    ,src.hashbytes_value
)
as changes
(
    action
    ,id
    ,name
    ,team
    ,preferences
    ,hashbytes_Value
) where action = 'update'

truncate table #StagingTable
insert into #StagingTable (id, name, team, preferences, hashbytes_Value)
select id = 1, name = 'archibald', team = 'team 1', preferences = 'Updated Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'archibald' for xml raw))
union select id = 3, name = 'peter', team = 'team 1', preferences = 'Updated Throwaway String',  hashbytes_value = hashbytes('sha1', (select team = 'team 1', name = 'peter' for xml raw))
union select id = 5, name = 'russell', team = 'team 2', preferences = 'Updated Throwaway String', hashbytes_value = hashbytes('sha1', (select team = 'team 2', name = 'roger' for xml raw))


-- scd merge 
insert into #PresTable (id, name, team, preferences, hashbytes_Value)
select id, name, team, preferences, hashbytes_Value 
from
(
merge into #PresTable as tgt
using
(
select
    id
    ,name
    ,team
    ,preferences
    ,hashbytes_value
from
    #StagingTable
) as src
-- alias
(id, name, team, preferences, hashbytes_Value)
on src.id = tgt.id
-- must be most recent row
and tgt.row_current_ind = 1
when matched
and tgt.hashbytes_value <> src.hashbytes_value
then
update 
set 
    tgt.row_current_ind = 0
    ,row_expiry_date = sysdatetime()
when not matched
then 
insert
(
    id
    ,name
    ,team
    ,preferences
    ,hashbytes_Value
)
values
(
    src.id
    ,src.name
    ,src.team
    ,src.preferences
    ,src.hashbytes_value
)
output
    $action
    ,src.id
    ,src.name
    ,src.team
    ,src.preferences
    ,src.hashbytes_value
)
as changes
(
    action
    ,id
    ,name
    ,team
    ,preferences
    ,hashbytes_Value
) where action = 'update'




select * from #PresTable

That final output should be as you see it BUT preferences should be 'Updated Throwaway String' where id = 1.

Is this possible in the same merge statement? Or do I need an update before/after I run the merge statement?

Post was edited by somebody, but this is the example I want to see in the end:



UPDATE TGT
SET
    TGT.preferences = SRC.preferences
FROM 
    #PresTable TGT
INNER JOIN 
    #StagingTable SRC
    ON SRC.id = TGT.id
WHERE 1=1
    AND TGT.hashbytes_Value = SRC.hashbytes_Value

select * from #PresTable

Thank you.

Jeff Scott
  • 105
  • 5

1 Answers1

1

The MERGE docs says (emphasis mine):

WHEN MATCHED THEN merge_matched

Specifies that all rows of *target_table, which match the rows returned by ON , and satisfy any additional search condition, are either updated or deleted according to the clause.

The MERGE statement can have, at most, two WHEN MATCHED clauses. If two clauses are specified, the first clause must be accompanied by an AND clause. For any given row, the second WHEN MATCHED clause is only applied if the first isn't. If there are two WHEN MATCHED clauses, one must specify an UPDATE action and one must specify a DELETE action. When UPDATE is specified in the clause, and more than one row of matches a row in target_table based on , SQL Server returns an error. The MERGE statement can't update the same row more than once, or update and delete the same row.

So you can't issue 2 different UPDATE on the match condition. What you can do is use 1 UPDATE and conditionally update the columns to the new value or their current value if your condition doesn't apply:

WHEN MATCHED THEN UPDATE TableName SET
    Column1 = CASE WHEN tgt.HashBytes_Value <> src.HashBytes_Value THEN UpdatedColumn1 ELSE Column1 END,
    Column2 = CASE WHEN tgt.HashBytes_Value <> src.HashBytes_Value THEN UpdatedColumn2 ELSE Column2 END,
    Column3 = CASE WHEN tgt.HashBytes_Value <> src.HashBytes_Value THEN Column3 ELSE UpdatedColumn3 END

Keep in mind that triggers will be called for rows that might not actually get "updated".

EzLo
  • 13,780
  • 10
  • 33
  • 38
  • Thanks for the reply. This sort of works for me, the issue is that I OUTPUT the UPDATE step to INSERT a new row to the SCD - I'm not sure how I can filter out the values where HashBytes_Value is equal. – Jeff Scott Aug 16 '19 at 10:55
  • 1
    You can't. Either insert and delete it afterwards (which sucks) or remove the merge statement and do the operations in multiple statements, with a transaction. When merges start to get big or complex, I always find it better to split them. – EzLo Aug 16 '19 at 10:59
  • Ok, thank you. I'll add an update statement after the merge and wrap both in a transaction. Question though, if I wrap it in a transaction will I stop the PowerBI instance from being able to read the table at all with a tranaction lock until commited? – Jeff Scott Aug 16 '19 at 11:06
  • Plus one for "do the operations in multiple statements with a transaction." I've shied away from `MERGE` since the first time I read [Use Caution with SQL Server's MERGE Statement](https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/) from Mr. Bertrand. – Eric Brandt Aug 16 '19 at 13:29