1

I am working on a task where my source is AWS RDS - SQL Server and my target is Azure SQL Server. There's a table with 80M records in my source that needs to be merged with my target table. This merging will happen every 15 mins and based on the business key, I need to -

  1. Update the target table if the key is updated in the source table.
  2. Insert a new key into the target table.
  3. Mark isDeleted as true in the target if the key is no more present in the source.

IMP Note - The source row is hard-deleted and no history is maintained.

Since this merging happens every 15 mins and the source table is pretty big, I use lastUpdated column to select only limited records in the source query of my merge query.

With this, I am able to perfectly handle the "upsert" scenario, but on delete, it is deleting all the records from the target which is not desirable.

I have tried the below option -

Read the entire source table in a temp_table every 15 mins and then perform merge from temp_table to the target table. But this is very costly in terms of processing and time.

Is there any better way to handle this scenario? I am happy to share more information as needed.

Ken White
  • 123,280
  • 14
  • 225
  • 444
Sanket Kelkar
  • 129
  • 2
  • 9

1 Answers1

0

I think you can solve the problem by adding new column called SyncStamp, the idea is, we update or insert the same value for SyncStamp, So the other rows that have not this value should be updated as IsDeleted.

I prefer to get the actual timestamp for SyncStamp but you can choose random numbers.

    --get timestamp 
    Declare @SyncStamp bigint = (SELECT DATEDIFF_BIG(Second, '1970-01-01 00:00:00', GETUTCDATE()))

    MERGE TargetTable AS Target
    USING SourceTable   AS Source
    ON Source.BusinessKey = Target.BusinessKey
    
    -- For Inserts
    WHEN NOT MATCHED BY Target THEN
        INSERT (ProductID,ProductName, SyncStamp) 
        VALUES (Source.ProductID,Source.ProductName, @SyncStamp)
    
    -- For Updates
    WHEN MATCHED THEN UPDATE SET
        Target.ProductName  = Source.ProductName,
        Target.SyncStamp        = @SyncStamp;
        

    --Update isDeleted
    UPDATE TargetTable
    SET IsDeleted= 1
    Where IsDeleted=0 and SyncStamp <> @SyncStamp
    
sa-es-ir
  • 3,722
  • 2
  • 13
  • 31
  • Hi @sa-es-ir , this will work only if we read the entire source data every time correct? But reading a table with 80M records every 15 mins is costly. – Sanket Kelkar Jan 04 '23 at 09:40
  • @SanketKelkar You can create index on SyncStamp with filter IsDeleted=0 and it will be fast, there is no way to do your case with Merge query. – sa-es-ir Jan 04 '23 at 10:52