
I am trying to use change tracking to copy data incrementally from a SQL Server database to an Azure SQL Database. I followed the tutorial in the Microsoft Azure documentation, but I ran into some problems when implementing this for a large number of tables.

In the source part of the copy activity I can use a query that gives me a change table of all the records that were updated, inserted or deleted since the last change tracking version. This table will look something like:

PersonID   Age    Name   SYS_CHANGE_OPERATION
---------------------------------------------
1          12     John   U
2          15     James  U
3          NULL   NULL   D
4          25     Jane   I

with PersonID being the primary key for this table.
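For reference, a source query of this kind typically joins CHANGETABLE against the source table; a minimal sketch, assuming the source table is dbo.Person and the last synced version is passed in as @last_sync_version (both names are assumptions), could look like this:

    -- Sketch of a change tracking source query (table and parameter names are assumptions).
    -- Deleted rows only exist in the change table, so the join returns NULL for their columns.
    SELECT
        CT.PersonID,
        P.Age,
        P.Name,
        CT.SYS_CHANGE_OPERATION
    FROM CHANGETABLE(CHANGES dbo.Person, @last_sync_version) AS CT
    LEFT JOIN dbo.Person AS P
        ON P.PersonID = CT.PersonID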

The problem is that the copy activity can only append data to the Azure SQL Database, so when a record has been updated the copy fails with a duplicate primary key error. I could deal with this by letting the copy activity use a stored procedure that merges the data into the table on the Azure SQL Database, but that doesn't scale well because I have a large number of tables.

I would like the pre-copy script to delete the deleted and updated records on the Azure SQL Database, but I can't figure out how to do this. Do I need to create separate stored procedures and corresponding table types for each table that I want to copy, or is there a way for the pre-copy script to delete records based on the change tracking table?

Martin Esteban Zurita

2 Answers


You have to use a Lookup activity before the Copy activity. With that Lookup activity you can query the database so that you get the deleted and updated PersonIDs, preferably all in one field, separated by commas (so it's easier to use in the pre-copy script). More information here: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
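A minimal sketch of such a Lookup query, reusing the change tracking query from the question (STRING_AGG requires SQL Server 2017 or later; the column name PersonIDs and the parameter @last_sync_version are assumptions):

    -- Returns a single row with one column, e.g. PersonIDs = '1,2,3'
    SELECT STRING_AGG(CAST(CT.PersonID AS varchar(20)), ',') AS PersonIDs
    FROM CHANGETABLE(CHANGES dbo.Person, @last_sync_version) AS CT
    WHERE CT.SYS_CHANGE_OPERATION IN ('U', 'D')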

Then you can do the following in your pre-copy script:

delete from TableName where PersonID in (@{activity('MyLookUp').output.firstRow.PersonIDs})
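At runtime the Lookup output is expanded inline, so with the example data from the question (updated PersonIDs 1 and 2, deleted PersonID 3) the pre-copy script effectively becomes:

    delete from TableName where PersonID in (1,2,3)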

This way you will be deleting all the deleted or updated rows before inserting the new ones.

Hope this helped!

Martin Esteban Zurita

In the meantime, Azure Data Factory has introduced the metadata-driven copy task. After going through the dialog-driven setup, a metadata table is created that has one row for each dataset to be synchronized. I solved this upsert problem by adding a stored procedure as well as a table type for each dataset to be synchronized, and then adding the relevant information to the metadata table for each row like this:

    {
        "preCopyScript": null,
        "tableOption": "autoCreate",
        "storedProcedure": "schemaname.UPSERT_SHOP_SP",
        "tableType": "schemaname.TABLE_TYPE_SHOP",
        "tableTypeParameterName": "shops"
    }
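For each dataset this means creating a table type and an upsert stored procedure on the destination database. A minimal sketch for the shop entry above; the column list and the destination table name (schemaname.SHOP) are assumptions, and the procedure's parameter name has to match the tableTypeParameterName value (shops):

    -- Table type mirroring the destination table's columns (column list is an assumption).
    CREATE TYPE schemaname.TABLE_TYPE_SHOP AS TABLE
    (
        ShopID   int           NOT NULL,
        ShopName nvarchar(100) NULL
    );
    GO

    -- Upsert procedure; @shops must match "tableTypeParameterName" in the metadata entry.
    CREATE PROCEDURE schemaname.UPSERT_SHOP_SP
        @shops schemaname.TABLE_TYPE_SHOP READONLY
    AS
    BEGIN
        MERGE schemaname.SHOP AS target
        USING @shops AS source
            ON target.ShopID = source.ShopID
        WHEN MATCHED THEN
            UPDATE SET target.ShopName = source.ShopName
        WHEN NOT MATCHED THEN
            INSERT (ShopID, ShopName)
            VALUES (source.ShopID, source.ShopName);
    END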

After that you need to adapt the sink properties of the copy task like this (stored procedure, table type, table type parameter name):

@json(item().CopySinkSettings).storedProcedure
@json(item().CopySinkSettings).tableType
@json(item().CopySinkSettings).tableTypeParameterName

If the destination table does not exist yet, you need to run the whole task once before adding the above variables, because auto-creation of tables only works as long as no stored procedure is given in the sink properties.

Greg Holst