A common way of keeping SQL databases in sync in Azure is to enable replication. This keeps entire databases in sync for high availability or redundancy purposes. For your use case that's not an option, since you only want to sync specific rows.
The recommended tool here is Azure Data Factory, which has a built-in copy activity that lets you copy data from a source to a sink. In your case both the source and the sink are SQL databases. To get started with the copy activity, follow the instructions in this tutorial. You should also use a tumbling window trigger to run the pipeline at a fixed interval.
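To give you an idea of what such a trigger could look like, here is a minimal sketch of a tumbling window trigger definition that runs a pipeline every hour and passes the window boundaries to two pipeline parameters. The trigger name, pipeline name and the hourly frequency are just assumptions for this example; the parameter names match the ones used in the copy activity query further down.

    {
        "name": "HourlyTumblingWindowTrigger",
        "properties": {
            "type": "TumblingWindowTrigger",
            "typeProperties": {
                "frequency": "Hour",
                "interval": 1,
                "startTime": "2021-01-01T00:00:00Z",
                "delay": "00:00:00",
                "maxConcurrency": 1
            },
            "pipeline": {
                "pipelineReference": {
                    "type": "PipelineReference",
                    "referenceName": "<your pipeline name>"
                },
                "parameters": {
                    "LastModified_From": "@{trigger().outputs.windowStartTime}",
                    "LastModified_To": "@{trigger().outputs.windowEndTime}"
                }
            }
        }
    }

Each trigger run covers exactly one, non-overlapping time window, which is what makes this trigger type a good fit for incremental loads.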
Below you'll find a very simple example of a copy activity to give you an idea. The sqlReaderQuery property gives you a lot of flexibility, since it lets you select exactly which data you want to sync. To sync the data incrementally, you'll have to rely on an existing field in your dataset such as LastUpdated, so that you only load the rows that were created or updated after the last time the pipeline ran.
"activities":[
{
"name": "CopyFromSQLServer",
"type": "Copy",
"inputs": [
{
"referenceName": "<SQL Server input dataset name>",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "<output dataset name>",
"type": "DatasetReference"
}
],
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "SELECT * FROM Customers WHERE LastUpdated BETWEEN @{pipeline().parameters.LastModified_From} AND @{pipeline().parameters.LastModified_To}"
},
"sink": {
"type": "SqlSink"
}
}
}
]
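For the query above to work, the pipeline itself has to declare the two parameters it references. A minimal parameter section (sitting next to "activities" in the pipeline definition) could look like the sketch below; the default values are only placeholders, since in practice the tumbling window trigger shown earlier supplies the actual window boundaries on every run.

    "parameters": {
        "LastModified_From": {
            "type": "String",
            "defaultValue": "2021-01-01T00:00:00Z"
        },
        "LastModified_To": {
            "type": "String",
            "defaultValue": "2021-01-02T00:00:00Z"
        }
    }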
This idea is based on this Azure tutorial, which incrementally copies blob files between two storage containers based on their LastModifiedDate. In this GitHub repository you'll find a Terraform implementation of that approach.