
I have source data of close to 1 TB that I want to load into Synapse SQL. A full load will take time and is not efficient for a larger dataset. If I go with an incremental approach, what should the initial timestamp in the watermark table be? Should I use the source data's start date in the watermark table?

I'm trying to create logic to incrementally load one month of data at a time, but I'm struggling with the logic that produces the date range.

1 Answer


1 TB is not something that will take a long time to load; it depends on your cost goals and how much scale you want to use (in short, on the compute on both the source and sink side). A full load is not inherently inefficient for a dataset of 1 TB.

What you describe is also not exactly a delta load; a delta load is used to update the target periodically after the full load is done. In your case, you want to partition the full load into several steps.

For this, you can analyze the timestamps in the data. Say it ranges from 2016 to 2019: you can break it up into 4 years and do the load in 4 separate runs. First load the rows with 2016 timestamps, then 2017, and so on. A rough sketch of generating those ranges is shown below.
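For illustration only, here is a minimal Python sketch of how you could generate the per-year date ranges and the corresponding filter queries for chunked full-load runs. The column name `lastmodified`, the table name `my_source_table`, and the 2016–2019 span are assumptions, not taken from your actual setup:

```python
from datetime import date

# Assumed date span of the source data (2016-2019, as in the example above).
START_YEAR = 2016
END_YEAR = 2019

def yearly_ranges(start_year, end_year):
    """Yield (range_start, range_end) pairs, one per year, end-exclusive."""
    for year in range(start_year, end_year + 1):
        yield date(year, 1, 1), date(year + 1, 1, 1)

# Build one filter query per chunk; each query can drive a separate copy run.
for range_start, range_end in yearly_ranges(START_YEAR, END_YEAR):
    query = (
        "SELECT * FROM my_source_table "
        f"WHERE lastmodified >= '{range_start.isoformat()}' "
        f"AND lastmodified < '{range_end.isoformat()}'"
    )
    print(query)
```

Each generated query covers a non-overlapping, end-exclusive window, so the chunks together cover the full dataset exactly once.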

You need to provide more information on how the date-range logic is failing. What exact method are you using to load the dataset?

  • Thanks Ziya. Currently my source dataset has a lastmodified date column with which I can filter dates, and I will be creating a watermark table to get the last updated date. However, in the copy activity, what should the logic be to load one year of data incrementally? – HUSNA BANU Aug 23 '23 at 18:12
  • You can set up a delta load using watermarking (see the sketch after these comments). I have an example here with lookups: https://github.com/zmkarakas/IncrementalLoadADF . What is your source again? – Ziya Mert Karakas Aug 24 '23 at 19:35
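Not a substitute for the linked ADF example, but here is a minimal sketch of the watermark pattern it implements, assuming a one-row `dbo.watermark` table, a `dbo.my_source_table` with a `lastmodified` column, and pyodbc connections to the source and to Synapse. All table names, column names, and connection strings here are placeholders:

```python
import pyodbc

# Placeholder connection strings; replace with your actual source and Synapse details.
SOURCE_CONN = "Driver={ODBC Driver 18 for SQL Server};Server=source;Database=src;UID=...;PWD=..."
SINK_CONN = "Driver={ODBC Driver 18 for SQL Server};Server=synapse;Database=dw;UID=...;PWD=..."

def load_increment(source, sink):
    """Copy rows changed since the last run, then advance the watermark."""
    # Last successfully loaded timestamp; seeded once before the first run
    # (e.g. with the cutoff of the initial full load, or the source data's start date).
    low = sink.execute("SELECT last_load_date FROM dbo.watermark").fetchone()[0]

    # New high watermark: the current max lastmodified on the source side.
    high = source.execute("SELECT MAX(lastmodified) FROM dbo.my_source_table").fetchone()[0]
    if high is None or high <= low:
        return  # nothing new to load

    # Delta load: only rows that changed in the (low, high] window.
    rows = source.execute(
        "SELECT id, payload, lastmodified FROM dbo.my_source_table "
        "WHERE lastmodified > ? AND lastmodified <= ?",
        low, high,
    ).fetchall()

    cur = sink.cursor()
    cur.executemany(
        "INSERT INTO dbo.my_sink_table (id, payload, lastmodified) VALUES (?, ?, ?)",
        [tuple(r) for r in rows],
    )
    # Advance the watermark only after the copy succeeded, so a failed run
    # is retried from the same point next time.
    cur.execute("UPDATE dbo.watermark SET last_load_date = ?", high)
    sink.commit()

source = pyodbc.connect(SOURCE_CONN)
sink = pyodbc.connect(SINK_CONN)
try:
    load_increment(source, sink)
finally:
    source.close()
    sink.close()
```

In Azure Data Factory, the same pattern is typically expressed as two Lookup activities (old watermark from the sink, new max lastmodified from the source), a Copy activity whose source query filters on that window, and a stored procedure or script activity that updates the watermark table afterwards.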