0

I have a pipeline that daily extracts newly updated data from an SQLDB table into a .tsv file in Data Lake Store.

That data will be used later to append/update to an existing table.

To select only the newly updated rows from SQLDB, I used a sqlReaderQuery to select from a table with a WHERE clause ("WHERE DATEDIFF(day,c.UpdatedOn,GETDATE())=1") at the end which allows me to select only rows that have been updated (c.UpdatedOn) one day before the execution of the pipeline.

The problem is: Imagine that the pipeline is paused for 1 week for maintenance. Then, when I resume the pipeline, it will only retrieve the rows added in the last day of the previous week...

Is there any way I can use a system variable that indicates the last time the pipeline ran?

So I could replace the GETDATE() in the WHERE clause for the system variable, for example:

WHERE DATEDIFF(day,c.UpdatedOn,DateOfPipelineLastExecution)<0

Thank you

Tomás Law
  • 141
  • 1
  • 1
  • 6

2 Answers2

0

Each time your activity runs it runs with respect to a specific window of time, and you should use the SliceStart and SliceEnd variables in your SQL Query to extract the data for that window. See https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-scheduling-and-execution

David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67
  • So what you are saying is that since the pipeline was paused during a week, when I resume the pipeline, the data slices of that week will be "resumed". So the SliceStart value used in the SQL Query will correspond to the dates of that previous week, right? – Tomás Law Dec 29 '17 at 14:42
  • I think so. At least that's how it works if an activity instance fails. – David Browne - Microsoft Dec 29 '17 at 15:35
0

https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-functions-variables#data-factory-system-variables

As David says, you should use SliceStart and SliceEnd variables so that each slice gets you the information you need for everyday, no matter what day its executed on. For the example you gave, if you stop the pipeline for 1 week and then resume it, the pipeline will copy 7 slices corresponding to each day.

Your where clause should look something like this:

$$Text.Format( 'SELECT ... WHERE DATEDIFF(day,c.UpdatedOn, \\'{0:yyyyMMdd-HH}\\'')=1, SliceEnd)
Martin Esteban Zurita
  • 3,161
  • 12
  • 23
  • You mean this, right? (the last ' should be after the ....=1) $$Text.Format( 'SELECT ... WHERE DATEDIFF(day,c.UpdatedOn, \\'{0:yyyyMMdd-HH}\\')=1', SliceEnd) By the way, being c.UpdatedOn a datetime value, if I want to check if c.UpdatedOn is after the SliceEnd can I use: $$Text.Format( 'SELECT ... WHERE c.UpdatedOn-\\'{0:yyyy-MM-dd HH:mm:ss}\\')>0', SliceEnd) – Tomás Law Dec 29 '17 at 15:30