
I have a use case where I need to run a Delta Live Tables pipeline in triggered mode, and I would like to know whether there are any checkpointing capabilities in triggered mode.

My source is a streaming one where data arrives at second granularity. I would like to run a DLT pipeline in triggered mode every 24 hours and pull the latest data from it.

When I set the mode to streaming, I could see the checkpoints being created, but I couldn't find a way to set checkpoints in triggered mode.

Can we have incremental-load functionality in triggered mode?

Shane

1 Answer


If you defined your tables as streaming live tables, then the checkpoint will be created even in triggered mode, so check that your tables are streaming.

Here is an example of the checkpoint for a DLT table that runs every day in triggered mode, consuming data from Event Hubs. As you can see, it has all the necessary objects: commits, offsets, etc. DLT will create it automatically for you if you define the table with `create streaming live table`, or use `spark.readStream` or `dlt.read_stream` to access the data.
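As a sketch (the table names and source path below are made up, and this only runs inside a Databricks DLT pipeline), a streaming live table in Python could look like this; DLT manages the checkpoint for each streaming table automatically, even when the pipeline runs in triggered mode:

```python
import dlt  # available only inside a Databricks DLT pipeline

# Hypothetical Auto Loader source - swap in your EventHubs/Kafka reader.
@dlt.table(comment="Streaming live table; DLT checkpoints it automatically")
def events_bronze():
    return (
        spark.readStream                       # readStream makes this a
        .format("cloudFiles")                  # streaming table, so each
        .option("cloudFiles.format", "json")   # triggered run processes
        .load("/mnt/raw/events")               # only new data
    )

# Downstream table reading incrementally from the one above
@dlt.table
def events_silver():
    return dlt.read_stream("events_bronze").where("value IS NOT NULL")
```

With this definition, each daily triggered run resumes from the offsets recorded in the table's checkpoint rather than rereading the source from scratch.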

(Screenshot: contents of the checkpoint directory for the DLT table, showing commits, offsets, and metadata.)

Alex Ott
  • Thanks @Alex. If I have a situation where I specify a timestamp range to read data from the streaming source table, say read from time T1 to T2 every 24 hours, which takes priority: the offsets or the timestamps I specified? – Shane Mar 22 '23 at 03:38
  • Offsets/timestamps are used only when the stream is starting; after that, Spark Structured Streaming will use the offsets from the checkpoint to resume operations. – Alex Ott Mar 22 '23 at 07:57
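The resume rule in the comment above can be illustrated with a toy function (plain Python, not Spark itself; the names are made up): a configured starting timestamp is honored only on the very first run, and once a checkpoint with committed offsets exists, the checkpoint wins.

```python
def next_read_position(checkpoint_offsets, starting_timestamp):
    """Decide where the next micro-batch reads from.

    checkpoint_offsets: last committed offsets from the checkpoint, or None
    starting_timestamp: user-configured start position (e.g. 'T1')
    """
    if checkpoint_offsets is not None:
        return checkpoint_offsets      # checkpoint takes priority
    return starting_timestamp          # used on the first start only

# First triggered run: no checkpoint yet, so the timestamp is used
assert next_read_position(None, "T1") == "T1"

# Later runs: checkpoint exists, so its offsets override T1/T2
assert next_read_position({"partition-0": 1042}, "T1") == {"partition-0": 1042}
```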