
I have a use case where I need to run a Delta Live Tables pipeline in triggered mode, and I would like to know whether there are any checkpointing capabilities in triggered mode.

My source is a streaming one where data arrives at second granularity. I would like to run a DLT pipeline in triggered mode every 24 hours and pull the latest data from it.

When I set the mode to streaming, I could see the checkpoints being created, but I couldn't find a way to set checkpoints in triggered mode.

Can we have incremental-load functionality in triggered mode?

Shane

1 Answer


If you defined your tables as streaming live tables, then the checkpoint will be created even in triggered mode, so check that your tables are streaming.

Here is an example of the checkpoint for a DLT table that runs every day in triggered mode, consuming data from Event Hubs. As you can see, it has all the necessary objects: commits, offsets, etc. DLT will create it automatically for you if you define the table with `create streaming live table`, or use `spark.readStream` or `dlt.read_stream` to access the data.
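As a sketch (the table names and source path below are made up, and this only runs inside a Databricks DLT pipeline), a streaming live table in Python could look like this; DLT manages the checkpoint for each streaming table automatically, even when the pipeline runs in triggered mode:

```python
import dlt  # available only inside a Databricks DLT pipeline

# Hypothetical Auto Loader source - swap in your EventHubs/Kafka reader.
@dlt.table(comment="Streaming live table; DLT checkpoints it automatically")
def events_bronze():
    return (
        spark.readStream                       # readStream makes this a
        .format("cloudFiles")                  # streaming table, so each
        .option("cloudFiles.format", "json")   # triggered run processes
        .load("/mnt/raw/events")               # only new data
    )

# Downstream table reading incrementally from the one above
@dlt.table
def events_silver():
    return dlt.read_stream("events_bronze").where("value IS NOT NULL")
```

With this definition, each daily triggered run resumes from the offsets recorded in the table's checkpoint rather than rereading the source from scratch.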

(Screenshot: contents of the checkpoint directory for the DLT table, showing commits, offsets, and metadata.)

Alex Ott
  • Thanks @Alex. If I have a situation where I specify a timestamp range to read data from the streaming source table, say read from time T1 to T2 every 24 hours, which takes priority: the offsets or the timestamps I specified? – Shane Mar 22 '23 at 03:38
  • Offsets/timestamps are used only when the stream is starting; after that, Spark Structured Streaming will use the offsets from the checkpoint to resume operations. – Alex Ott Mar 22 '23 at 07:57
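The resume rule in the comment above can be illustrated with a toy function (plain Python, not Spark itself; the names are made up): a configured starting timestamp is honored only on the very first run, and once a checkpoint with committed offsets exists, the checkpoint wins.

```python
def next_read_position(checkpoint_offsets, starting_timestamp):
    """Decide where the next micro-batch reads from.

    checkpoint_offsets: last committed offsets from the checkpoint, or None
    starting_timestamp: user-configured start position (e.g. 'T1')
    """
    if checkpoint_offsets is not None:
        return checkpoint_offsets      # checkpoint takes priority
    return starting_timestamp          # used on the first start only

# First triggered run: no checkpoint yet, so the timestamp is used
assert next_read_position(None, "T1") == "T1"

# Later runs: checkpoint exists, so its offsets override T1/T2
assert next_read_position({"partition-0": 1042}, "T1") == {"partition-0": 1042}
```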