The setup:
Azure Event Hub -> raw delta table -> agg1 delta table -> agg2 delta table
The data is processed by spark structured streaming.
Updates on target delta tables are done via foreachBatch
using merge
.
In the result I'm getting error:
java.lang.UnsupportedOperationException: Detected a data update (for example partKey=ap-2/part-00000-2ddcc5bf-a475-4606-82fc-e37019793b5a.c000.snappy.parquet) in the source table at version 2217. This is currently not supported. If you'd like to ignore updates, set the option 'ignoreChanges' to 'true'. If you would like the data update to be reflected, please restart this query with a fresh checkpoint directory.
Basically I'm not able to read the agg1 delta table via any kind of streaming. If I switch the last streaming from delta to memory I'm getting the same error message. With first streaming I don't have any problems.
Notes.
- Between aggregations I'm changing granuality: agg1 delta table (trunc date to minutes), agg2 delta table (trunc date to days).
- If I turn off all other streaming, the last one still doesn't work
- The agg2 delta table is new fresh table with no data