I have a streaming query that uses foreachBatch and keeps its checkpoints in a data lake, but if I cancel the stream, it can happen that the last write is not fully committed. Then the next time I start the stream I get duplicates, since it resumes from the last committed batchId.
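This is roughly what the setup looks like (paths are placeholders, not my real ones):

```python
from pyspark.sql import SparkSession, DataFrame

spark = SparkSession.builder.getOrCreate()

def write_batch(batch_df: DataFrame, batch_id: int) -> None:
    # Plain append; if the stream is cancelled mid-write, this batch can be
    # written while the checkpoint still points at the previous batchId,
    # so the same batch is replayed (and duplicated) on restart.
    batch_df.write.format("delta").mode("append").save("/mnt/lake/target")

(
    spark.readStream.format("delta").load("/mnt/lake/source")
    .writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "/mnt/lake/checkpoints/target")
    .start()
)
```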
I use Delta, but I don't want to use MERGE because I have a lot of data and it doesn't seem to be as performant as I would like (even with partitioning).
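For reference, this is roughly the MERGE-based writer I tried and found too slow on my volumes (the key column `event_id` is just a placeholder for my real key):

```python
from delta.tables import DeltaTable
from pyspark.sql import DataFrame

def merge_batch(batch_df: DataFrame, batch_id: int) -> None:
    target = DeltaTable.forPath(spark, "/mnt/lake/target")
    (
        target.alias("t")
        .merge(batch_df.alias("s"), "t.event_id = s.event_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
```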
How can I use the batchId to handle the duplicates? Or is there some other way?