I need exactly-once semantics for a series of writes inside a foreachBatch. For example, I have:
- a stream with two writes to HBase and one to HDFS
- two writes to HDFS in different folders
I want to commit all of the writes only when I'm sure every operation will succeed. Is there a way to do this transactionally? As I understand it, the operations inside foreachBatch run sequentially, and the checkpoint is updated only after all of them have completed. If the first write succeeds and the second one fails, I want to roll back the first write.
Is this possible at all? Or at least in the second case, where both writes go to the same sink (HDFS)?
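To make the HDFS-only case concrete, here is a minimal sketch of the all-or-nothing behavior I'm after. It is not Spark code: the local filesystem stands in for HDFS, and `write_all_or_nothing`, the `folder_a`/`folder_b` names, the `_staging` directory and the `batch=<id>` layout are all hypothetical. Each writer writes into a staging directory first; if any writer fails, everything staged is deleted, otherwise each staging directory is renamed into place. I'm assuming a directory rename on HDFS behaves like `os.rename` here (atomic per rename). The set of renames is still not atomic as a whole, which is why the final paths are keyed by batch id, so a replay of the same batch can simply redo the publish step.

```python
import os
import shutil
import tempfile
from typing import Callable, Dict


def write_all_or_nothing(batch_id: int, root: str,
                         writers: Dict[str, Callable[[str], None]]) -> None:
    """Stage every output of one micro-batch, then publish all of them.

    `writers` maps a hypothetical output folder name to a function that
    writes this batch's data into the directory path it receives.
    """
    staged = {}
    try:
        # Phase 1: each writer writes into its own staging directory.
        for name, write in writers.items():
            staging = os.path.join(root, "_staging", name, f"batch={batch_id}")
            shutil.rmtree(staging, ignore_errors=True)  # leftovers of a failed attempt
            os.makedirs(staging)
            staged[name] = staging  # record before writing so a failure cleans it up too
            write(staging)
    except Exception:
        # Any failure: drop everything staged so far; nothing becomes visible.
        for staging in staged.values():
            shutil.rmtree(staging, ignore_errors=True)
        raise
    # Phase 2: publish. Each rename is atomic on its own, but the loop is not;
    # paths are keyed by batch_id so replaying the batch can redo this safely.
    for name, staging in staged.items():
        final = os.path.join(root, name, f"batch={batch_id}")
        shutil.rmtree(final, ignore_errors=True)
        os.makedirs(os.path.dirname(final), exist_ok=True)
        os.rename(staging, final)


def write_text(path: str) -> None:
    with open(os.path.join(path, "part-00000"), "w") as f:
        f.write("row\n")


def write_fails(path: str) -> None:
    raise IOError("simulated failure of the second write")


root = tempfile.mkdtemp()
try:
    write_all_or_nothing(7, root, {"folder_a": write_text, "folder_b": write_fails})
except IOError:
    pass
print(os.path.exists(os.path.join(root, "folder_a", "batch=7")))  # False: first write discarded

write_all_or_nothing(7, root, {"folder_a": write_text, "folder_b": write_text})
```

This only gives "all or nothing is visible" for readers that look at the final folders; it does not help with the HBase case, where there is no staging area to discard.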
P.S. I can't explicitly check for duplicates because I'm writing a generic interface. I've thought about persisting something like process-batchId-operationId somewhere, but since that isn't atomic with the write itself it can still fail, and it would also add a lot of overhead.
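For clarity, this is roughly the process-batchId-operationId idea, as a minimal sketch with hypothetical names (`run_batch`, `op1`/`op2`, a local marker directory standing in for wherever the markers would live). A marker is written only after its operation succeeds, via a temp file plus `os.replace`, so a marker never exists for an unfinished operation. This is redo-on-replay rather than rollback: a crash between an operation and its marker just means that operation runs again when the same batchId is replayed. So it would only behave like exactly-once if every operation is idempotent, for example an HBase put with a fixed row key and explicit timestamp (which, as far as I know, overwrites the same cell version) or an HDFS write to a batchId-keyed path.

```python
import os
import tempfile
from typing import Callable, Dict, List


def run_batch(process: str, batch_id: int, marker_dir: str,
              operations: Dict[str, Callable[[int], None]]) -> List[str]:
    """Run each operation for batch_id unless its marker says it already ran.

    Assumes each operation is idempotent for a given batch_id.
    Returns the ids of the operations executed in this attempt.
    """
    ran = []
    for op_id, op in operations.items():
        marker = os.path.join(marker_dir, f"{process}-{batch_id}-{op_id}")
        if os.path.exists(marker):
            continue  # completed in an earlier attempt of this batch
        op(batch_id)  # a crash here leaves no marker, so a replay redoes it
        tmp = marker + ".tmp"
        with open(tmp, "w") as f:
            f.write("done\n")
        os.replace(tmp, marker)  # publish the marker atomically
        ran.append(op_id)
    return ran


marker_dir = tempfile.mkdtemp()
calls = []
attempts = {"n": 0}


def hbase_put(batch_id: int) -> None:
    calls.append(("hbase_put", batch_id))


def hdfs_write(batch_id: int) -> None:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise IOError("simulated failure on the first attempt")
    calls.append(("hdfs_write", batch_id))


ops = {"op1": hbase_put, "op2": hdfs_write}
try:
    run_batch("myprocess", 5, marker_dir, ops)
except IOError:
    pass  # the batch fails and the same batchId is retried
print(run_batch("myprocess", 5, marker_dir, ops))  # ['op2']: op1 is skipped on replay
print(calls)  # [('hbase_put', 5), ('hdfs_write', 5)]
```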