Let's say I have a Delta table that was produced by a stream using foreachBatch to apply transformations and then write the final result (let's call this table Table1).

However, due to some requirements, the data in this table needs to be merged into or appended to another Delta table (Table2), which is being updated by another stream.

My question is: how can I use the foreach option instead of foreachBatch in the new stream to write that data to Table2? Our requirements call for appending the data from Table1 to Table2 record by record, because with foreachBatch a failure in the process generates duplicate data and ends up breaking the stream.

Or is there another way to approach the problem without using it?

It is important to note that each table is a streaming table.

We have tried to implement this idea with two streams writing through foreachBatch, but we got errors and duplicates in different scenarios. First, because we need a surrogate key (an identity), the two streams fail.

We worked around that by writing to a staging table without the identity and applying it later, but the problem is that if the stream fails at certain moments, foreachBatch generates duplicate data and breaks the whole process.

That's why we thought we could use foreach to append the data to Table2, but we have no idea how it works or how to implement it, because it must be record by record, and we haven't found an example or anything else on how to implement it.
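For context, here is a minimal sketch of the row-level interface that foreach accepts in PySpark: an object with a process(row) method and optional open(partition_id, epoch_id) and close(error) methods. The class name RowAppender and the in-memory target and committed stores are hypothetical stand-ins for Table2, not part of any library, and the sketch assumes micro-batch mode, where a given (partition_id, epoch_id) pair is expected to carry the same data on a replay.

```python
class RowAppender:
    """Hypothetical row-by-row writer following the open/process/close
    protocol that PySpark's DataStreamWriter.foreach accepts."""

    # Illustrative stand-ins for an external sink such as Table2.
    # Assumption: on a real cluster the writer is serialized to executors,
    # so shared Python state like this would have to live in an external store.
    committed = set()   # (partition_id, epoch_id) pairs already written
    target = []         # rows that have been "appended"

    def open(self, partition_id, epoch_id):
        self.key = (partition_id, epoch_id)
        self.buffer = []
        # Returning False asks Spark to skip the rows of this partition,
        # which is how a replayed partition avoids being written twice.
        return self.key not in RowAppender.committed

    def process(self, row):
        # Called once per record; buffer it until the partition finishes.
        self.buffer.append(row)

    def close(self, error):
        # close runs even when open returned False, so check the key again.
        if error is None and self.key not in RowAppender.committed:
            RowAppender.target.extend(self.buffer)
            RowAppender.committed.add(self.key)


# Attaching it to a stream would look roughly like this
# (table1_stream is a hypothetical streaming DataFrame read from Table1):
#   table1_stream.writeStream.foreach(RowAppender()).start()
```

Note that foreach on its own only gives at-least-once delivery; any deduplication, such as the (partition_id, epoch_id) check sketched here, has to be implemented by the writer.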

So any help would be appreciated.

If code is needed, I can try to provide it.
