I am getting 'Event dropped, late by 5051 ms.'
How should I build my pipeline so that all events are processed, regardless of how late they arrive?
I have tried several approaches:
- Without windowing, I didn't get late events, but this doesn't work for me: because of parallel execution, values in the sink get overwritten instead of merged.
- So I added windowing, which solved the overwriting problem but caused late events.
- Next, I tried windowing without timestamps, which threw an exception saying that a timestamp must be defined.
So I have two problems here: 1) how to merge new events into the existing ones in the sink, and 2) how to do that without dropping or overwriting events.
Code:
WindowDefinition customWindow = WindowDefinition.sliding(60000, 30000);
customWindow.setEarlyResultsPeriod(1000);
StreamStage<Map.Entry<...>> updatedState = p
    .drawFrom(<source>)
    .withIngestionTimestamps()
    .groupingKey(...)
    .window(customWindow)
    .aggregate(AggregateOperations.toCollection(ArrayList::new))
    .mapUsingIMap(...)
    .sink(...);
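
To make problem 1 concrete, here is a minimal plain-Java sketch of the merge behavior I am after in the sink. It is not Jet code: `SinkMergeSketch`, `mergeInto`, and the `sensor-1` key are made-up names for illustration, and a `ConcurrentHashMap` stands in for the real sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SinkMergeSketch {

    // Hypothetical helper: appends the incoming events to whatever the
    // sink already holds for the key, instead of replacing the old value.
    static void mergeInto(Map<String, List<String>> sink, String key, List<String> incoming) {
        sink.merge(key, new ArrayList<>(incoming), (existing, added) -> {
            List<String> merged = new ArrayList<>(existing);
            merged.addAll(added);
            return merged;
        });
    }

    public static void main(String[] args) {
        // Stand-in for the real sink (assumption: a key -> list-of-events map).
        Map<String, List<String>> sink = new ConcurrentHashMap<>();

        // What I see now: the second write replaces the first.
        sink.put("sensor-1", List.of("e1"));
        sink.put("sensor-1", List.of("e2"));
        System.out.println(sink); // {sensor-1=[e2]}

        // What I want: the second write is merged into the first.
        sink.clear();
        mergeInto(sink, "sensor-1", List.of("e1"));
        mergeInto(sink, "sensor-1", List.of("e2"));
        System.out.println(sink); // {sensor-1=[e1, e2]}
    }
}
```

In other words, each new result for a key should be appended to the list already stored for that key, no matter which parallel processor writes it or when the event arrives.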