
I am getting 'Event dropped, late by 5051 ms.'

How should I build my pipeline so that all events are processed, regardless of how late they arrive?

I have tried several approaches:

  1. Without windowing I didn't get late events, but this doesn't work for me: because of parallel execution, values in the sink get overwritten instead of merged.
  2. So I used windowing, which solved the overwriting problem but caused late events.
  3. Next, I tried windowing without timestamps, which threw an exception saying that a timestamp must be defined.

Basically I have two problems here: 1) how to merge new events into the existing ones in the sink, and 2) how to do that without dropping or overwriting events.

Code:

WindowDefinition customWindow = WindowDefinition.sliding(60000, 30000);
customWindow.setEarlyResultsPeriod(1000);

StreamStage<Map.Entry<...>> updatedState = p
                .drawFrom(<source>)
                .withIngestionTimestamps()
                .groupingKey(...)
                .window(customWindow)
                .aggregate(AggregateOperations.toCollection(ArrayList::new))
                .mapUsingIMap(...)
                .sink(...)
Aliman
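
The "merge instead of overwrite" requirement from the question can be illustrated with a minimal plain-Java sketch. It does not use Jet at all: `MergeSketch`, `mergeInto`, the `HashMap` standing in for the sink, and the event names are hypothetical and only show the kind of merge function one might hand to a merging sink such as the `Sinks.mapWithMerging` mentioned in the comments below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeSketch {

    // Hypothetical helper: appends the incoming events to whatever is
    // already stored under the key, instead of replacing the old value.
    static void mergeInto(Map<String, List<String>> sink, String key, List<String> incoming) {
        sink.merge(key, new ArrayList<>(incoming), (oldValue, newValue) -> {
            List<String> merged = new ArrayList<>(oldValue);
            merged.addAll(newValue);
            return merged;
        });
    }

    public static void main(String[] args) {
        // A plain HashMap stands in for the sink map (assumption, not Jet code).
        Map<String, List<String>> sink = new HashMap<>();
        mergeInto(sink, "99998914", Arrays.asList("e1"));
        mergeInto(sink, "99998914", Arrays.asList("e2", "e3"));
        System.out.println(sink); // prints {99998914=[e1, e2, e3]}
    }
}
```

The second call keeps `e1` and appends `e2` and `e3`. A plain put would have left only the last list under the key.
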
  • Can you share your pipeline? What's your data source? When you add timestamps you can configure a lag, which determines how late an event may be and still be tolerated. – Can Gencer Sep 18 '19 at 08:31
  • I added code. My data source is a Hazelcast IMap. I don't add any special timestamp; I just use the ingestion timestamp. If I use a lag, the problem is that I would get results later, because the pipeline also has to wait for the lag time, right? That is not acceptable for me. I can wait 1-2 seconds at most. – Aliman Sep 18 '19 at 09:09
  • For merging results, you can use `Sinks.mapWithUpdating` for example. You can set lag to 1 to 2 sec, but if your events are late by 5s, then they will be dropped. You can use early results to get results earlier. – Can Gencer Sep 18 '19 at 14:53
  • Since you're using ingestion timestamps, it is indeed surprising that you see late events. Jet doesn't even allow you to configure allowed event lag in that case. You didn't specify what you mean by "event merging", but you should be able to use `Sinks.mapWithMerging` or `Sinks.mapWithUpdating` and implement your merging policy there. – Marko Topolnik Sep 19 '19 at 19:15
  • Yeah, I know about `Sinks.mapWithMerging` and `Sinks.mapWithUpdating`. Thanks for the suggestion. My pipeline actually splits after my merge: one branch continues with the updated value while the other branch stores the value back into the sink. For simplicity, I omitted that part. But yeah, I agree that it's surprising to see late events with ingestion timestamps. It says late by 10s and more. Therefore, I'll try a lag of 1 min together with early results. – Aliman Sep 20 '19 at 07:12
  • But I get very few such late events: 50 per several million. – Aliman Sep 20 '19 at 07:21
  • You shouldn't increase the allowed lag because it means the whole pipeline will be delayed by one minute, awaiting any possible stragglers. If there's no event disorder in the input itself, and there can't be when you're using ingestion timestamps, there should be no late events. Let's find the root cause of those late events and fix that problem. I'll probably need more detail on your actual code. – Marko Topolnik Sep 20 '19 at 07:32
  • You can only get late events in case the system time goes back... – Oliv Sep 23 '19 at 13:45
  • We'll test this in the following days. I'll update progress. – Aliman Sep 24 '19 at 15:37
  • It seems ok so far. – Aliman Oct 04 '19 at 13:52
  • Getting late events again when using ingestion timestamps and a 1s tumbling window. Resulting in: `currentWatermark=Watermark{ts=11:45:11.000}, event=KeyedWindowResult{start=11:42:08.000, end=11:42:09.000` I got this message immediately when I sent data into the pipeline. How is it possible that the timestamps of the event and the frame differ by 3 minutes? – Aliman Oct 10 '19 at 09:51
  • The log timestamp is `2019-10-10 11:45:10,667 INFO`, which means that the frame timestamps (start, end) are 3 minutes behind. Full log: `2019-10-10 11:45:10,667 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz.jet-reporting.jet.cooperative.thread-0] [192.168.0.27]:5701 [jet-reporting] [3.1] Late event dropped. currentWatermark=Watermark{ts=11:45:11.000}, event=KeyedWindowResult{start=11:42:08.000, end=11:42:09.000, key='99998914', value='[InternalEventJournalMapEvent{eventType=4}]', isEarly=false} ` – Aliman Oct 10 '19 at 09:58
  • I tested a bit more ... continuing on GitHub: https://github.com/hazelcast/hazelcast-jet/issues/1685 – Aliman Oct 10 '19 at 11:12
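
The gap between the watermark and the window timestamps in the logged message can be worked out with a small plain-Java sketch. It is not Jet's implementation: `LatenessSketch`, `isLate`, and `lateness` are hypothetical, and comparing the window end against the watermark is an assumption that the log itself does not confirm. The two timestamps are copied from the log line above.

```java
import java.time.Duration;
import java.time.LocalTime;

public class LatenessSketch {

    // Assumption: an item counts as late when its timestamp lies
    // before the current watermark.
    static boolean isLate(LocalTime watermark, LocalTime eventTime) {
        return eventTime.isBefore(watermark);
    }

    // How far the watermark has already advanced past the item's timestamp.
    static Duration lateness(LocalTime watermark, LocalTime eventTime) {
        return Duration.between(eventTime, watermark);
    }

    public static void main(String[] args) {
        LocalTime watermark = LocalTime.parse("11:45:11.000"); // currentWatermark from the log
        LocalTime windowEnd = LocalTime.parse("11:42:09.000"); // end of the dropped KeyedWindowResult

        System.out.println(isLate(watermark, windowEnd));                // prints true
        System.out.println(lateness(watermark, windowEnd).getSeconds()); // prints 182
    }
}
```

Under that assumption the dropped window result is 182 seconds (3 min 2 s) behind the watermark, which matches the roughly 3-minute difference reported in the comment.
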
