My question this time is: when working with a MapState, is it safe to call mapState.put(key, value) to overwrite the current value for a key, or do I need to call mapState.remove(key) first and then call mapState.put(key, value)? Or is there some other way to update the value?

I understand that Flink's state abstractions are not designed for concurrent access and should not be shared between multiple threads. So, to reformulate my question: can I update the value for a key in a MapState without removing the key and then putting it again? And how can I avoid the ConcurrentModificationException when using MapState without setting the parallelism of this operator to 1?

I ask because I'm getting this exception:

java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
at java.util.HashMap$EntryIterator.next(HashMap.java:1479)
at java.util.HashMap$EntryIterator.next(HashMap.java:1477)
at org.apache.flink.api.common.typeutils.base.MapSerializer.copy(MapSerializer.java:111)
at org.apache.flink.api.common.typeutils.base.MapSerializer.copy(MapSerializer.java:49)
at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:287)
at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:311)
at org.apache.flink.runtime.state.heap.HeapMapState.get(HeapMapState.java:85)
at org.apache.flink.runtime.state.ttl.TtlMapState.lambda$getWrapped$0(TtlMapState.java:63)
at org.apache.flink.runtime.state.ttl.AbstractTtlDecorator.getWrappedWithTtlCheckAndUpdate(AbstractTtlDecorator.java:92)
at org.apache.flink.runtime.state.ttl.TtlMapState.getWrapped(TtlMapState.java:62)
at org.apache.flink.runtime.state.ttl.TtlMapState.contains(TtlMapState.java:92)
at org.apache.flink.runtime.state.UserFacingMapState.contains(UserFacingMapState.java:72)
at com.teavaro.cep.transformations.SessionUseCase$1.generateSessionRecord(SessionUseCase.java:65)
at com.teavaro.cep.transformations.SessionUseCase$1.generateSessionRecord(SessionUseCase.java:42)
at com.teavaro.cep.operators.SessionIdentificationProcessFunction.process(SessionIdentificationProcessFunction.java:25)
at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:50)
at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:32)
at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:546)
at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onEventTime(WindowOperator.java:454)
at org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.advanceWatermark(InternalTimerServiceImpl.java:251)
at org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:128)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:774)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:262)
at org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189)
at org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:184)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)

Thanks a lot. Kind regards.

Alter
  • You must be doing something unusual to have run into this. But to answer your question, it's fine to simply call put to do an update. No need to remove an existing value first. – David Anderson Apr 16 '20 at 15:35
  • Also, mapState is intended for use in operators where the parallelism is higher than 1. The restriction on concurrent access is within a given instance of the operator. – David Anderson Apr 16 '20 at 15:37
  • Thank you so much again David. I'm brand new to Flink and stream processing, so sorry if my questions are out of the ordinary, but I'm asking because I can't find the answer anywhere else. By the way, mapState is working, thanks a lot for everything. – Alter Apr 16 '20 at 15:55

1 Answer

It's fine to simply call put to update an entry in MapState. No need to remove an existing value first.
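
To illustrate the overwrite behaviour, here is a minimal sketch that uses a plain java.util.HashMap as a stand-in for the state. The stack trace in the question shows the heap state backend holding the map in a HashMap, but this is not Flink code: the class name, method name, and the "user-1" key are made up for the example, and a real job would obtain its MapState from the runtime context instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for MapState: a plain HashMap, used only to show
// that a second put on the same key replaces the value in place.
public class MapPutUpdateSketch {

    static Map<String, Long> updateInPlace() {
        Map<String, Long> sessions = new HashMap<>();
        sessions.put("user-1", 1L);   // first write creates the entry
        sessions.put("user-1", 2L);   // second write overwrites it, no remove needed
        return sessions;
    }

    public static void main(String[] args) {
        Map<String, Long> sessions = updateInPlace();
        System.out.println(sessions.get("user-1")); // prints 2
        System.out.println(sessions.size());        // prints 1
    }
}
```

The map ends up with a single entry holding the latest value, which is the same result you would get from remove followed by put, just with one call fewer.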

David Anderson