In the case when the state is timing out (state.isTimingOut() == true), the function updates the state again, which may cause an exception.
Yes, that is correct. If you set an explicit timeout on mapWithState and call state.update while the state is in its final, timing-out iteration, an exception is thrown, because you cannot update the state once a timeout has occurred. This is explicitly stated in the documentation:
State cannot be updated if it has been already removed (that is,
remove() has already been called) or it is going to be removed due to
timeout (that is, isTimingOut() is true).
In your example, an additional check is in order:
import org.apache.spark.streaming.{State, Time}

def trackStateFunc(batchTime: Time,
                   key: String,
                   value: Option[Int],
                   state: State[Long]): Option[(String, Long)] = {
  val sum = value.getOrElse(0).toLong + state.getOption.getOrElse(0L)
  val output = (key, sum)
  // Updating a state that is timing out throws, so guard the update.
  if (!state.isTimingOut) state.update(sum)
  Some(output)
}
Or, since value should only be None once a timeout occurs, you can use pattern matching as well:
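def trackStateFunc(batchTime: Time,
                   key: String,
                   value: Option[Int],
                   state: State[Long]): Option[(String, Long)] = {
  value match {
    case Some(v) =>
      val sum = v.toLong + state.getOption.getOrElse(0L)
      state.update(sum)
      Some((key, sum))
    // Timing out: emit the last known sum, but do not touch the state.
    case _ if state.isTimingOut() => Some((key, state.getOption.getOrElse(0L)))
    // Keeps the match exhaustive; not expected to be hit in practice.
    case _ => None
  }
}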
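For completeness, here is a rough sketch of how a function like this might be wired up with an explicit timeout. The object name, the helper name withTimeout and the 30-second duration are my own placeholders, not something from your code, and I am assuming your input is already a DStream[(String, Int)]. Note that mapWithState needs a checkpoint directory to be set on the StreamingContext.

```scala
import org.apache.spark.streaming.{Seconds, State, StateSpec, Time}
import org.apache.spark.streaming.dstream.DStream

object TimeoutSketch {

  // Same guarded function as above: never update while timing out.
  def trackStateFunc(batchTime: Time,
                     key: String,
                     value: Option[Int],
                     state: State[Long]): Option[(String, Long)] = {
    val sum = value.getOrElse(0).toLong + state.getOption.getOrElse(0L)
    if (!state.isTimingOut()) state.update(sum)
    Some((key, sum))
  }

  // Hypothetical helper: `pairs` is assumed to be your keyed input stream.
  def withTimeout(pairs: DStream[(String, Int)]): DStream[(String, Long)] = {
    // The 30 seconds is an arbitrary idle duration chosen for illustration.
    // A key that receives no data for at least this long gets one final
    // call with value == None and state.isTimingOut() == true.
    val spec = StateSpec.function(trackStateFunc _).timeout(Seconds(30))
    pairs.mapWithState(spec)
  }
}
```

With this in place, the final call for an idle key still emits the last known sum, and the guard keeps that call from throwing.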
For a review of stateful streaming, see this blog post (disclaimer: I am the author).