
I am following the mapWithState example from the Databricks website.

The code for trackStateFunc is as follows:

def trackStateFunc(batchTime: Time, key: String, value: Option[Int], state: State[Long]): Option[(String, Long)] = {
  val sum = value.getOrElse(0).toLong + state.getOption.getOrElse(0L)
  val output = (key, sum)
  state.update(sum)
  Some(output)
}

My question is about the case when the state is timing out (state.isTimingOut() == true): the function still updates the state, which may cause an exception. Is this true for the sample?

Yuval Itzchakov
mahdi62

1 Answer


In the case when the state is timing out (state.isTimingOut() == true), the function updates the state again, which may cause an exception.

Yes, that is correct. If you set an explicit timeout on mapWithState and call state.update during the final invocation for a key whose state is timing out, an exception is thrown, because the state cannot be updated once a timeout has occurred. This is explicitly stated in the documentation:

State cannot be updated if it has been already removed (that is, remove() has already been called) or it is going to be removed due to timeout (that is, isTimingOut() is true).
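To make the rule concrete without a running cluster, here is a small Spark-free model of it. ModelState, TimeoutModel, and unguarded are hypothetical names written only for this illustration. ModelState is not Spark's State class, and the exception type it raises is an assumption. It only imitates the documented restriction that update() is rejected while isTimingOut is true.

```scala
import scala.util.Try

// Hypothetical stand-in for Spark's State[S], written only for this illustration.
// It models just the documented rule: update() is rejected while timing out.
final class ModelState[S](initial: Option[S], val isTimingOut: Boolean) {
  private var current: Option[S] = initial

  def getOption: Option[S] = current

  def update(newState: S): Unit = {
    require(!isTimingOut, "cannot update a state that is timing out")
    current = Some(newState)
  }
}

object TimeoutModel {
  // Same logic as the question's function (minus batchTime), run against the model.
  def unguarded(key: String, value: Option[Int], state: ModelState[Long]): Option[(String, Long)] = {
    val sum = value.getOrElse(0).toLong + state.getOption.getOrElse(0L)
    state.update(sum) // throws when state.isTimingOut is true
    Some((key, sum))
  }

  def main(args: Array[String]): Unit = {
    // Normal batch: a value arrives and the state is live.
    println(unguarded("a", Some(3), new ModelState[Long](Some(4L), isTimingOut = false)))
    // Timeout batch: no value, state is timing out, so update() is rejected.
    println(Try(unguarded("a", None, new ModelState[Long](Some(7L), isTimingOut = true))).isFailure)
  }
}
```

In this model the first call succeeds and the second fails, which mirrors what the unguarded function would run into on the timeout invocation.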


In your example, an additional check is in order:

def trackStateFunc(batchTime: Time, 
                   key: String, 
                   value: Option[Int], 
                   state: State[Long]): Option[(String, Long)] = {
  val sum = value.getOrElse(0).toLong + state.getOption.getOrElse(0L)
  val output = (key, sum)
  if (!state.isTimingOut) state.update(sum)
  Some(output)
}

Or, since value should only be None once a timeout occurs, you can use pattern matching as well:

def trackStateFunc(batchTime: Time, 
                   key: String, 
                   value: Option[Int], 
                   state: State[Long]): Option[(String, Long)] = {
  value match {
    case Some(v) => 
      val sum = v.toLong + state.getOption.getOrElse(0L)
      state.update(sum)
      Some((key, sum))
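    // Timeout invocation: emit the last known sum without touching the state.
    case _ if state.isTimingOut() => Some((key, state.getOption.getOrElse(0L)))
    // No value and no timeout is not expected; emit nothing rather than fail the match.
    case _ => None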
  }
}
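For context, a timeout is attached through the StateSpec that is passed to mapWithState. The following is a minimal sketch that assumes Spark Streaming is on the classpath. The object name TimeoutSpecSketch, the 30-second duration, and the stream name pairs mentioned in the comment are assumptions chosen for illustration.

```scala
import org.apache.spark.streaming.{Seconds, State, StateSpec, Time}

object TimeoutSpecSketch {
  // Guarded mapping function: skips the update on the timeout invocation.
  def trackStateFunc(batchTime: Time,
                     key: String,
                     value: Option[Int],
                     state: State[Long]): Option[(String, Long)] = {
    val sum = value.getOrElse(0).toLong + state.getOption().getOrElse(0L)
    if (!state.isTimingOut()) state.update(sum)
    Some((key, sum))
  }

  // Keys that receive no data for longer than this duration get one final
  // call with value == None and isTimingOut() == true.
  val spec = StateSpec.function(trackStateFunc _).timeout(Seconds(30))

  // Given a DStream[(String, Int)] named `pairs` (hypothetical):
  //   val stateStream = pairs.mapWithState(spec)
}
```

Without a call to timeout on the spec, no timeout invocation is configured, so the guard only matters once a timeout is set.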

For a review of stateful streaming, see this blog post (disclaimer: I am the author).

Yuval Itzchakov