Question

I seem to be observing a scenario in which the messages stashed for a typed supervised actor are lost during a restart, using the Akka backoff supervision strategy.

Is this expected behavior? If not, how can I ensure these stashed messages are retained?

The Setup

I create a typed supervised actor with a stash

    BackoffSupervisorStrategy backoff = SupervisorStrategy
        .restartWithBackoff(Duration.ofMillis(10), Duration.ofMillis(10000), 0)
        .withStashCapacity(2000);
    return Behaviors.supervise(Behaviors.setup(MyActor::new)).onFailure(Throwable.class, backoff);

It handles a command ForceFail which results in a RuntimeException so that we can let the Akka supervisor do its thing.

  private Behavior<Command> forceFail(ForceFail command) {
    getContext().getLog().info("Got fail command: {}", command.name);
    throw new RuntimeException(command.name);
  }

After spawning the actor, I send a series of tells

    testSystem.tell(new ForceFail("first fail"));
    testSystem.tell(new ForceFail("second fail"));
    testSystem.tell(new ForceFail("third fail"));

Each tell results in an exception in the actor, triggering a restart by the supervisor. I check the size of the StashBuffer right before the supervisor unstashes the messages during a restart.

What I see is that during the first restart, the StashBuffer shows a size of 2, as expected. However, during the second restart for the second message, the size is 0, where I would expect it to be 1.

[Screenshot: stash size]

I do not see the last message get sent to the dead letter actor. It seems to be lost, with no logging describing what happens to it.

Notes

I see in the Akka internal code, the StashBuffer unstashAll() method is called. As written in the javadocs:

If an exception is thrown by processing a message a proceeding messages and the message causing the exception have been removed from the StashBuffer, but unprocessed messages remain.

The wording seems a bit funny, but what it's saying is that it will sequentially process the messages in the stash until it processes all of them or we hit an exception. Unhandled messages remain in the stash. This does not seem to be what I'm observing though.
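
As a rough illustration of that reading of the Javadoc, here is a minimal plain-Java model. It is not Akka's StashBuffer: the class StashModel and its methods are hypothetical names, and the queue is an ordinary ArrayDeque. It only shows the semantics described above, where the failing message is removed but later messages stay queued.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical stand-in for a stash; this is NOT Akka's StashBuffer.
final class StashModel {

    // Hands messages to the handler in order. Each message is removed
    // before it is handled, so if the handler throws, that message is
    // gone while the messages behind it remain queued.
    static <T> void unstashAll(Deque<T> stash, Consumer<T> handler) {
        while (!stash.isEmpty()) {
            T message = stash.removeFirst();
            handler.accept(message);
        }
    }

    // Unstashes with a handler that always fails (like ForceFail) and
    // returns how many messages are still stashed afterwards.
    static int remainingAfterFailure(Deque<String> stash) {
        try {
            unstashAll(stash, m -> { throw new RuntimeException(m); });
        } catch (RuntimeException e) {
            // A supervisor would restart the actor at this point.
        }
        return stash.size();
    }

    public static void main(String[] args) {
        Deque<String> stash = new ArrayDeque<>();
        stash.add("second fail");
        stash.add("third fail");
        System.out.println(remainingAfterFailure(stash)); // prints 1
        System.out.println(stash.peekFirst());            // prints third fail
    }
}
```

Under this model, "third fail" would still be stashed at the second restart, which is the size of 1 I expected rather than the 0 I observe.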

I'm using Akka 2.7.0.

havenwang
  • Akka Persistence is built exactly for this. Use Akka Persistence. – sarveshseri Jan 10 '23 at 08:06
  • This happens because the mailboxes are generally owned by the actor itself. So, when the actor dies... the mailbox goes with it. One simpler workaround is to introduce a router... and then add your actor to it. Once you do this, the mailbox will be owned by the router and thus will not be lost even when the actor dies. – sarveshseri Jan 10 '23 at 08:13
  • The stash only captures messages that are sent while the actor is being restarted... it does not preserve any messages that were already delivered to the actor. – sarveshseri Jan 10 '23 at 08:58
  • @sarveshseri I read in the [docs](https://doc.akka.io/docs/akka/2.7.0/general/supervision.html#what-happens-to-the-mailbox) that for supervised actors "If an exception is thrown while a message is being processed, nothing happens to the mailbox. If the actor is restarted, the same mailbox will be there. So all messages on that mailbox will be there as well." So shouldn't the messages on a mailbox still be there after the restart? – havenwang Jan 10 '23 at 16:54
  • I must add, I observe lost messages only when using the `BackoffSupervisorStrategy` and **NOT** when using `RestartSupervisorStrategy` – havenwang Jan 10 '23 at 16:55

2 Answers

You can embed the actor inside a router, so that the mailbox lifecycle is detached from the actor lifecycle and messages are not lost when the actor is restarted.

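import akka.actor.typed.{ActorSystem, Behavior, SupervisorStrategy}
import akka.actor.typed.scaladsl.{Behaviors, Routers}

object PrintActor {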

  sealed trait Message

  final case class PrintMessage(value: String) extends Message
  final case class FailMessage(value: String) extends Message

  def apply(): Behavior[Message] =
    Behaviors.receive { (context, message) =>
      message match {
        case PrintMessage(value) =>
          println(s"Print Message :: ${value}")
          Behaviors.same
        case FailMessage(value) =>
          println(s"Fail Message :: ${value}")
          throw new RuntimeException(value)
      }

    }
}
object Main {

  def createActors(): Behavior[Unit] =
    Behaviors.setup[Unit] { ctx =>
      val pool = Routers.pool(poolSize = 1) {
        // make sure the workers are restarted if they fail
        Behaviors.supervise(PrintActor()).onFailure[Exception](SupervisorStrategy.restart)
      }

      val router = ctx.spawn(pool, "printer-pool")

      (1 to 10).foreach { n =>
        if (n % 2 == 1)
          router ! PrintActor.PrintMessage(s"Print $n")
        else
          router ! PrintActor.FailMessage(s"Fail $n")
      }

      Behaviors.empty
    }

  def main(args: Array[String]): Unit = {
    val system = ActorSystem.apply[Unit](createActors(), "system")
  }
}

With SupervisorStrategy.restart, this pattern ensures that no messages are lost; all ten are processed:

Print Message :: Print 1
Fail Message :: Fail 2
Print Message :: Print 3
Fail Message :: Fail 4
Print Message :: Print 5
Fail Message :: Fail 6
Print Message :: Print 7
Fail Message :: Fail 8
Print Message :: Print 9
Fail Message :: Fail 10
sarveshseri
  • Really appreciate you typing out an example. I took this and modified the supervisor strategy ```SupervisorStrategy.restart``` to instead be ```SupervisorStrategy.restartWithBackoff(Duration.ofMillis(10), Duration.ofMillis(1000), 0)``` and on doing so, the output after running is ```Print Message :: Print 1 Fail Message :: Fail 2 Print Message :: Print 3 Fail Message :: Fail 4 ``` – havenwang Jan 10 '23 at 17:08
  • I mention this in an above comment as well, but it seems like there may be some behavior difference between `BackoffSupervisorStrategy` and `RestartSupervisorStrategy`, wondering if this is expected? – havenwang Jan 10 '23 at 17:13
  • I found akka [docs](https://doc.akka.io/docs/akka/current/fault-tolerance.html#supervision-strategies) regarding backoff that say: "this supervision strategy does not restart the actor but rather stops and starts it", which may explain the difference – havenwang Jan 10 '23 at 18:48
  • @havenwang that is because the restartWithBackoff strategy delays the restart for a certain backoff interval, and any messages received during this time are just dropped. The messages after the 4th one coincide with the backoff time and are thus ignored. – sarveshseri Jan 10 '23 at 22:30
  • It appears that the tells are all sent by the time the first message is handled. If I print a message right after the tells are sent, the output I see is: ```Done sending tells - Print Message :: Print 1 - Fail Message :: Fail 2 - Print Message :: Print 3 - Fail Message :: Fail 4 ``` – havenwang Jan 10 '23 at 22:35
  • I did some digging into the internals of akka's supervisor when using the backoff strategy. What I found is that during the restart, the mailbox messages are all moved to the supervisor's internal stash. In addition, this stash is re-initialized during each restart. This means any messages not processed from stash are lost – havenwang Jan 10 '23 at 22:37

I'm not 100% sure, but I think there may be a bug in backoff supervision when a failure happens while its internal stash is being unstashed.

I've created an issue in the Akka tracker for further investigation: https://github.com/akka/akka/issues/31814

johanandren