
I want to create an Akka Cluster with 1000s of actors. Each actor receives a message, does some calculations and writes the result into a dedicated Kafka topic.

It should be deployed in a cluster, for example on Kubernetes.

My understanding is that if, for whatever reason, an actor is terminated (JVM crash, redeployment or anything else), then the content of its mailbox - along with the message currently being processed - is lost!

This is completely unacceptable in my case, hence I want to implement a way to have persistent mailboxes. Note that the actors themselves are stateless, they don't need to replay messages or reconstruct a state. All I need is to not lose messages if the actor is terminated.

The question is: what's the recommended way to do this? Here and here they recommend implementing persistent actors. But like I said, I don't need to persist and recover any state of the actor. Should I implement a custom mailbox that's backed by persistent storage (like a SQL database)?

I also saw that older versions of Akka supported "durable" mailboxes, which seems to be exactly what I need. But for some reason they were removed, which is confusing...

Archie

2 Answers


Using persistent actors on the client side is the recommendation for requirements like this. I understand that you are saying that your receiving actor doesn't need persistence/statefulness, but by using persistence on the client you can either retry if the receiving actor is terminated or use the out-of-the-box guaranteed message delivery feature to make sure the message gets processed. Essentially, persistence is used (on the client side) to persist the requests made, so that the clients can resend the messages and "rebuild the mailbox" if necessary.

Using client-side persistence:

  • Is more performant than persistent mailboxes
  • Protects against more failure scenarios (e.g. messages dropped at the network layer, failures in application logic)
  • Is more flexible and supports more types of recovery (e.g. scenarios where only some messages need to be recovered)

That is why persistent mailboxes were dropped from Akka: Akka Persistence with its At-Least-Once Delivery feature was essentially a better solution than persistent mailboxes in every way. A minimal sketch of the pattern follows.
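
For illustration (not from the original answer), here is a minimal sketch of that client-side pattern, modeled on the Akka classic AbstractPersistentActorWithAtLeastOnceDelivery documentation example. The protocol/event classes (CalcRequest, Work, Confirm, MsgSent, MsgConfirmed), the persistenceId and the /user/worker destination are hypothetical placeholders:

import akka.actor.ActorSelection;
import akka.persistence.AbstractPersistentActorWithAtLeastOnceDelivery;
import java.io.Serializable;

public class ReliableSender extends AbstractPersistentActorWithAtLeastOnceDelivery {

  // hypothetical protocol/event classes
  public static class CalcRequest implements Serializable {
    public final String payload;
    public CalcRequest(String payload) { this.payload = payload; }
  }
  public static class Work implements Serializable {
    public final long deliveryId;
    public final String payload;
    public Work(long deliveryId, String payload) {
      this.deliveryId = deliveryId;
      this.payload = payload;
    }
  }
  public static class Confirm implements Serializable {
    public final long deliveryId;
    public Confirm(long deliveryId) { this.deliveryId = deliveryId; }
  }
  public static class MsgSent implements Serializable {
    public final String payload;
    public MsgSent(String payload) { this.payload = payload; }
  }
  public static class MsgConfirmed implements Serializable {
    public final long deliveryId;
    public MsgConfirmed(long deliveryId) { this.deliveryId = deliveryId; }
  }

  // hypothetical path of the stateless calculating actor
  private final ActorSelection destination =
      getContext().actorSelection("/user/worker");

  @Override
  public String persistenceId() { return "reliable-sender-1"; }

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        // persist the intent first; delivery (with automatic redelivery)
        // starts only once the event is safely stored
        .match(CalcRequest.class, req ->
            persist(new MsgSent(req.payload), this::updateState))
        // the worker replies with Confirm(deliveryId) when it is done
        .match(Confirm.class, c ->
            persist(new MsgConfirmed(c.deliveryId), this::updateState))
        .build();
  }

  @Override
  public Receive createReceiveRecover() {
    // replaying the events after a crash rebuilds the set of unconfirmed
    // deliveries - effectively reconstructing the lost "mailbox"
    return receiveBuilder().match(Object.class, this::updateState).build();
  }

  private void updateState(Object event) {
    if (event instanceof MsgSent) {
      MsgSent e = (MsgSent) event;
      deliver(destination, deliveryId -> new Work(deliveryId, e.payload));
    } else if (event instanceof MsgConfirmed) {
      confirmDelivery(((MsgConfirmed) event).deliveryId);
    }
  }
}

The receiving actor stays completely stateless: it processes each Work message and replies with Confirm(work.deliveryId). If the sender (or the worker) crashes, unconfirmed messages are redelivered automatically after the sender recovers.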

stikkos's answer to use Kafka is viable too. I just worry that introducing Kafka adds a lot of complexity. Of course, any persistence store adds complexity, so I guess it just depends on what you already have in place.

David Ogren

You can use Kafka to achieve what you want. Kafka topics are persistent: if you set the retention to forever or enable log compaction on a topic, data will be kept "for all time" (and you can also store your consumer offsets outside of Kafka).
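
As an aside (not from the original answer), here is a hedged sketch of setting infinite retention on a topic via the Kafka AdminClient; the broker address localhost:9092 and the topic name "results" are assumptions:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;
import java.util.Collections;
import java.util.Properties;

public class TopicRetention {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    try (AdminClient admin = AdminClient.create(props)) {
      ConfigResource topic =
          new ConfigResource(ConfigResource.Type.TOPIC, "results");
      // retention.ms = -1 keeps records forever; alternatively set
      // cleanup.policy=compact to keep the latest record per key
      AlterConfigOp op = new AlterConfigOp(
          new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG, "-1"),
          AlterConfigOp.OpType.SET);
      admin.incrementalAlterConfigs(
              Collections.singletonMap(topic, Collections.singletonList(op)))
          .all()
          .get();
    }
  }
}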

Using Akka Streams, you would commit the offset of the message you received (on the source topic) only after you have published the message(s) you produce (on the target topic), giving you "at-least-once" delivery semantics. (For "exactly-once", you can look into Kafka Transactions.)

This is the example from the Alpakka Kafka docs:

Consumer.DrainingControl<Done> control =
    Consumer.committableSource(consumerSettings, Subscriptions.topics(topic))
        .map(
            msg ->
                // produce the record to the target topic, carrying the
                // consumed offset along as the passThrough
                ProducerMessage.single(
                    new ProducerRecord<>(targetTopic, msg.record().key(), msg.record().value()),
                    msg.committableOffset() // the passThrough
                    ))
        .via(Producer.flexiFlow(producerSettings))
        // once produced, only the offset remains to be committed
        .map(m -> m.passThrough())
        .toMat(Committer.sink(committerSettings), Keep.both())
        .mapMaterializedValue(Consumer::createDrainingControl)
        .run(materializer);

You can integrate this with a (pool of clustered) actors in a few ways. The easiest would be to use the Ask pattern. In that case, the stream passes the message to an actor (which could be self()), which has to reply within a predefined timeout. When the reply is received, it is published to the target topic before the original message's offset is committed.

This would look something like the following (assuming the actor replies with a hypothetical Result type exposing the key and value to produce):

Consumer.DrainingControl<Done> control =
    Consumer.committableSource(consumerSettings, Subscriptions.topics(topic))
            .mapAsync(1, msg ->
                Patterns.ask(actor, msg, Duration.ofSeconds(5))
                    .thenApply(reply -> {
                      // Patterns.ask completes with Object, so the reply must
                      // be cast; Result is a hypothetical reply type exposing
                      // the key and value to produce
                      Result done = (Result) reply;
                      return ProducerMessage.single(
                          new ProducerRecord<>(targetTopic, done.key(), done.value()),
                          msg.committableOffset() // the passThrough
                      );
                    })
            )
            .via(Producer.flexiFlow(producerSettings))
            .map(m -> m.passThrough())
            .toMat(Committer.sink(committerSettings), Keep.both())
            .mapMaterializedValue(Consumer::createDrainingControl)
            .run(materializer);

You can also increase the parallelism factor of the mapAsync stage if you have multiple actors that can handle messages at the same time, for example by asking a router pool, as sketched below.
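
A hedged sketch of that variant (the pool size of 10, the Worker class and the actor name "workers" are assumptions, not part of the original answer):

import akka.actor.ActorRef;
import akka.actor.Props;
import akka.routing.RoundRobinPool;

// a round-robin pool of 10 workers behind a single ActorRef;
// Worker is a hypothetical actor class that handles the messages
ActorRef workers = system.actorOf(
    new RoundRobinPool(10).props(Props.create(Worker.class)), "workers");

// in the stream, raise the parallelism to match the pool size:
//   .mapAsync(10, msg -> Patterns.ask(workers, msg, Duration.ofSeconds(5)) ...)

With a pool behind a single ActorRef, the stream stays unchanged apart from the parallelism factor and the ActorRef it asks.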

stikkos