Following is my use case
- Bunch of applications enqueue messages in Kafka under different topics.
- Have consumer of each topic distribute the work to a worker in a cluster. The work can be classified as long running, memory intensive, simple etc and the worker is chosen accordingly.
This has me exploring Akka cluster for work distribution, routing and scaling. I can use Akka "Supervisor" as a Kafka consumer and assign incoming work to the appropriate worker based on its classification.
But what I am still trying to understand is the correct way to implement a resilient way of communication between the supervisor and workers in the Akka cluster. Because as soon as the supervisor consumes the message from Kafka, the Kafka offset is committed. If some error happens in processing after the offset commit, is the following acceptable way to recover and start from where it was last left?
Make the supervisor a persistent actor by using durable mailbox backed by Kafka. Supervisor enqueues work in Kafka and worker gets its work from Kafka and commits its offset only after completing the work.