The usual way of implementing the outbox pattern is to store the message payload in an outbox table and have a separate process (the Message Relay) query for pending messages and publish them into a message broker, Kafka in my case.
The state of the outbox table could be as shown below.
OUTBOX TABLE
--------------------------------------
| ID | STATE     | TOPIC   | PAYLOAD |
--------------------------------------
| 1  | PROCESSED | user    |         |
| 2  | PENDING   | user    |         |
| 3  | PENDING   | billing |         |
--------------------------------------
My Message Relay is a Spring Boot/Cloud Stream application that periodically (via @Scheduled) looks for PENDING records, publishes them to Kafka, and updates each record to the PROCESSED state.
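To make the flow concrete, here is a minimal sketch of a single polling pass. It is not my actual code: the outbox table is replaced by an in-memory list and the Kafka producer by a plain list of strings, and the names OutboxRelaySketch, OutboxRecord and pollOnce are hypothetical. In the real relay, the body of pollOnce would sit inside the @Scheduled method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the relay; no Spring, database, or Kafka involved.
class OutboxRelaySketch {
    enum State { PENDING, PROCESSED }

    // Mirrors one row of the outbox table.
    static final class OutboxRecord {
        final long id;
        final String topic;
        final String payload;
        State state;

        OutboxRecord(long id, State state, String topic, String payload) {
            this.id = id;
            this.state = state;
            this.topic = topic;
            this.payload = payload;
        }
    }

    // One polling pass: publish every PENDING record, then mark it PROCESSED.
    // 'broker' is an assumption standing in for the Kafka producer.
    static int pollOnce(List<OutboxRecord> outbox, List<String> broker) {
        int sent = 0;
        for (OutboxRecord r : outbox) {
            if (r.state != State.PENDING) {
                continue; // already handled in an earlier pass
            }
            broker.add(r.topic + ":" + r.id); // step 1: publish
            r.state = State.PROCESSED;        // step 2: update the record
            sent++;
        }
        return sent;
    }
}
```

The two steps inside the loop are separate operations, which is where both problems below come from.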
The first problem is: if I start multiple instances of the Message Relay, all of them would query the outbox table, and at some point different instances could pick up the same PENDING records and publish them to Kafka, generating duplicate messages. How can I prevent this?
Another situation: suppose there is only one Message Relay. It gets one PENDING record and publishes it to the topic, but crashes before updating the record to PROCESSED. When it starts up again, it finds the same PENDING record and publishes it again. Is there a way to avoid this duplication, or is the only option to design an idempotent system?
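To illustrate the second scenario, here is a small simulation of the crash window, together with one possible reading of "idempotent system": a consumer that remembers the outbox IDs it has already applied. Everything here is a hypothetical sketch (CrashWindowSketch, IdempotentConsumer and deliveriesAfterCrashAndRestart are made-up names), and it assumes the outbox ID travels with the message.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation; no real broker or database is involved.
class CrashWindowSketch {

    // Consumer that deduplicates by outbox record ID (an assumption, not an existing API).
    static final class IdempotentConsumer {
        private final Set<Long> seenIds = new HashSet<>();
        final List<Long> applied = new ArrayList<>();

        void onMessage(long outboxId) {
            // Set.add returns false when the ID is already present,
            // so a redelivered message is silently dropped.
            if (seenIds.add(outboxId)) {
                applied.add(outboxId);
            }
        }
    }

    // Returns what ends up on the topic when the relay crashes between
    // publishing and updating the record, and is then restarted.
    static List<Long> deliveriesAfterCrashAndRestart(long pendingId) {
        List<Long> topic = new ArrayList<>();
        topic.add(pendingId); // first run: published, relay crashes before the update
        // the record is still PENDING, so the restarted relay selects it again
        topic.add(pendingId); // second run: published again, then marked PROCESSED
        return topic;
    }
}
```

With this sketch the topic receives the record twice, but the consumer applies it only once.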