13

I have multiple distributed competing consumers each pulling messages off the same (transactional) queue. I want to implement each consumer as an Idempotent Receiver so I never process the same message more than once (across all consumers) even if a duplicate arrives. How can I accomplish this with multiple consumers?

My first thought is to somehow generate a consecutive sequence number for each message before putting it on the queue, and then use a shared database table to coordinate the work between consumers. That is, consumer#1 processes msg#1 and then writes a row to a DB table saying 'msg#1 is processed' (I want it in a database for durability). When a consumer is ready to process a message, it peeks at the next one available in the queue, consults the shared DB table, and determines whether this is the next msg in order. If so, it pulls it off the queue. If not, it ignores it.

In this way, I only need to store the last message processed (as there is a consecutive sequence number for all msgs), I don't need to use a buffer storing IDs of all messages received with a negotiated 'window' size, and the messages are always processed serially (which is what I want for this scenario).
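The sequence-number idea described above could be sketched roughly as follows. This is only an illustration under assumptions: it uses SQLite in place of the shared database, and the table name `progress`, the helper names, and the `handler` callback are all hypothetical. The check and the "last processed" update happen in one transaction, so a duplicate or out-of-order message changes nothing.

```python
import sqlite3


def create_progress_table(conn):
    # Hypothetical single-row table holding the last processed sequence number.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), "
        "last_seq INTEGER NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO progress (id, last_seq) VALUES (1, 0)")
    conn.commit()


def try_process(conn, seq, handler):
    """Run handler only if seq is the next message in order.

    Returns True if the message was processed, False if it was a
    duplicate or arrived out of order.
    """
    with conn:  # commits on success, rolls back if handler raises
        # Compare-and-set: advance the counter only from seq - 1 to seq.
        cur = conn.execute(
            "UPDATE progress SET last_seq = ? WHERE id = 1 AND last_seq = ?",
            (seq, seq - 1),
        )
        if cur.rowcount != 1:
            return False
        handler(conn)  # state changes share the same transaction
    return True
```

A consumer would call `try_process` after peeking at a message and remove the message from the queue only when it returns True. How the queue read and the database write are tied together (for example with a distributed transaction) is left out of this sketch.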

Just curious if there is a better way? I'm concerned about the cost of querying the database whenever I need to process a message.

If the answer is "it depends on the framework", then I had MSMQ in mind.

emertechie

3 Answers

10

I've achieved idempotent message handling by ensuring each message has a GUID or other unique identifier, and then recording that identifier in the same transaction in which I alter the state in the persistence store.

For each message you can now check if the unique id exists in your persistence store.

If the unique id exists, you know it was processed previously and state changes were persisted in the same transaction.

If the unique id does not exist, you know that it has never been processed.

If two consumers process the same message, the unique constraint on the table that stores the processed unique IDs means that, when both consumers try to commit their transactions, one of them must fail and roll back all of its changes while the other succeeds.
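A minimal sketch of this approach, assuming SQLite as the persistence store and using hypothetical table and function names (`processed_messages`, `accounts`, `handle_once`), might look like this. The message ID and the state change are written in one transaction, and the primary key acts as the unique constraint.

```python
import sqlite3


def create_tables(conn):
    # Hypothetical schema: one table of processed IDs, one table of state.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_messages ("
        "message_id TEXT PRIMARY KEY)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        "name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.commit()


def handle_once(conn, message_id, apply_changes):
    """Apply the message's state changes at most once.

    Returns True if the changes were applied, False if the ID was
    already recorded by an earlier transaction.
    """
    with conn:  # commits on success, rolls back if apply_changes raises
        try:
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message_id,),
            )
        except sqlite3.IntegrityError:
            return False  # unique constraint hit: duplicate message
        apply_changes(conn)  # same transaction as the ID insert
    return True
```

If `apply_changes` raises, the ID insert is rolled back too, so the message can be retried later. With a server database and truly concurrent consumers, the constraint violation may surface at insert time or at commit time depending on the database and isolation level, so the error handling would need to cover both.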

nrjohnstone
2

The point of an idempotent receiver is that it does not matter if a message is processed several times. Hence, idempotent receivers don't need to detect that a message is a duplicate; they can simply process it as usual ...

So either your receiver is not idempotent, or you are worrying needlessly ...
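To make the distinction concrete, here is a small illustration with hypothetical handler names. Applying a "set" message twice leaves the same state as applying it once, whereas applying an "increment" message twice does not, which is the case where duplicates have to be detected some other way.

```python
def set_value(state, key, value):
    # Idempotent: repeating the message leaves the same final state.
    state[key] = value
    return state


def increment(state, key, amount):
    # Not idempotent: each repeat changes the state again.
    state[key] = state.get(key, 0) + amount
    return state
```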

BenMorel
meriton
  • 1
    If my API / message processing logic is designed in a way to be idempotent then yes, I don't need to worry about receiving duplicate msgs. This is not the case for my scenario. I need to filter out duplicate messages and not just for one instance of a consumer but across multiple instances. – emertechie Nov 03 '09 at 20:34
  • But what if the only way I have to guarantee idempotency is by preventing the side effects from happening a second time? – Edwin Dalorzo Feb 21 '17 at 13:56
  • 1
    Being idempotent is an implementation detail; it is not magic. You can achieve it either by making sure your 'messages' are idempotent, such as "set value X to 5", or by recording the fact that a message has been processed. The best way to accomplish the latter is to have each message contain a GUID or other unique ID and record it in the same transaction in which you alter the state in your persistence store. For each message you can then check whether the GUID has already been processed, knowing that it was recorded in the same transaction as the state changes. – nrjohnstone May 11 '18 at 21:52
  • This makes no sense. This answer implies that every piece of code that is executed when a message is received must be idempotent. That's surreal. – Marcelo De Zen Apr 26 '20 at 07:13
  • This answer should be unchecked as the correct one. E.g. if the message received is "SendWelcomeEmail", you have to either keep track of whether that message has already been processed or flag it in some other way. – Maciej Pszczolinski Aug 02 '20 at 07:59
-2

Andrew -

Another option is to look at how your queue handles messages. Some queues remove a message once it has been picked up by a consumer. This is typical queue behavior, and it shouldn't be difficult to find a queue that works this way. That should give you a simple solution instead of building a way for each consumer to ensure it does not receive a message that has already been processed by another consumer.

Best, Henry

henry74
  • 2
    This still doesn't address the failure mode where a message has been 'removed', processed, and its state persisted to the database, but the process fails just as it acks the message, so the message is placed back on the queue and processed again. – nrjohnstone May 11 '18 at 21:57
  • Typically a message is not removed, just made unavailable to be picked up until the queue framework receives an ACK that it has been successfully processed. If the queue never receives the ACK, the processing timeout will be hit and the message will be put back on the queue, assuming it was configured this way. Your use case is a well-known and well-solved situation. – henry74 Sep 17 '18 at 21:24