
I'm considering implementing a dead-letter queue for a queue that a set of services uses to communicate with each other.

Something that has been lingering in the back of my mind is how a dead-letter queue actually solves the problem of unprocessed messages.

The way I see it, a message will not be processed in one of two scenarios:

  1. The message itself cannot be processed for some reason intrinsic to the message (like it's improperly formatted).
  2. The application receiving the message has no capacity to process it.

In scenario 1, holding the message in the dead-letter queue as-is does nothing. The application still can't process it.

In scenario 2, the application would somehow need to also process messages from the dead-letter queue. But if it doesn't have the capacity to process messages from its main queue, why would it have the capacity to pick up work from a second queue?

There must be something I'm missing.

OneCricketeer
Hugo

The goal of a dead-letter queue isn't to serve as a secondary queue that the same subscriber reads from. Instead, it is a place to send messages that unexpectedly can't be processed. There are two main reasons this happens:

  1. There is an error in the published message (your scenario #1).
  2. The behavior of the subscriber has changed, and it can no longer process a message it used to be able to process.
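To make the redelivery-then-dead-letter flow concrete, here is a toy in-memory simulation. It assumes a broker that counts delivery attempts per message and forwards a message to the dead-letter queue once the count reaches a limit, similar in spirit to the `max_delivery_attempts` field of a Pub/Sub dead-letter policy. `InMemorySubscription`, `publish`, `deliver_all`, and `looks_like_json` are hypothetical names, not any real client library's API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    data: str
    delivery_attempt: int = 0


class InMemorySubscription:
    """Toy subscription with a dead-letter limit (hypothetical, not a real client API)."""

    def __init__(self, max_delivery_attempts: int = 5) -> None:
        self.max_delivery_attempts = max_delivery_attempts
        self.pending: deque[Message] = deque()
        self.dead_letters: list[Message] = []

    def publish(self, data: str) -> None:
        self.pending.append(Message(data))

    def deliver_all(self, handler: Callable[[str], bool]) -> None:
        """Deliver until nothing is pending; `handler` returns True to ack, False to nack."""
        while self.pending:
            msg = self.pending.popleft()
            msg.delivery_attempt += 1
            if handler(msg.data):
                continue  # acked: done with this message
            if msg.delivery_attempt >= self.max_delivery_attempts:
                self.dead_letters.append(msg)  # limit reached: move to the dead-letter queue
            else:
                self.pending.append(msg)  # nacked: queue it for redelivery


def looks_like_json(data: str) -> bool:
    # Hypothetical handler: treats anything not starting with "{" as malformed.
    return data.startswith("{")


sub = InMemorySubscription(max_delivery_attempts=5)
sub.publish('{"order": 1}')
sub.publish("not json")
sub.deliver_all(looks_like_json)
print([m.data for m in sub.dead_letters])  # ['not json']
print(sub.dead_letters[0].delivery_attempt)  # 5
```

The good message is acked on its first delivery. After five failed deliveries the bad one leaves the main flow and waits in `dead_letters` for someone to inspect, instead of being retried forever.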

Ideally, whoever sets up a dead-letter queue also sets up alerting that fires when messages are published to it. Then, in a separate process, they inspect the messages through a subscription attached to the dead-letter topic to determine why they couldn't be processed. If the messages were incorrect, the owner of the subscriber can reach out to the owner of the publisher to get them fixed, if that is required.

If the messages are correct and a bug in the subscriber prevents them from being processed, then one can fix the subscriber. Once it is fixed, the messages can be republished to the original topic so that the fixed subscriber picks them up, or one can use seek to replay them.
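The republish option can be sketched as below, assuming the dead-lettered messages are available as a list and that `publish` is a hypothetical callback wrapping whatever publish call your client library provides. `DeadLetter` and `redrive` are illustrative names. Seek is a broker-side feature and is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DeadLetter:
    data: str
    attributes: dict[str, str] = field(default_factory=dict)


def redrive(
    dead_letters: list[DeadLetter],
    publish: Callable[[str, dict[str, str]], None],
) -> int:
    """Republish every dead-lettered message to the original topic.

    `publish` is a hypothetical stand-in for a real client's publish call.
    Returns the number of messages republished.
    """
    count = 0
    while dead_letters:
        msg = dead_letters[0]
        publish(msg.data, msg.attributes)  # if this raises, the message stays in the list
        dead_letters.pop(0)  # remove only after a successful publish
        count += 1
    return count


# Usage: a plain list stands in for the original topic.
original_topic: list[tuple[str, dict[str, str]]] = []
dlq = [DeadLetter('{"order": 2}', {"origin": "checkout"})]
republished = redrive(dlq, lambda data, attrs: original_topic.append((data, attrs)))
```

Each message is removed from the dead-letter list only after it has been republished, so a crash partway through can produce a duplicate but not a lost message. The fixed subscriber should therefore tolerate redelivery.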

If messages end up in the dead-letter queue because the subscriber lacks the capacity to process them (and is therefore nacking them or letting their ack deadlines expire), the situation is similar to #2. It means one needs to increase the capacity of the subscribers and likely set flow control to levels in line with what the subscriber can process. Then, one would republish or use seek to get the messages back.
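The effect of flow control can be illustrated with a toy tick-based model. Everything here is an assumption for illustration, not how any particular broker schedules work: each tick the subscriber leases messages up to `max_outstanding`, processes `capacity_per_tick` of them oldest first, and any lease held for `ack_deadline_ticks` or longer expires, sending the message back to the backlog. Each expiry counts as a failed delivery that would move a message toward a dead-letter limit.

```python
from collections import deque


def simulate(backlog_size: int, capacity_per_tick: int, max_outstanding: int,
             ack_deadline_ticks: int, ticks: int) -> tuple[int, int]:
    """Toy model (hypothetical). Returns (acked, expired) after `ticks` ticks."""
    backlog = deque(range(backlog_size))
    leased = deque()  # (message id, tick it was leased)
    acked = expired = 0
    for tick in range(ticks):
        # Flow control: never hold more than max_outstanding messages at once.
        while backlog and len(leased) < max_outstanding:
            leased.append((backlog.popleft(), tick))
        # Process only what capacity allows, oldest lease first.
        for _ in range(min(capacity_per_tick, len(leased))):
            leased.popleft()
            acked += 1
        # Leases past their deadline expire; the message returns to the backlog.
        still_leased = deque()
        for msg_id, leased_at in leased:
            if tick - leased_at >= ack_deadline_ticks:
                backlog.append(msg_id)
                expired += 1
            else:
                still_leased.append((msg_id, leased_at))
        leased = still_leased
    return acked, expired


# Same backlog and capacity; only the flow-control limit differs.
print(simulate(100, 10, max_outstanding=1000, ack_deadline_ticks=3, ticks=20))  # (100, 80)
print(simulate(100, 10, max_outstanding=10, ack_deadline_ticks=3, ticks=20))    # (100, 0)
```

Both runs eventually ack everything, but without a realistic limit 80 deliveries expire along the way, and each of those could push a perfectly valid message toward the dead-letter queue. Capping outstanding messages at what the subscriber can finish within the deadline avoids that.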

Kamal Aboul-Hosn
  • Given that the messages are supposed to just sit in a DLQ and not move until further (human) inspection, why wouldn't one keep it simple and just store messages in a database instead of setting up a dedicated message queue for it? – Maurits Moeys Jun 09 '23 at 16:28
  • There are a couple of reasons: 1. You may not want to move it to the database after the first failure if the issue might be transient. In that case, you'd want to count the number of failures. However, counting would itself require a database, since you can't be sure redeliveries will go to the same subscriber client. 2. If a message is malformed in some way and causes your subscriber to crash, you wouldn't have the opportunity to write it to a database. – Kamal Aboul-Hosn Jun 09 '23 at 20:12
  • Thanks for your reply! I'm still a little confused. 1 => Typically a message would be retried several times by the regular queue before being declared "poison"/"dead" and moved over to the DLQ, so a DLQ isn't for retrying transient errors? 2 => Whatever fallback logic one writes that can put a malformed, crash-inducing message into a DLQ should be equally able to store it in a DB? – Maurits Moeys Jun 10 '23 at 07:58
  • You are of course welcome to use a database. But many people like the simplicity that DLQ brings in protecting their subscribers, both in terms of transient and permanent failures. – Kamal Aboul-Hosn Jun 12 '23 at 10:23
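To illustrate the first point in Kamal Aboul-Hosn's reply, here is a minimal sketch of dead-lettering done in the subscriber itself. It assumes a shared store (a dict standing in for a database table) so that failure counts survive redelivery to a different client instance. `SharedFailureStore`, `handle`, and `max_failures` are hypothetical names. It only covers failures that surface as exceptions; a message that crashes the process outright never reaches the `except` branch, which is the second point.

```python
from typing import Callable


class SharedFailureStore:
    """Stands in for a shared database table keyed by message ID (hypothetical)."""

    def __init__(self) -> None:
        self.failures: dict[str, int] = {}
        self.parked: dict[str, str] = {}  # message ID -> payload set aside for inspection

    def record_failure(self, msg_id: str) -> int:
        self.failures[msg_id] = self.failures.get(msg_id, 0) + 1
        return self.failures[msg_id]


def handle(store: SharedFailureStore, msg_id: str, payload: str,
           process: Callable[[str], None], max_failures: int = 3) -> bool:
    """Return True if the message should be acked, False to nack it."""
    try:
        process(payload)
        return True
    except Exception:
        if store.record_failure(msg_id) >= max_failures:
            store.parked[msg_id] = payload  # give up: park it, as a DLQ would
            return True  # ack so the broker stops redelivering
        return False  # nack: the broker may redeliver to another client


def process(payload: str) -> None:
    if payload == "bad":
        raise ValueError("malformed")


store = SharedFailureStore()
# Redeliveries of "m1" may land on different client instances,
# but every attempt counts against the same shared record.
results = [handle(store, "m1", "bad", process) for _ in range(3)]
print(results, store.parked)  # [False, False, True] {'m1': 'bad'}
```

The shared store is exactly the extra moving part the reply describes: without it, each client instance would keep its own count and might never reach the limit.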