I have an Amazon SQS queue and a dead letter queue.

My Python program receives a message from the SQS queue and, if processing raises an exception, sends the message to the dead letter queue.

I also have a program that checks whether the messages in the dead letter queue can still be processed. If they can, they are sent back to the main SQS queue. In my testing I expected this to produce an infinite loop of sorts, but the message apparently disappears after 2 tries. Why is that?

When I add an extra field with a random value to the message, it does what I expect (an infinite loop of sending back and forth). Is there a mechanism in SQS that prevents this when the message is the same?

```python
def handle_retrieved_messages(self):
    if not self._messages:
        return None

    for message in self._messages:
        try:
            logger.info(
                "Processing Dead Letter message: {}".format(
                    message.get("Body")
                )
            )
            message_body = self._convert_json_to_dict(message.get("Body"))
            reprocessed = self._process_message(
                message_body, None, message_body
            )
        except Exception as e:
            logger.exception(
                "Failed to process the following SQS message:\n"
                "Message Body: {}\n"
                "Error: {}".format(message.get("Body", "<empty body>"), e)
            )
            # Send to error queue
            self._delete_message(message)
            self._sqs_sender.send_message(message_body)
        else:
            self._delete_message(message)
            if not reprocessed:
                # Send to error queue
                self._sqs_sender.send_message(message_body)
```

`self._process_message` checks whether `message_body` has its reprocess flag set to true. If so, it sends the message back to the main queue.

For testing, I made the message contents invalid so that every time the message is processed in the main queue, it goes to the dead letter queue. I expected this to keep looping, but SQS seems to have a mechanism that stops this from happening (which is good).

The question is: which setting is that?

Woootiness
  • Have you activated the `Use Redrive Policy` option on the Amazon SQS queue? If so, what is the `Maximum Receives` value set to? Can you edit the question to show us any of your code? – John Rotenstein May 23 '19 at 09:15
  • @JohnRotenstein Maximum Receives is set to 10 – Woootiness May 23 '19 at 10:09
  • @Woootiness Do you think this is the right approach? A dead-letter queue is meant for analyzing messages that failed for some reason. Fetching a message back from the dead-letter queue and trying to process it again is not a good idea. – Shivang Agarwal May 23 '19 at 10:30
  • @Woootiness This is the purpose of the dead letter queue: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html#sqs-dead-letter-queues-benefits – Shivang Agarwal May 23 '19 at 10:33
  • @ShivangAgarwal What we did was list certain exceptions that determine whether a message needs to be reprocessed. If the error is not one of those exceptions, we send the message to a third queue, which is the "real" dead letter queue for manual checking – Woootiness May 23 '19 at 10:36
  • @Woootiness "Don’t use a dead-letter queue with standard queues when you want to be able to keep retrying the transmission of a message indefinitely. For example, don’t use a dead-letter queue if your program must wait for a dependent process to become active or available." (from the Amazon documentation) – Shivang Agarwal May 23 '19 at 10:37
  • @Woootiness In that case, rather than using a separate queue, simply don't delete the message from the queue. It will be picked up again once its visibility timeout has passed. That reduces your cost and simplifies your process as well. – Shivang Agarwal May 23 '19 at 10:39
  • @ShivangAgarwal An invisible message is different from the ones in flight, right? – Woootiness May 23 '19 at 10:44
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/193823/discussion-between-shivang-agarwal-and-woootiness). – Shivang Agarwal May 23 '19 at 10:47
  • @Woootiness Whenever you poll a message from SQS, SQS starts a visibility timeout for that message so that it is not returned by another poll until that time has passed. So if you don't delete a message after an error, it becomes available again once the visibility timeout expires. – Shivang Agarwal May 23 '19 at 10:52

1 Answer

The normal way that an Amazon SQS queue works is:

  • Messages are sent to the queue
  • An application calls ReceiveMessage() on the queue to receive a message (or multiple messages). This increments the Receive Count on a message.
  • This puts the message(s) into an invisible state. This means that the message is still in the queue, but it is not visible if another application tries to receive messages from the queue
  • Once the application has finished processing the message, it calls DeleteMessage(), providing the receipt handle of the message. This removes the message from the queue.
  • However, if the application does not delete the message within the visibility timeout period, then the message appears on the queue again. This is done in case the application has crashed. Instead of losing the message, it is put back on the queue so that another (or the same) application can process it again.
  • If a message exceeds the visibility timeout period AND its Receive Count now exceeds the Maximum Receives setting, it is not put back on the queue. Instead, it is placed on the Dead Letter Queue (DLQ).
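
The lifecycle above can be imitated with a small in-memory model. This is only a toy sketch for reasoning about Receive Count and the visibility timeout, not how SQS is implemented. The names `ToyQueue`, `ToyMessage`, `visible_at` and the explicit `now` argument are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class ToyMessage:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0  # time at which the message becomes visible again


@dataclass
class ToyQueue:
    """Hypothetical in-memory imitation of a queue with a redrive policy."""
    visibility_timeout: float = 30.0
    max_receives: int = 10
    messages: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def send(self, body):
        self.messages.append(ToyMessage(body))

    def receive(self, now):
        """Return one visible message, or None if nothing can be received."""
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # still invisible (in flight)
            if msg.receive_count >= self.max_receives:
                # Already received the maximum number of times: move to the DLQ.
                self.messages.remove(msg)
                self.dead_letters.append(msg)
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg):
        """The consumer finished processing: remove the message for good."""
        self.messages.remove(msg)
```

In this model a consumer that never calls `delete` sees the same message again after each timeout until `max_receives` is reached, at which point the next receive attempt moves it to `dead_letters` without any action by the consumer.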

So, the normal process is that Amazon SQS itself moves a message to the DLQ once it has been received more than Maximum Receives times (10 in your case). It is NOT the job of your application to move the message to the Dead Letter Queue!
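
For reference, the DLQ link and the Maximum Receives value live in the queue's `RedrivePolicy` attribute. A minimal sketch of building that attribute follows, assuming boto3 is used to apply it. The helper name `build_redrive_attributes` and the ARN in the usage comment are placeholders, not values from the question.

```python
import json


def build_redrive_attributes(dlq_arn, max_receive_count=10):
    """Build the Attributes dict that attaches a DLQ to a source queue."""
    return {
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": str(max_receive_count),
            }
        )
    }


# Possible usage with a boto3 SQS client (queue URL and ARN are hypothetical):
#   sqs = boto3.client("sqs")
#   sqs.set_queue_attributes(
#       QueueUrl=main_queue_url,
#       Attributes=build_redrive_attributes(
#           "arn:aws:sqs:us-east-1:123456789012:example-dlq"
#       ),
#   )
```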

If you want to handle all the 'dead letter' processing yourself (eg moving to different queues), then turn off the DLQ functionality on the queue itself. Having it enabled while your code also moves messages around is probably what causes your messages to disappear or go to the wrong location.

By the way, when deleting a message, you need to provide the ReceiptHandle of the message, not the message itself.
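
As a rough sketch of that delete step, a consumer could look like the following. It assumes `sqs` is a boto3 SQS client (for example `boto3.client("sqs")`); `drain_once`, `queue_url` and `handler` are hypothetical names. The value passed to `delete_message` is the `ReceiptHandle` field of the received message.

```python
def drain_once(sqs, queue_url, handler):
    """Receive one batch and delete only the messages the handler processed."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=5,
    )
    processed = 0
    # "Messages" is absent from the response when the queue returned nothing.
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # Not deleted: the message becomes visible again after the
            # visibility timeout, and a configured redrive policy can
            # eventually move it to the DLQ.
            continue
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        processed += 1
    return processed
```

Leaving failed messages undeleted is one design choice; it relies on the queue's own retry and DLQ behaviour instead of moving messages by hand.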

John Rotenstein
  • (for deleting) that is just a wrapper function, we extract the message handle – Woootiness May 23 '19 at 10:38
  • In the except branch of my try/except, if I don't explicitly delete the message, it stays in flight. We can't process in-flight messages, so I have to delete it and move it – Woootiness May 23 '19 at 10:41
  • If you want to do all the "delete and move" stuff yourself, then you should turn off DLQ functionality in the queue. You said that you have Maximum Receives set to 10, but you can only provide that value if DLQ is turned ON. So, you need to turn it off. – John Rotenstein May 23 '19 at 10:43
  • I see, thanks for the insight! I'm quite new to AWS – Woootiness May 23 '19 at 10:47