
I have an app that uses SQS to queue jobs. Ideally I want every job to be completed, but some are going to fail. Sometimes re-running them will work, and sometimes they will just keep failing until the retention period is reached. I want to keep failing jobs in the queue as long as possible, to give them the maximum possible chance of success, so I don't want to set a maxReceiveCount. But I do want to detect when a job reaches the MessageRetentionPeriod limit, as I need to send an alert when a job fails completely. Currently I have the retention period set to the maximum of 14 days, but some jobs will still not be completed by then.

Is there a way to detect when a job is about to expire, and from there send it to a deadletter queue for additional processing?

Anthony Neace
Trevor

1 Answer


Before you follow my advice below: assuming I've done the math correctly, you will be better off simply enabling a redrive policy on the queue if you check for messages less often than about every 20 minutes and 9 seconds.

SQS's "redrive policy" allows you to migrate messages to a dead letter queue after a threshold number of receives. The maximum number of receives that AWS allows for this is 1000, and over 14 days that works out to about 20 minutes per receive. (For simplicity, that assumes your job never misses an attempt to read queue messages. You can tweak the numbers to build in a tolerance for failure.)
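
As a quick check of that arithmetic, here is a minimal Python sketch. The function and constant names are made up for illustration; the only inputs are the 14-day retention maximum and the 1000-receive limit mentioned above.

```python
from datetime import timedelta

# Limits quoted in the answer above.
MAX_RETENTION = timedelta(days=14)   # longest retention period discussed here
MAX_RECEIVE_COUNT = 1000             # largest receive threshold for a redrive policy


def min_receive_interval(retention: timedelta, max_receive_count: int) -> timedelta:
    """Average gap between receives at which the receive threshold is reached
    exactly when the retention period ends (hypothetical helper)."""
    return retention / max_receive_count


interval = min_receive_interval(MAX_RETENTION, MAX_RECEIVE_COUNT)
print(interval)  # 0:20:09.600000
```

If the consumer polls more often than this interval, a message would hit the receive threshold before the 14 days are up, which is the case the rest of this answer addresses.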

If you check more often than that, you'll want to implement the solution below.


You can check for this "cutoff date" (the point at which the job is about to expire) as you process the messages, and send messages to the dead letter queue once they have passed the time at which you give up on them.

Pseudocode to add to your current routine:

  • Call GetQueueAttributes to get your queue's Message Retention Period, in seconds.
  • Call ReceiveMessage to pull messages off of the queue. Make sure to explicitly request the SentTimestamp attribute so that it is returned with each message.
  • For each message:
    • Find your message's expiration time by adding the message retention period to the sent timestamp.
    • Create your cutoff date by subtracting your desired amount of time from the message's expiration time.
    • Compare the cutoff date with the current time. If the cutoff date has passed:
      • Call SendMessage to send your message to the Dead Letter queue.
      • Call DeleteMessage to remove your message from the queue you are processing.
    • If the cutoff date has not passed:
      • Process the job as normal.
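
The cutoff decision in the steps above can be sketched as a pure function. This is a minimal Python illustration, not part of the original routine: `expiration_time` and `should_dead_letter` are hypothetical names, and the sketch assumes that SentTimestamp is epoch time in milliseconds and that the retention period is in seconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def expiration_time(sent_timestamp_ms: int, retention_seconds: int) -> datetime:
    """Sent time plus the queue's retention period, as a UTC datetime."""
    return EPOCH + timedelta(milliseconds=sent_timestamp_ms, seconds=retention_seconds)


def should_dead_letter(sent_timestamp_ms, retention_seconds, margin, now=None):
    """True once the cutoff (expiration minus the chosen margin) has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = expiration_time(sent_timestamp_ms, retention_seconds) - margin
    return now >= cutoff


sent_ms = 1422748800000          # 2015-02-01 00:00:00 UTC
retention = 14 * 86400           # 14 days, in seconds
margin = timedelta(hours=1)      # give up one hour before expiry

before = datetime(2015, 2, 14, 22, 59, tzinfo=timezone.utc)
after = datetime(2015, 2, 14, 23, 0, tzinfo=timezone.utc)
print(should_dead_letter(sent_ms, retention, margin, now=before))  # False
print(should_dead_letter(sent_ms, retention, margin, now=after))   # True
```

Keeping every value in UTC avoids an off-by-timezone error when the current time is compared with the cutoff.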

Here's an example implementation in PowerShell:

$queueUrl = "https://sqs.amazonaws.com/0000/my-queue"
$deadLetterQueueUrl = "https://sqs.amazonaws.com/0000/deadletter"

# Get the message retention period in seconds
$messageRetentionPeriod = (Get-SQSQueueAttribute -AttributeNames "MessageRetentionPeriod" -QueueUrl $queueUrl).Attributes.MessageRetentionPeriod

# Receive messages from our queue.  
$queueMessages = @(Receive-SQSMessage -QueueUrl $queueUrl -WaitTimeSeconds 5 -AttributeNames SentTimestamp)

foreach($message in $queueMessages)
{
    # The sent timestamp is in epoch time.
    $sentTimestampUnix = $message.Attributes.SentTimestamp

    # For powershell, we need to do some quick conversion to get a DateTime.
    $sentTimestamp = ([datetime]'1970-01-01 00:00:00').AddMilliseconds($sentTimestampUnix)

    # Get the expiration time by adding the retention period to the sent time.
    $expirationTime = $sentTimestamp.AddDays($messageRetentionPeriod / 86400 )

    # I want my cutoff date to be one hour before the expiration time.
    $cutoffDate = $expirationTime.AddHours(-1)

    # Check if the cutoff date has passed. The timestamps above are UTC, so compare against the current UTC time.
    if((Get-Date).ToUniversalTime() -ge $cutoffDate)
    {
        # Cutoff Date has passed, move to deadletter queue

        Send-SQSMessage -QueueUrl $deadLetterQueueUrl -MessageBody $message.Body

        Remove-SQSMessage -QueueUrl $queueUrl -ReceiptHandle $message.ReceiptHandle -Force
    }
    else
    {
        # Cutoff Date has not passed. Retry job?
    }
}

This will add some overhead to every message you process. It also assumes that your message handler will receive the message in between the cutoff time and the expiration time, so make sure that your application polls often enough to receive each message within that window.

Anthony Neace
  • this makes sense, but your last paragraph about sums up my problem. The workload of this app is very bursty, I may end up with 10,000 jobs in the queue, all of which take multiple attempts for success, or I will have only a few jobs in the queue. I can't set just one 'give up time' and then fail the message no matter what. It may be too soon or too late. I was hoping for something that more reliably captures jobs when they are about to be deleted, maybe an SQS feature? – Trevor Feb 25 '15 at 20:51
  • No, there's no feature in SQS for this. The MessageRetentionPeriod is an attribute of the queue and isn't really defined at a per-message level such that you can rescue these messages. It sounds like you'll need to have an additional job running that: for some interval t, checks every message in your queue, selects messages that will be deleted before this polling job runs again, and destroys/recreates these messages with the exact same content. This allows a workaround to the retention period, and allows the _content_ of your messages to stay in the queue indefinitely. – Anthony Neace Feb 25 '15 at 22:43
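
The selection step of the polling job described in the last comment might look roughly like the Python sketch below. The function names are hypothetical, and the message dictionaries are assumed to carry the send time as a string under `Attributes["SentTimestamp"]`, in epoch milliseconds. Re-sending and deleting the selected messages is left out.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def expires_before_next_run(sent_timestamp_ms, retention_seconds, poll_interval, now):
    """True if the message would expire before the next scheduled check."""
    expires_at = EPOCH + timedelta(milliseconds=sent_timestamp_ms, seconds=retention_seconds)
    return expires_at <= now + poll_interval


def select_for_refresh(messages, retention_seconds, poll_interval, now):
    """Messages that should be re-sent and deleted on this run."""
    return [
        m for m in messages
        if expires_before_next_run(
            int(m["Attributes"]["SentTimestamp"]), retention_seconds, poll_interval, now
        )
    ]


def to_ms(moment):
    """Convert a UTC datetime to epoch milliseconds (helper for the demo data)."""
    return int((moment - EPOCH).total_seconds() * 1000)


now = datetime(2015, 2, 25, tzinfo=timezone.utc)
messages = [
    {"Body": "old job", "Attributes": {"SentTimestamp": str(to_ms(datetime(2015, 2, 11, 3, tzinfo=timezone.utc)))}},
    {"Body": "new job", "Attributes": {"SentTimestamp": str(to_ms(datetime(2015, 2, 20, tzinfo=timezone.utc)))}},
]
due = select_for_refresh(messages, 14 * 86400, timedelta(hours=6), now)
print([m["Body"] for m in due])  # ['old job']
```

Note that a re-sent message gets a fresh send time, so any alerting on how long a job has been failing would need the original send time carried along in the message itself.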