I was writing an Azure Function with a Service Bus trigger when I noticed that if an exception occurs, then after a certain number of built-in retries the incoming message is moved to the dead-letter queue (DLQ; I think the Microsoft documentation also calls it a poison queue). However, I am not able to read or add any additional information on the DLQ message. For example, say I am converting 1,000 records and one of them fails to convert because of an invalid data type. The exception is thrown and my function stops, but what the exception looked like, why and when it was raised, and what my data looked like at that moment are not captured in the DLQ message.

I tried customizing the exception object to add more information, but that doesn't seem to affect the message that is sent to the dead-letter queue. Is the dead-letter queue designed so that it only moves the original message from the main queue to the DLQ when an exception happens?

What would be the best way to handle exceptions within the Azure Function? According to Microsoft, "we recommend that you use try-catch blocks in your function code to catch any errors." Should I put my exception on another queue and let another service inspect or handle it from there?

Drex

1 Answer

We've run into this problem as well. We solved it by putting our message in an envelope that had a DequeueCount and an array of Exceptions. When a message is dequeued, we increment the dequeue count within the Azure Function. If an exception occurs, we do the following:

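  1. In the catch block, we add the full exception to the exceptions array on the envelope (the dequeue count was already incremented when the message was dequeued).
  2. We explicitly add the updated message back to the queue as a new message, rather than rethrowing to trigger the built-in retry.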

This way, the message carries the dequeue count and a history of any exceptions that've occurred.

When we dequeue a message, we check the dequeue count. If it's above our threshold, we explicitly deadletter the message.

This way, the deadlettered message will contain the full exception history. You can follow this pattern to save any logging info you want to the message so that you can see it all if it ends up being deadlettered.
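The flow above can be sketched in a few lines. This is a minimal illustration, not our production code: it assumes a JSON envelope, uses an in-memory `deque` and a plain list as stand-ins for the Service Bus queue and its dead-letter queue, and the names `Envelope`, `handle`, and `MAX_DEQUEUE_COUNT` are hypothetical. In a real function you would replace the two `append` calls with the Service Bus SDK's send and dead-letter operations.

```python
import json
import traceback
from collections import deque
from dataclasses import dataclass, field, asdict

MAX_DEQUEUE_COUNT = 3  # hypothetical threshold; choose your own


@dataclass
class Envelope:
    """Wraps the real payload with a dequeue count and an exception history."""
    payload: dict
    dequeue_count: int = 0
    exceptions: list = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Envelope":
        return cls(**json.loads(raw))


def handle(raw, process, queue, dead_letters):
    """One function invocation. `queue` and `dead_letters` are in-memory stand-ins."""
    env = Envelope.from_json(raw)
    env.dequeue_count += 1  # mark the message as dequeued

    if env.dequeue_count > MAX_DEQUEUE_COUNT:
        # Explicit dead-lettering: the envelope carries the full history with it.
        dead_letters.append(env.to_json())
        return

    try:
        process(env.payload)
    except Exception as exc:
        env.exceptions.append({
            "attempt": env.dequeue_count,
            "type": type(exc).__name__,
            "message": str(exc),
            "stack_trace": traceback.format_exc(),
        })
        # Send a NEW message with the updated envelope. The exception is not
        # rethrown, so the built-in retry never sees a failure.
        queue.append(env.to_json())


def convert(payload):
    int(payload["value"])  # raises ValueError for non-numeric input


queue = deque([Envelope({"value": "abc"}).to_json()])
dead_letters = []
while queue:
    handle(queue.popleft(), convert, queue, dead_letters)

dead = Envelope.from_json(dead_letters[0])
print(dead.dequeue_count, [e["type"] for e in dead.exceptions])
```

Because the catch block swallows the exception, each attempt is a fresh invocation on a new message, and the dead-lettered copy holds one exception entry per failed attempt.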

Rob Reagan
  • Another option is to use Application Insights to correlate the exceptions thrown with the poisoned queue message. It may require you to emit some custom metrics or telemetry to assist, but that way you don't necessarily have to correlate within the queue message itself. – jeffhollan Sep 14 '19 at 20:50
  • @Rob Reagan, thank you very much for sharing your idea! Just to check my understanding in more detail: initially the input message would have a dequeueCount of 0 and an empty list of exceptions, and the function would have a maximum retry count. If an exception happens during one execution, we increment dequeueCount and add the current exception to the list, and the next execution checks the current dequeueCount against the maximum retry count. Is that how it works? I'm also curious how you would explicitly dead-letter the message once it exceeds the maximum retry count. – Drex Sep 16 '19 at 14:46
  • @Rob Reagan, in this case, only a message whose dequeueCount exceeds the maximum retry count is treated as a dead-letter message and moved to the DLQ, correct? And if the input message hit some exceptions but got through the function on the third attempt, then we don't need to worry about the exceptions generated on the first and second attempts, right? – Drex Sep 16 '19 at 14:48
  • And what kind of exception info should we capture and put in the exception list? For example, if we include the stack trace or something else, would the message become too large after a certain number of retries? As I remember, there is a limit of 2 MB per message in a Service Bus queue. – Drex Sep 16 '19 at 14:51
  • @KevDing You are right on both of your previous comments. As far as what information to capture, I'd start with the exception message and stack trace. And you are right to be mindful of the max message size. But I do believe that the max message size is at least a MB. If you are worried about exceeding the max message size, you could serialize the exception message and stack trace somewhere else and store lookup info. But I would check the service limit for service bus message size and see if that is an issue before resorting to more time consuming options. – Rob Reagan Sep 16 '19 at 15:14
  • @RobReagan, thanks for the advice! I was trying something similar but got stuck: in my catch block, I increased the count and added the exception to my message's list, then threw my exception to **trigger the requeue**. However, the requeued message never contained the updated information (count or list); it was always the initial message as it first arrived. So how did you requeue the message in this case, by throwing the exception or by manually resending it to the queue? If it's the latter, I guess we have to change the default retry count to 1, right? – Drex Sep 16 '19 at 17:35
  • By the way, I was following this [design](https://blog.kloud.com.au/2017/05/22/message-retry-patterns-in-azure-functions/), which requeues the message by throwing the exception. However, things may have changed recently, since BrokeredMessage is no longer used. I was able to requeue the message this way, but I couldn't update the message by attaching my exception to it. When you say you explicitly requeue the message, does that mean the function executes only once per message, with no retry within the same instance of the initial function, so that every _retry_ is actually a new instance of the function? – Drex Sep 16 '19 at 17:54
  • @KevDing, to make this design work, you do not throw an exception to retry the message. Instead, when a message is dequeued, you mark it as dequeued. In your catch block, add your exception and then explicitly add the NEW message to your Service Bus queue. As an added bonus, you can schedule the message for a few seconds into the future in the case of some kind of transient error. Sorry for the delay in answering; things are busy here at the office. If I can get a break in the storm, I'll see if I can put together an example repo. But no promises on whether I can get to it in a timely fashion. – Rob Reagan Sep 17 '19 at 13:16
  • @RobReagan, thank you so much for explaining the details! I understand the process now, and it's totally fine if you don't have time for the code example. Yes, that's the same thing I did: basically I put my initial message, along with the updated retryCount and exceptionList, into a new message and send it to my current queue. That means a new function instance handles the updated message, instead of the existing one using the _retry_ logic. – Drex Sep 17 '19 at 15:00
  • @KevDing Sounds like we're on the same page. You are right in your last message. – Rob Reagan Sep 17 '19 at 15:16
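
On the message-size concern raised in the comments above, one option is to cap how much exception history the envelope carries. The following is a rough sketch only: `add_exception`, `MAX_HISTORY_BYTES`, and `MAX_STACK_CHARS` are hypothetical names, and the byte budget is an arbitrary illustration, not a Service Bus limit, so check the documented quota for your tier before picking a number.

```python
import json

MAX_HISTORY_BYTES = 64 * 1024  # hypothetical budget, NOT a Service Bus limit
MAX_STACK_CHARS = 2000         # hypothetical per-entry stack trace cap


def add_exception(exceptions, entry,
                  max_bytes=MAX_HISTORY_BYTES,
                  max_stack_chars=MAX_STACK_CHARS):
    """Return a new history list with `entry` appended and the size bounded."""
    trimmed = dict(entry)
    stack = trimmed.get("stack_trace", "")
    if len(stack) > max_stack_chars:
        # Keep the tail: Python tracebacks list the most recent call last,
        # so the end of the string holds the raise site and the error itself.
        trimmed["stack_trace"] = stack[-max_stack_chars:]

    history = list(exceptions) + [trimmed]
    # Drop the oldest entries until the serialized history fits the budget.
    # The newest entry is always kept, even if it alone exceeds the budget.
    while len(history) > 1 and len(json.dumps(history).encode("utf-8")) > max_bytes:
        history.pop(0)
    return history


history = []
for attempt in range(1, 51):
    history = add_exception(
        history,
        {"attempt": attempt, "message": "bad value", "stack_trace": "x" * 5000},
        max_bytes=10_000,
    )

print(len(history), history[-1]["attempt"])
```

Dropping the oldest entries first keeps the most recent failures, which are usually the ones you want to see on a dead-lettered message. Storing the full traces elsewhere and keeping only lookup info in the envelope, as suggested above, avoids the trade-off entirely.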