
I have a timer-triggered Azure Function that runs every 2 minutes and executes code that fetches all messages from an Azure Service Bus topic, and then sends the content of some of the messages as new messages on the same topic.

The problem I experience is that sometimes the messages end up in the dead-letter queue after some time, even though the Azure Function doesn't fail or log any exceptions. It looks like the messages that end up in the DLQ have been moved there because they were locked and the lock timed out, and since Max delivery count is set to 1 on the topic, the message is sent to the DLQ. But my code doesn't lock the messages without completing them (unless an exception is thrown, which doesn't happen). Furthermore, the locking seems to happen at a point in time when the Azure Function isn't running. Sometimes it happens after a few "resends" of the messages, at other times it happens after many.

So how do the messages get locked if they were never received by my code? Are there any known issues with TopicClient.SendAsync() that could explain this behaviour? Or is there another reason the messages end up in the DLQ when the code doesn't log any exceptions?

I have included a code example that shows the problem (the code is not useful in itself; it just narrows down where the problem is):

[FunctionName("TestResend")]
public static async Task TestResendAsync([TimerTrigger("0 */2 * * * *")] TimerInfo myTimer, ILogger log, ExecutionContext context)
{
    try
    {
        log.LogInformation("Testing resend of messages");

        using var container = GetContainer(context);
        var resender = container.GetInstance<TestResender>();
        await resender.Test(new TraceWriterLogger(log));
    }
    catch (Exception ex)
    {
        log.LogError(ex, "Exception in TestResend");
        throw;
    }
}

public class TestResender
{
        private static MessageReceiver _messageReceiver;
        private static TopicClient _topicClient;

        public TestResender()
        {
            if (_topicClient == null || _topicClient.IsClosedOrClosing)
            {
                _topicClient = new TopicClient(ConnectionStrings.ServiceBusConnectionString, "plandisc.domain.models.servicebus.testresendmessage");
            }
        }

        public async Task Test(ILogger logger)
        {
            _messageReceiver = new MessageReceiver(ConnectionStrings.ServiceBusConnectionString, 
                EntityNameHelper.FormatSubscriptionPath("plandisc.domain.models.servicebus.testresendmessage", "testresendmessage-subscription"), ReceiveMode.PeekLock, null, 0);

            logger.Info("Receiving Messages");
            var messages = new List<Message>();
            IList<Message> messageBatch;

            do
            {
                messageBatch = await _messageReceiver.ReceiveAsync(int.MaxValue, TimeSpan.FromSeconds(2));
                if (messageBatch != null && messageBatch.Any()) messages.AddRange(messageBatch);
                logger.Info("ReceiveAsync from " + _messageReceiver.Path + " returned: " + (messageBatch?.Count() ?? 0));
            } while (messageBatch != null && messageBatch.Any());

            var serviceBusMessages = messages.Select(m => Json.Deserialize<TestResendMessage>(System.Text.Encoding.UTF8.GetString(m.Body)));

            foreach (var message in serviceBusMessages)
            {
                logger.Info("Resending message: " + message.Data + " " + message.TimeStamp);
                await SendAsync(new TestResendMessage
                {
                    Data = message.Data,
                    TimeStamp = message.TimeStamp
                }, logger);
            }

            logger.Info($"Complete messages (Messages: {messages.Count})");

            var lockTokens = messages
                .Where(m => m.SystemProperties.LockedUntilUtc > DateTime.UtcNow)
                .Select(m => m.SystemProperties.LockToken);

            if (messages.Any())
            {
                var info = "messages: " + messages.Count() + "\n" +
                    "lockTokens: " + lockTokens.Count() + "\n" +
                    "lockedUntilUtc: " + string.Join(", ", messages.Select(m => m.SystemProperties.LockedUntilUtc.ToString()));
                logger.Info(info);
            }

            if (lockTokens.Any())
            {
                await _messageReceiver.CompleteAsync(lockTokens);
            }

            logger.Info("Done completing messages");
        }

        public async Task SendAsync(TestResendMessage message, ILogger logger)
        {
            var envelope = new Message();
            envelope.Body = Encoding.UTF8.GetBytes(Json.Serialize(message));

            try
            {
                await _topicClient.SendAsync(envelope);
            }
            catch (Exception ex)
            {
                logger.Error("Can't send service bus message", ex);
                throw;
            }
        }
}
jbiversen

1 Answer


By default, MessageReceiver doesn't prefetch any messages. And even if it did, the code is too simple for locks to be lost while processing a batch. That said,

There are several potential things to address with the design of the function:

  1. Max delivery count of 1 - this will dead-letter messages on any error that could otherwise have been retried, including lock-related issues.
  2. Execute using a Service Bus trigger instead of a timer trigger, receiving a batch of messages (a rough sketch follows below).
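
A minimal sketch of the second suggestion, assuming the same topic and subscription names as in the question; the function name, the app setting name "ServiceBusConnectionString" and the omission of the resend logic are illustrative assumptions, not something from the original code:

[FunctionName("TestResendServiceBusTriggered")]
public static void Run(
    [ServiceBusTrigger(
        "plandisc.domain.models.servicebus.testresendmessage",
        "testresendmessage-subscription",
        Connection = "ServiceBusConnectionString")] Message message,
    ILogger log)
{
    // The Functions runtime receives the message, renews its lock while the function runs,
    // and completes it when the function returns without throwing (default autoComplete).
    var payload = Json.Deserialize<TestResendMessage>(Encoding.UTF8.GetString(message.Body));
    log.LogInformation("Received message: {Data} {TimeStamp}", payload.Data, payload.TimeStamp);
    // ... resending via TopicClient.SendAsync would go here, as in the question's SendAsync ...
}

With a trigger like this there is no manual ReceiveAsync/CompleteAsync loop, so there are no lock tokens to track and no window in which a lock can expire between receiving and completing.
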
Sean Feldman
  • Thanks for your reply! Should I understand the first part of your answer as indicating that there might be an issue with the Service Bus SDK (or Service Bus itself)? – jbiversen Aug 08 '22 at 08:28
  • I doubt that. Are you using the latest ASB SDK with your function? It looks like it's the previous SDK. – Sean Feldman Aug 08 '22 at 19:41
  • I am using the Microsoft.Azure.ServiceBus 4.1.3 NuGet package, which I can see is deprecated. I will try switching to a newer version. – jbiversen Aug 09 '22 at 10:53
  • I have tried updating the NuGet package now, and it seems the problem is still there. – jbiversen Oct 03 '22 at 08:12
  • This approach feels off. You're receiving from a DLQ of the subscription where you resubmit the failed messages. And the entity is configured with a single processing attempt, meaning messages that fail processing will be moved right into the DLQ. On top of that, you're running a `while` loop to receive from the entity. That's just asking for trouble IMO, because the processing could run beyond the two minutes of the schedule (`0 */2 * * * *`) before the timer-triggered function is done, and then another kickoff of the timer will take place in parallel, consuming in-flight DLQed messages – Sean Feldman Oct 03 '22 at 16:26
  • As far as I can see, my code doesn't receive messages from the DLQ. Running my code shows that messages end up in the DLQ even though no exception is logged. As far as I know, timer-triggered Azure Functions don't run in parallel (https://stackoverflow.com/questions/49282060/azure-functions-timer-trigger-thread-safety). Or do I misunderstand your message? – jbiversen Oct 04 '22 at 21:00
  • In that case, sorry, I misunderstood your question. Is there any chance another process is consuming those messages? This might sound absolutely crazy, but have you tried reproducing this with another entity? – Sean Feldman Oct 05 '22 at 01:44
  • One more angle I completely forgot about - what's the lock duration? – Sean Feldman Oct 05 '22 at 01:51
  • My code example uses a new topic and subscription, to avoid other processes consuming from the same subscription. The lock duration is 2 minutes. – jbiversen Oct 05 '22 at 06:44
  • All I can think of is that the processing takes close to, or just slightly over, 2 minutes before attempting the completion, and as a result of the lost locks the messages are DLQ-ed without any exceptions. I'd think this _could_ happen if there were enough messages to keep the receiving loop busy. You could validate the assumption by bumping your MaxDeliveryCount up substantially and seeing what the delivery count on the messages looks like and whether some fail or not. – Sean Feldman Oct 05 '22 at 13:18
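
A rough sketch of the diagnostic suggested in the last comment, using ManagementClient from the Microsoft.Azure.ServiceBus.Management namespace of the same (deprecated) SDK as the question; the value of 10 and the placement of the logging line are assumptions for illustration only:

// One-off: raise MaxDeliveryCount on the subscription so a lost lock results in a redelivery
// (with an incremented DeliveryCount) instead of an immediate move to the DLQ.
var managementClient = new ManagementClient(ConnectionStrings.ServiceBusConnectionString);
var subscription = await managementClient.GetSubscriptionAsync(
    "plandisc.domain.models.servicebus.testresendmessage", "testresendmessage-subscription");
subscription.MaxDeliveryCount = 10; // illustrative value, up from 1
await managementClient.UpdateSubscriptionAsync(subscription);

// Inside the receive loop in Test(): log how often each message has already been delivered.
foreach (var m in messageBatch)
{
    logger.Info("DeliveryCount: " + m.SystemProperties.DeliveryCount
        + ", LockedUntilUtc: " + m.SystemProperties.LockedUntilUtc);
}

If the delivery counts climb above 1 without any exceptions being logged, that would support the theory that locks expire before CompleteAsync is reached.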