1

I have a v1 Azure Function that is triggered by a message being written to the Azure Storage message queue.

The Azure Function needs to perform multiple updates to SharePoint Online. Occasionally these operations fail. This results in the message being returned to the queue and being reprocessed.

When I developed the function, I didn't consider that it might partially complete and then restart. I've done a little research and it sounds like I need to modify it to be re-entrant.

Is there a design pattern that I should follow to cater for this without having to add a lot of checks to determine if an operation has already been carried out by a previous execution? Alternatively, is there any Azure functionality that can help (beyond the existing message retries and poison queue)

Ivan Wilson
  • 403
  • 3
  • 12
  • You could use manual completion of the message, cloning the original message and adding properties to say which parts of the original message passed. Complete the original message and send the cloned message back to the queue. When the cloned message is resent to the queue (after a suitable delay perhaps), you can check the properties to see which operations not to repeat. – codebrane Apr 07 '22 at 09:21

1 Answers1

0

It sounds like you will need to do some re-engineering. Our team had a similar issue and wrote a home-grown solution years ago. But we eventually scrapped our solution and went with Azure Durable Functions.

Not gonna lie - this framework has some complexity and it took me a bit to wrap my head around it. Check out the function chaining pattern.

We have processing that requires multiple steps that all must be completed. We're spanning multiple data stores (Updating Cosmos Db, Azure SQL, Blob Storage, etc), so there's no support for distributed transactions across multiple PaaS offerings. Durable Functions will allow you to break your process up into discrete steps. If a step fails, an orchestrator will re-run that step based on a retry policy.

So in a nutshell, we use Durable Task Activity functions to attempt each step. If the step fails due to what we think is a transient error, we retry. If it's an unrecoverable error, we don't retry.

Rob Reagan
  • 7,313
  • 3
  • 20
  • 49