
I have several Event Grid Triggers using Python on the Linux Consumption Plan that are executed when new blobs are created in Azure Storage. It's possible for more than one function instance to run simultaneously if blobs are created at or around the same time. For instance, I have two event triggers that look for created blobs matching the following subject filters:
Trigger 1

  "subjectBeginsWith": "/blobServices/default/containers/client1",
  "subjectEndsWith": ".txt"

Trigger 2

  "subjectBeginsWith": "/blobServices/default/containers/client2",
  "subjectEndsWith": ".txt"

If two blobs are created at the same time, I want to limit Azure Functions to only run one function (it doesn't matter which one) at a time to prevent memory issues. This scenario is fairly rare, so I'm considering using the preview setting WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT to only allow one invocation to run at a time. Would this work, or is there a better way?
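For context, here is a minimal sketch of the kind of Event Grid-triggered function these subscriptions would invoke (Python v1 programming model; the subject parsing is illustrative):

```python
# __init__.py of one of the Event Grid-triggered functions
import logging

import azure.functions as func


def main(event: func.EventGridEvent):
    # For Microsoft.Storage.BlobCreated events the subject looks like:
    #   /blobServices/default/containers/client1/blobs/file1_20201024.txt
    blob_path = event.subject.split("/blobs/", 1)[-1]
    logging.info("Handling blob %s (event id %s)", blob_path, event.id)
    # ... download and process the blob here ...
```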

I have another issue where it's possible for multiple blobs to be created at around the same time per client, and I only need one of them to be processed per day. For example, client 1 could have the following files created in a day:

client1/file1_20201024.txt
client1/file2_20201024.txt
client1/file3_20201024.txt

I only need to process one file per day. I can add special handling in the code to check whether the work was already completed and have the Python script return early, but I'm wondering if there is a built-in setting in Event Grid to handle cases like this, i.e., if three blobs are created within one minute, only create one event instead of three.
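For what it's worth, a rough sketch of that "already processed today" check, assuming a hypothetical per-client, per-day marker blob (the container and blob names are made up; requires the azure-storage-blob v12 SDK):

```python
import os
from datetime import datetime, timezone

from azure.storage.blob import BlobClient


def _marker(client_name: str) -> BlobClient:
    today = datetime.now(timezone.utc).strftime("%Y%m%d")
    return BlobClient.from_connection_string(
        os.environ["AzureWebJobsStorage"],      # storage account the function app already uses
        container_name="processing-markers",    # hypothetical container for marker blobs
        blob_name=f"{client_name}/{today}.done",
    )


def already_processed(client_name: str) -> bool:
    # True if an earlier invocation already handled this client today
    return _marker(client_name).exists()


def mark_processed(client_name: str) -> None:
    # Write an empty marker blob after the day's work completes successfully
    _marker(client_name).upload_blob(b"", overwrite=True)
```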

ddx

1 Answer


Issue 1: You should use a storage queue event handler (instead of the Azure Function) to push the events into a queue and then pull them off one at a time. Note that in the case of multiple VMs, you can use a leased-blob technique to control concurrency across the VMs; see more details here.
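Not from the answer itself, but roughly what that leased-blob (distributed lock) technique could look like with the azure-storage-blob v12 SDK; the lock container and blob names are hypothetical:

```python
import os

from azure.core.exceptions import HttpResponseError
from azure.storage.blob import BlobClient


def try_acquire_lock(lease_seconds: int = 60):
    """Return a BlobLeaseClient (the lock) if acquired, or None if another instance holds it."""
    lock_blob = BlobClient.from_connection_string(
        os.environ["AzureWebJobsStorage"],
        container_name="locks",                       # hypothetical lock container
        blob_name="single-instance.lock",
    )
    try:
        lock_blob.upload_blob(b"", overwrite=False)   # create the lock blob if it is missing
    except HttpResponseError:
        pass                                          # it already exists, which is fine
    try:
        # Fixed leases must be 15-60 seconds; longer work requires periodic lease.renew()
        # calls, which is why this approach gets awkward for long-running functions.
        return lock_blob.acquire_lease(lease_duration=lease_seconds)
    except HttpResponseError as err:
        if err.status_code == 409:                    # lease already held by another instance
            return None
        raise
```

The caller would do its work while the lease is held and call `release()` on the returned lease afterwards.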

Issue 2: There is no built-in feature in Azure Event Grid for what you describe, so it must be handled in the subscriber logic.

Roman Kiss
  • That makes sense! So I can use the storage queue event handler and set `batchSize` to 1 to limit parallel executions and that would mostly handle Issue 1. If I have multiple function instances, i.e., `QueueTrigger1`, `QueueTrigger2`, `QueueTrigger3`, how do I ensure that only one instance runs at a time or will `WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT` take care of that? Issue 2 isn't really a problem if I can ensure only one instance runs at a time because I can add logic to skip the Python code if the work has been processed. – ddx Oct 25 '20 at 21:03
  • It looks like `functionAppScaleLimit` might be working for me: https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#limit-scale-out – ddx Oct 26 '20 at 04:39
  • It will not help in the case of multiple VMs. You should consider using a leased blob to control concurrency of instances across multiple VMs. – Roman Kiss Oct 26 '20 at 04:49
  • Sorry, I'm having a little trouble understanding your example. You were right, it looks like `functionAppScaleLimit` isn't totally working. It now waits for a previous QueueTrigger to finish and then executes, but it times out because of that wait time. How can I use a leased blob with a Queue Trigger? – ddx Oct 26 '20 at 07:49
  • What's the function processing time? Setting batchSize=1 will allow processing one message at a time per VM (process). The leased blob will work as a distributed lock; see more details in https://stackoverflow.com/questions/52481832/azure-blob-storage-acquireleaseasync-synchronously-wait-until-lock-is-release/52483247#52483247 – Roman Kiss Oct 26 '20 at 11:04
  • each function queue trigger instance can take up to about 8 minutes. Would the leased blob code go in each `__init__.py` script in my Python queue triggers? – ddx Oct 26 '20 at 15:22
  • In this case, when the function processing takes up to 8 minutes, using the leased blob will not help. – Roman Kiss Oct 26 '20 at 15:42
  • Darn, do you think it's possible to use Durable Functions to manage the queue trigger invocations? – ddx Oct 26 '20 at 16:05
  • I think the solution might be to only have one Storage Queue where multiple events can be passed to and only one QueueTrigger that has special handling to process the queue messages based on client so I can ensure only one client can run at a time. – ddx Oct 26 '20 at 16:34
  • You have to select a hosting (serverless) plan with only one VM. – Roman Kiss Oct 26 '20 at 16:51
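Pulling the comment thread together, a sketch of what that single QueueTrigger could look like (Python v1 programming model). It assumes the storage queue is configured as the Event Grid event handler, so each queue message body is the Event Grid event JSON, and that host.json sets `queues.batchSize` to 1 (and `newBatchThreshold` to 0) so each instance processes one message at a time:

```python
# __init__.py of the single queue-triggered function suggested in the comments above
import logging

import azure.functions as func


def main(msg: func.QueueMessage):
    event = msg.get_json()                            # Event Grid event pushed to the queue
    subject = event.get("subject", "")
    # e.g. /blobServices/default/containers/client1/blobs/file1_20201024.txt
    client = subject.split("/containers/", 1)[-1].split("/", 1)[0]
    blob_path = subject.split("/blobs/", 1)[-1]
    logging.info("Queued event for client %s, blob %s", client, blob_path)
    # Skip if this client's work was already done today; otherwise run the
    # existing per-client processing code here.
```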