
I'm trying to understand how scaling works with Azure Functions. We've been testing with an app that generates 88 messages in a storage queue, which triggers our function. The function is written in C#. It downloads a file and performs some processing on it (it will eventually post the result back, but we aren't doing that yet for testing purposes). The function takes about 30 seconds to complete per message (roughly 2,500 seconds of processing for the whole queue). For testing purposes we loop this 10 times.
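For reference, here's a simplified sketch of the kind of function we're running (Functions v1-era run.csx; the real message format and processing are omitted, and treating the message body as a file URL below is purely illustrative):

```csharp
// run.csx -- simplified sketch, not our actual code.
// Assumes (hypothetically) that each queue message contains the URL of the file to fetch.
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static async Task Run(string queueItem, TraceWriter log)
{
    log.Info($"Dequeued message: {queueItem}");

    using (var client = new HttpClient())
    {
        // Download the file referenced by the message.
        byte[] file = await client.GetByteArrayAsync(queueItem);

        // ~30 seconds of memory-intensive processing happens here.
        // (Eventually we'll post the result back; skipped while testing.)
        log.Info($"Downloaded {file.Length} bytes; processing...");
    }
}
```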

Our ideal situation would be that after some warming, Azure would automatically scale the function up to handle the messages in the most expedient way, using some sort of algorithm that takes spin-up time and so on into account, or that it would simply scale up to the number of messages in the backlog, with some sort of cap.

Is this how it is supposed to work? We have never been able to get over 7 'consumption units', and it generally takes about 45 minutes to process the queue of messages.

A couple of other questions re scalability: our function is a memory-intensive operation, so how is memory 'shared' across scaled instances of a function? I ask because we are seeing some out-of-memory errors that we don't normally see, even though we've configured the function for the maximum memory (1536 MB). About 2.5% of the operations are failing with an out-of-memory error.

Thanks in advance; we're really looking to make this work, as it would allow us to move a lot of our work off dedicated Windows VMs on EC2 and onto Azure Functions.

– pchowdhry

1 Answer


The intent is that the platform takes care of automatically scaling for you with the ultimate goal that you don't have to think or care about the number of "consumption units" (sometimes referred to as instances) that are assigned to your function app. That said, there will always be room for improvement to ensure we get this right for the majority of users. :)

But to answer your question about the internal details (as far as queue processing goes), what we have in place right now is a system which examines the queue length and the amount of time each message sits in the queue before being processed by your app. If we feel like your function app is "falling behind" in processing these messages, then more consumption units will be added until we think your app is able to keep up with the incoming load.
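To make that description concrete, here's an illustrative sketch in C#. This is not the actual scale controller (its internals aren't public), just a restatement of the heuristic described above, with made-up threshold values:

```csharp
// Illustrative only -- NOT the real Azure Functions scale controller.
// Restates the heuristic: watch queue length and message latency, and add
// consumption units while the app appears to be falling behind.
using System;

static class ScaleHeuristicSketch
{
    // Both thresholds are invented for illustration.
    const int TargetMessagesPerUnit = 16;
    static readonly TimeSpan AcceptableLatency = TimeSpan.FromSeconds(30);

    public static int DecideUnitCount(int currentUnits, int queueLength, TimeSpan oldestMessageAge)
    {
        bool fallingBehind =
            queueLength > currentUnits * TargetMessagesPerUnit ||
            oldestMessageAge > AcceptableLatency;

        // Scale out gradually while falling behind; otherwise hold steady.
        return fallingBehind ? currentUnits + 1 : currentUnits;
    }
}
```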

One thing that's very important to mention is that there is another aspect of scale besides just the number of consumption units. Each consumption unit has the ability to process many messages in parallel. Oftentimes the problem people have is not the number of allocated consumption units, but the default concurrency configuration for their workload. Take a look at the batchSize and newBatchThreshold settings, which can be tweaked in your host.json file. Depending on your workload, you may find that you get significantly better throughput when you change these values (in some cases, reducing concurrency has been shown to dramatically increase throughput). For example, you may observe this if each function execution requires a lot of memory or if your functions depend on an external resource (like a database) which can only handle limited concurrent access. More documentation on these concurrency controls can be found here: https://github.com/Azure/azure-webjobs-sdk-script/wiki/host.json.
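For concreteness, those settings live under the queues section of host.json. In the v1 schema this thread is about, batchSize defaults to 16 and newBatchThreshold defaults to half of batchSize, so a single consumption unit can run up to batchSize + newBatchThreshold executions at once:

```json
{
  "queues": {
    "batchSize": 16,
    "newBatchThreshold": 8
  }
}
```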

As I hinted at above, playing with per-consumption unit concurrency may help with the memory pressure issues you've been encountering. Each consumption unit has its own pool of memory (e.g. its own 1.5 GB). But if you're processing too many messages in a single consumption unit, then that could be the source of the out-of-memory errors you're seeing.
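To put rough numbers on that: with the default settings above, up to 24 executions can share one consumption unit's 1.5 GB, which leaves only ~64 MB per execution. If each of your executions peaks at, say, 300 MB (a hypothetical figure, since I don't know your workload), you'd want to cap concurrency at around 4 per unit (4 × 300 MB = 1.2 GB) via the batchSize/newBatchThreshold settings to stay under the limit.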

With all this said, we are constantly doing work to identify and optimize certain load scenarios which we think are the most common, whether it's draining a pile of messages from a queue, consuming a "stream" of blobs in a storage container, processing a flood of HTTP requests, etc. Expect things to change as we learn, mature, and get more feedback from folks like yourself. The best place to provide such feedback to the product group is in our GitHub repo's issue list, which is reviewed regularly.

Thanks for the question. I hope this information was helpful and that you're able to get the numbers you're looking for.

– Chris Gillum
  • Hi Chris, thanks a lot for this. We'll explore the concurrency option and report back on this thread in case anyone else ends up here through search. I looked into the root directory, and while there is a host.json file, it is completely blank. I'm assuming this is because we made the function through the portal UI. I'll cut and paste from the GitHub copy and see where we end up. – pchowdhry Jun 08 '16 at 20:58
  • Hi Chris, just to clarify your above answer: batchSize we can set to 1, which means that each consumption unit will only process 1 message at a time, and newBatchThreshold we can set to 100. In this scenario, if we put 88 messages in the queue, the platform would launch 88 consumption units to process our messages, more or less. Using this we've been able to eliminate the memory errors, but can't seem to get over 10 consumption units... – pchowdhry Jun 09 '16 at 00:54
  • 2
    10 is a temporary maximum we have in place. This will be increased in the future. – Chris Gillum Jun 09 '16 at 19:06
  • 5
    Update on my previous comment for those who are interested. As of GA, you can get as many as 60 or more consumption units for your function app when using the Consumption plan. – Chris Gillum Dec 05 '16 at 22:33
  • With default host.json settings, is parallel processing of a queue or blob storage done within a single consumption unit? – Chris Harrington Mar 19 '17 at 04:14
  • You said: "Depending on your workload, you may find that you get significantly better throughput when you change these values (in some cases, reducing concurrency has been shown to dramatically increase throughput).". It would be really helpful to have some examples of situations where reducing concurrency can increase throughput. Is it only related to available memory in a function? – Chris Harrington Mar 19 '17 at 04:23
  • @ChrisHarrington Yes, by default queue and blob processing happens in parallel. The host.json doc I linked to above shows all the defaults values, so you should see as many as 16-24 functions executing concurrently on a single instance. Memory is the main example where reducing concurrency helps improve throughput. Other examples would be when your functions depend on some other external resource, like a database, which can only handle so much concurrency. – Chris Gillum Mar 21 '17 at 16:26
  • @ChrisGillum I am using Azure Functions v3 and have a similar scenario to the OP. I am trying a very similar solution to the one the OP described in the comments: batch size 1 and new batch threshold 100. My aim is to have every function triggered on a different VM instance; this would provide more resource power to each function and finish the task quicker (I am aware this may be a bit over the top, but for the time being it is what I am trying). I am not seeing the instances ever increasing past 3 or 4 at most, even when there are as many as 60 queue items being processed almost concurrently. Any idea? TIA! – Jurgen Cuschieri Oct 07 '21 at 13:20
  • @ChrisGillum Actually managed to make it work with batch size 1 and threshold 0. It had seemed that the OP's description in this same comment thread implied the same intention as mine. Would the 100 threshold (as opposed to 0) somehow allow scaling out by 1 VM per function instance? – Jurgen Cuschieri Oct 07 '21 at 14:22
  • 1
    @JurgenCuschieri the new batch threshold setting can be a little confusing. It works by prefetching messages when the number of concurrently processed messages on a single VM goes below the configured threshold value. If you want to ensure only one VM per message, then you'll want to set batch size to 1 and new batch threshold to 0 (and it sounds like this is already working for you). – Chris Gillum Oct 11 '21 at 18:40
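For anyone landing here later, the configuration described in that last comment translates to this minimal host.json sketch (v1-style schema, matching the rest of this thread; in v2+ these settings move under extensions.queues):

```json
{
  "queues": {
    "batchSize": 1,
    "newBatchThreshold": 0
  }
}
```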