We have an Event Hub with a retention of 1 day, containing millions of messages. To consume it, we have an Azure Function that reads from this Event Hub via the Event Hubs trigger binding. The function reads the raw bytes, deserializes them into JSON, applies some transformations, and writes the output to another Event Hub.
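Roughly, the function looks like the sketch below. The hub names, connection setting names, and `Transform()` are simplified placeholders rather than our actual code, and it assumes the in-process model with the `Microsoft.Azure.EventHubs`-based extension:

```csharp
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class TransformEvents
{
    // Sketch of the function described above; hub names, connection setting
    // names, and Transform() are placeholders, not the real implementation.
    [FunctionName("TransformEvents")]
    public static async Task Run(
        [EventHubTrigger("input-hub", Connection = "InputEventHubConnection")] EventData[] events,
        [EventHub("output-hub", Connection = "OutputEventHubConnection")] IAsyncCollector<string> output,
        ILogger log)
    {
        foreach (var eventData in events)
        {
            // Raw bytes -> JSON string -> transform -> forward to the output hub.
            var json = Encoding.UTF8.GetString(
                eventData.Body.Array, eventData.Body.Offset, eventData.Body.Count);
            await output.AddAsync(Transform(json));
        }

        log.LogInformation($"Processed a batch of {events.Length} events.");
    }

    // Placeholder for the actual transformation logic.
    private static string Transform(string json) => json;
}
```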
The trigger takes an EventData[] as input so we receive a batch of events at once, and we have configured it to receive up to 1024 messages per batch.
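For reference, that batch size is configured in host.json along these lines. This assumes the Functions v2/v3 runtime with the Event Hubs extension version that still uses EventProcessorHost (where the setting is `eventProcessorOptions.maxBatchSize`); on newer extension versions the equivalent setting is `maxEventBatchSize`, and the `prefetchCount` shown is only an illustrative value:

```json
{
  "version": "2.0",
  "extensions": {
    "eventHubs": {
      "batchCheckpointFrequency": 1,
      "eventProcessorOptions": {
        "maxBatchSize": 1024,
        "prefetchCount": 2048
      }
    }
  }
}
```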
When we start the function and it needs to reprocess the last 24 hours, it uses only 1 of the 5 nodes available in our App Service plan, as can be seen in the metrics:
According to the docs, scaling should behave like this:
When your function is first enabled, there is only one instance of the function. Let's call this function instance Function_0. Function_0 has a single EventProcessorHost instance that has a lease on all ten partitions. This instance is reading events from partitions 0-9. From this point forward, one of the following happens:
New function instances are not needed: Function_0 is able to process all 1000 events before the Functions scaling logic kicks in. In this case, all 1000 messages are processed by Function_0.
An additional function instance is added: The Functions scaling logic determines that Function_0 has more messages than it can process. In this case, a new function app instance (Function_1) is created, along with a new EventProcessorHost instance. Event Hubs detects that a new host instance is trying to read messages. Event Hubs load balances the partitions across its host instances. For example, partitions 0-4 may be assigned to Function_0 and partitions 5-9 to Function_1.
N more function instances are added: The Functions scaling logic determines that both Function_0 and Function_1 have more messages than they can process. New function app instances Function_2...Function_N are created, where N is greater than the number of event hub partitions. In our example, Event Hubs again load balances the partitions, in this case across the instances Function_0...Function_9.
I believe we're hitting option #1, even though there are 24 hours of data on the Event Hub and only 1 node processing it. At this rate it takes many hours to work through the backlog, while the other 4 nodes sit idle.
How does Azure Functions decide when to scale out in this scenario, and can we influence this behavior?