
We are making use of Azure Functions (v2) extensively to fulfill a number of business requirements.

We have recently introduced a durable function to handle a more complex business process, which includes both a fan-out pattern and a chain of functions.
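For context, the orchestration looks roughly like the following sketch (illustrative only; written here in TypeScript with the `durable-functions` package, and the activity names are placeholders rather than our real ones):

```typescript
import * as df from "durable-functions";

// Illustrative orchestrator: fan out over work items, then chain two activities.
// Activity names (ProcessItem, Aggregate, Publish) are hypothetical placeholders.
const orchestrator = df.orchestrator(function* (context) {
    const workItems: string[] = context.df.getInput();

    // Fan out: schedule one activity per work item and wait for all of them.
    const tasks = workItems.map((item) =>
        context.df.callActivity("ProcessItem", item)
    );
    const results = yield context.df.Task.all(tasks);

    // Chain: feed the aggregated result through two sequential activities.
    const aggregated = yield context.df.callActivity("Aggregate", results);
    yield context.df.callActivity("Publish", aggregated);

    return aggregated;
});

export default orchestrator;
```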

Our problem is related to how much the storage account is being used. I made a fresh deployment on an account we use for dev testing on Friday, and left the function idling over the weekend to monitor what happens. I also set a budget to alert me if the costs started shooting up.

Less than 48 hours later, I received an alert that I was at 80% of my budget, and saw that the storage account was single-handedly responsible for the entire bill. The most baffling part is that it's mostly egress and ingress on File storage, which I'm not using at all in the application! So it must be something internal to the Azure Functions implementation. I've dug around and found this. In that case the issue seems to have been solved by switching to an App Service plan, but this is not an option for us; we must stick to the Consumption plan. I also double-checked and made sure that I don't have the AzureWebJobsDashboard setting.
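For reference, the function app's settings are roughly shaped like this (shown in `local.settings.json` form with values elided; the worker runtime is shown as `dotnet` purely for illustration). The point is that there is no `AzureWebJobsDashboard` entry and App Insights is wired up:

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...",
    "APPINSIGHTS_INSTRUMENTATIONKEY": "...",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet"
  }
}
```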

Any ideas what we can try next?

Below are some interesting charts from the storage account. Note how File egress and ingress make up most of the activity on the entire account.

[Chart: storage account metrics broken down by service; File egress and ingress dominate the account's activity]

A ticket for this issue has also been opened on GitHub.

MarkB
  • Would be interested to know how much your budget alert was set for? We have some use cases for durable functions so this would be useful info. – Matt Aug 26 '19 at 13:33
  • My budget was set for ~£30, which I am aware isn't a lot. But considering that this was over a weekend on an idling dev infrastructure, getting to £30 is a bit excessive. – MarkB Aug 26 '19 at 13:57
  • Yeah I agree, and for a single orchestrator that’s significant, thanks. – Matt Aug 26 '19 at 14:08

2 Answers


The link you provided actually points to AzureWebJobsDashboard as the culprit. AzureWebJobsDashboard is an optional storage account connection string for storing logs and displaying them in the Monitor tab in the portal. The storage account must be a general-purpose one that supports blobs, queues, and tables.

For better performance and experience, it is recommended to use the APPINSIGHTS_INSTRUMENTATIONKEY setting with App Insights for monitoring instead of AzureWebJobsDashboard.

When creating a function app in App Service, you must create or link to a general-purpose Azure Storage account that supports Blob, Queue, and Table storage. Internally, Functions uses Storage for operations such as managing triggers and logging function executions. Some storage accounts do not support queues and tables, such as blob-only storage accounts, Azure Premium Storage, and general-purpose storage accounts with ZRS replication. These accounts are filtered out of the Storage Account blade when creating a function app.

When using the Consumption hosting plan, your function code and binding configuration files are stored in Azure File storage in the main storage account. When you delete the main storage account, this content is deleted and cannot be recovered.
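For what it's worth, that Azure Files dependency is wired up through two app settings on the Consumption plan, which may help explain File traffic you never generate yourself. A sketch with placeholder values:

```json
{
  "WEBSITE_CONTENTAZUREFILECONNECTIONSTRING": "DefaultEndpointsProtocol=https;AccountName=mainaccount;AccountKey=...",
  "WEBSITE_CONTENTSHARE": "my-function-app-content"
}
```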

Ken W - Zero Networks
  • Hi Ken, thanks for your reply. I am already making use of App Insights in my application (as such I have the APPINSIGHTS_INSTRUMENTATIONKEY config value), and have made sure that AzureWebJobsDashboard is not present. As such, I have eliminated this as a possible reason why we're seeing this issue. – MarkB Aug 26 '19 at 13:28
  • The Github issue mentions that there was still a lot of IO even without the `AzureWebJobsDashboard` setting. – Alex AIT Aug 26 '19 at 17:15

If you use the legacy "General Purpose V1" storage accounts, you may see your costs drop by up to 95%. I had a similar use case where my storage account costs exploded after the accounts were upgraded to "V2". In my case, we just went back to V1 instead of changing our application.

Although V1 is now legacy, I don't see Azure dropping it any time soon. You can still create one in the Azure Portal, so this could be a medium-term solution.
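If you deploy with templates rather than the portal, a GPv1 account is simply kind "Storage" in an ARM template. A minimal sketch (the account name is a placeholder):

```json
{
  "type": "Microsoft.Storage/storageAccounts",
  "apiVersion": "2019-04-01",
  "name": "mystorageaccountv1",
  "location": "[resourceGroup().location]",
  "kind": "Storage",
  "sku": {
    "name": "Standard_LRS"
  }
}
```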

Some alternatives to save costs:

  • Try the "premium" performance tier (V2 only). It is cheaper for such workloads.
  • Try LRS or ZRS as the redundancy setting. Depends on the criticality of this orchestration data.

PS: Our use case was some Event Hub processors which used the storage accounts for coordination and checkpointing.

PS2: Regardless of the storage account configuration, there must be a way to reduce the traffic towards the storage account. It is just another thing to try to reduce costs.
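For example, if the idle traffic turns out to come from the Durable Task extension polling its control and work-item queues, newer versions of the extension let you back off the polling in host.json. A minimal sketch (assuming a Durable Functions version where `maxQueuePollingInterval` is available):

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxQueuePollingInterval": "00:00:30"
    }
  }
}
```

With a longer maximum polling interval, the extension backs off while the app is idle, which cuts queue transactions at the cost of some latency when new work arrives.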

Alex AIT
  • Hi Alex, unfortunately I'm already making use of V1 storage! I don't even want to think how high the bill would be if I were using V2. – MarkB Aug 27 '19 at 06:06
  • This is truly disturbing... It is not easy to cause such costs with a V1 storage account. I hope there is a better solution than "don't use the consumption plan". I'll be watching your GitHub Issue as well. – Alex AIT Aug 27 '19 at 06:47
  • I've also opened up a support ticket with MS and will provide an update if they help me resolve it. I might, however, have had a bit of a breakthrough (albeit not a very helpful one) in the past couple of hours. I purposely introduced a stack overflow exception in my code by adding a case of infinite recursion. Upon doing this, the storage account is once again being bombarded by file transactions. To confirm, I am definitely not using the file share API in my code. – MarkB Aug 27 '19 at 10:07
  • Try enabling storage analytics logging, you should clearly see what kind of requests are being made against your storage account and who's making them - https://learn.microsoft.com/en-us/azure/storage/common/storage-analytics-logging – evilSnobu Aug 29 '19 at 09:37
  • @evilSnobu I enabled all available logging on the storage account. Unfortunately, Files is the only service for which fine-grained logs are not available. I've been running additional tests over the weekend, and the situation is repeating itself: my Files ingress/egress is blowing up, and I'm not using it, whereas Tables and Queues, which I am making use of, show an acceptable level of transferred data. – MarkB Sep 01 '19 at 06:51
  • From the linked article "Storage Analytics logging is currently available only for the Blob, Queue, and Table services." – MarkB Sep 01 '19 at 06:59