
I have a timer-triggered Azure Function that runs every second. The function reads data from API servers and stores it in ADLS. How can I optimize the function so that it makes more than 500 API calls per second and stores the data returned by each call within that same second?

        public static void Run([TimerTrigger("*/1 * * * * *")] TimerInfo myTimer, ILogger log)
        {
            log.LogInformation($"C# Timer trigger function executed at: {DateTime.Now}");


            log.LogInformation($"Execution starts at: {DateTime.Now.ToString("hh.mm.ss.ffffff")}");

            try
            {
                var IDs = GetIDs(); //makes 1 API call to fetch list of IDs
                
                foreach(var i in IDs){
                   ReadAndWriteData(i); //reads data for each ID from API server and stores in ADLS
                }
            }
            catch (Exception e)
            {
                log.LogError($"An exception has been raised : {e}");
            }

            log.LogInformation($"C# Timer trigger function execution ended at: {DateTime.Now}");
        }
       
        public static async Task<List<string>> GetIDs(){
            //List<string> idList = await Task.Run(()=> ReadIDs()); //makes 1 API call to fetch list of IDs
            //return idList;
        }

        public static async Task ReadAndWriteData(String id){
            //var result = await Task.Run(()=> ReadData()); //reads data for each ID from API server
            ...
            // uploads data to ADLS
        }

What is the best way to get accurate per-second data for all IDs? I have tried some parallel programming/TPL approaches, but I only get the expected accuracy when I use a single ID, not when I use all of them.

vidhi
  • I'm not sure what your problem is. What is the relation between an ID and time? – Magnus Jan 12 '21 at 09:16
  • While I also don't understand what your actual problem is, your code just looks wrong. E.g. in `var IDs = GetIDs();` you are making a non-awaited call to an async method and then trying to use the result in the foreach loop. It just does not work like that. – silent Jan 12 '21 at 09:34
  • Hi @Magnus, I need to fetch data for all IDs every second. Let me try to make it clearer. Let's forget about IDs and say we have to call `ReadAndWriteData(i)`, where i comes from a list of strings like {"1","2",..."500"}. If this list has 2 or 3 values, I get data for every second and upload it to ADLS in the following storage hierarchy: yyyy/mm/dd/H/M/S/file.json. But with 500 values, each minute's folder contains a folder for only one second, not for all 60 seconds. Hope this helps explain the problem. – vidhi Jan 12 '21 at 09:38

1 Answer


First of all, many different issues could be causing your performance problems. Profile the function with Azure Service Profiler or a similar tool to see how much time each line of code takes.

Some possible causes:

  1. The code that fetches the IDs or performs the ADLS operations uses an inefficient algorithm.
  2. You have not added .ConfigureAwait(false) to your await calls.
  3. Automatic scaling is not enabled for the function app, and manual scaling is insufficient for the load.
  4. You are using heavy NuGet packages, which make it slow to create a new instance of the function.
  5. The ReadIDs and ReadData functions are not asynchronous.
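
Points 2 and 5, together with the non-awaited `GetIDs()` call noted in the comments on the question, can be illustrated with a minimal sketch. It is not the asker's actual code: `GetIDs` and `ReadAndWriteData` are hypothetical stand-ins that simulate I/O with `Task.Delay`, and the class name `FanOutSketch`, the helper `ProcessAll`, and the concurrency limit of 100 are assumptions chosen for illustration. It shows the ID list being awaited and the per-ID work running concurrently through `Task.WhenAll`, with a `SemaphoreSlim` capping the number of in-flight calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FanOutSketch
{
    // Hypothetical stand-in for the single API call that returns the ID list.
    public static async Task<List<string>> GetIDs()
    {
        await Task.Delay(10).ConfigureAwait(false); // simulated network latency
        return Enumerable.Range(1, 500).Select(n => n.ToString()).ToList();
    }

    // Hypothetical stand-in for "read one ID from the API, then upload to ADLS".
    public static async Task<string> ReadAndWriteData(string id)
    {
        // Real code would await the HTTP call and the ADLS upload here.
        await Task.Delay(50).ConfigureAwait(false);
        return id;
    }

    // Awaits the ID list, then starts the per-ID work concurrently,
    // allowing at most maxConcurrency calls in flight at any time.
    public static async Task<int> ProcessAll(int maxConcurrency)
    {
        List<string> ids = await GetIDs().ConfigureAwait(false);
        using var gate = new SemaphoreSlim(maxConcurrency);

        IEnumerable<Task<string>> tasks = ids.Select(async id =>
        {
            await gate.WaitAsync().ConfigureAwait(false);
            try
            {
                return await ReadAndWriteData(id).ConfigureAwait(false);
            }
            finally
            {
                gate.Release();
            }
        });

        // WhenAll completes only after every per-ID task has finished.
        string[] done = await Task.WhenAll(tasks).ConfigureAwait(false);
        return done.Length;
    }

    public static async Task Main()
    {
        int processed = await ProcessAll(maxConcurrency: 100);
        Console.WriteLine($"Processed {processed} IDs");
    }
}
```

In the timer function itself, `Run` would have to be declared `async Task` rather than `void` so that it can await this work. Whether 500 calls actually finish within one second still depends on the latency of the API and of ADLS, so the concurrency limit is something to tune by measurement.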
Harshita Singh
  • Hey, @singhh-msft. Thanks for your suggestions. 1. The function doesn't have much scope for optimization, but I will still take a look at it. 2. I added `ConfigureAwait(false)` to the await calls, but it didn't make much difference. 3. How can I enable automatic scaling for an Azure Function? 4. _Energistics_ is the only NuGet package I am using apart from common ones like Newtonsoft.Json, since I am working with WITSML data. 5. All functions are async. – vidhi Jan 12 '21 at 11:48
  • 3. Scaling: https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#scale; 5. The function definition should have `Task` in its return type and the `async` keyword. Please make sure both are there. For the remaining points: okay. – Harshita Singh Jan 12 '21 at 12:00
  • In my scenario, using the Consumption plan for the function app should improve performance, right? It allows a maximum of 200 instances. Is there anything I need to configure to ensure that the maximum number of instances is used? – vidhi Jan 12 '21 at 12:19
  • It depends on the load you are giving. – Harshita Singh Jan 12 '21 at 14:18