Background
I have simplified this scenario somewhat, but this is the general problem.
I am using an Azure Data Factory to ingest data from a custom API into a table in Azure Data Warehouse. I am using an IDotNetActivity to run the C# code that calls the API and loads the data into the data warehouse. The activity runs in Azure Batch.
Within the activity itself, before I call the custom API, I load a list of people from a file in Azure Blob storage. I then call the custom API once for each person in the file. These calls are made sequentially, and this approach takes too long. The file is likely to grow, so the run time will only get worse.
Things I've tried to improve performance
- Making the API calls asynchronous and issuing them in batches of 3. Strangely, this ran slower than the sequential version. It looks like Azure Batch does not handle async / await all that well.
- Another oddity we have seen is that MoreLinq's Batch extension method didn't work at all. I have checked the source code for it: https://github.com/morelinq/MoreLINQ/blob/master/MoreLinq/Batch.cs . It uses yield return, but I have no idea why it isn't working, or even whether it is related to the async / await problem.
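For reference, the batched async approach in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the actual code from the activity: FetchPersonAsync is a hypothetical stand-in for the custom API call, and the SemaphoreSlim gate is just one common way to cap the number of calls in flight.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledCalls
{
    // Hypothetical stand-in for the custom API call; the real call is not shown here.
    public static async Task<string> FetchPersonAsync(int personId)
    {
        await Task.Delay(10).ConfigureAwait(false); // simulates network latency
        return "result-" + personId;
    }

    // Runs one call per person with at most maxConcurrency calls in flight.
    public static async Task<List<string>> FetchAllAsync(
        IEnumerable<int> personIds, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = personIds.Select(async id =>
            {
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    return await FetchPersonAsync(id).ConfigureAwait(false);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList(); // ToList starts every task; the semaphore does the throttling

            // WhenAll returns results in the same order as the input tasks.
            string[] results = await Task.WhenAll(tasks).ConfigureAwait(false);
            return results.ToList();
        }
    }
}
```

Because IDotNetActivity.Execute is a synchronous method, a sketch like this would have to be blocked on from inside it, for example with FetchAllAsync(personIds, 3).GetAwaiter().GetResult(). The ConfigureAwait(false) calls are there so that blocking does not depend on any captured synchronization context.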
The Main Question
Does Azure Batch support async / await?
Further questions
- If Azure Batch doesn't support async / await, what is a better way to approach this problem? For example, using a job manager and spinning up more nodes.
- Can anyone shed some light on why MoreLinq's Batch doesn't work in Azure Batch? Here is a snippet of the affected code:
    List<int> personIds = GetPersonIds(clientAddress, clientUsername, clientPassword);
    var customResults = new List<CustomApiResult>();

    foreach (var personIdsBatch in personIds.Batch(100))
    {
        customResults.AddRange(GetCustomResultsByBatch(address, username, password, personIdsBatch));
    }
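One way to check whether the deferred execution behind yield return is involved would be to swap in a batching helper that builds every batch up front. The sketch below is only an illustration of that idea; ChunkEagerly is a hypothetical name, not part of MoreLinq, and it assumes the IDs are already in an in-memory list, as in the snippet above.

```csharp
using System;
using System.Collections.Generic;

public static class Chunking
{
    // Splits source into fully materialized lists of at most `size` items each.
    // Nothing is deferred: every chunk exists before the method returns.
    public static List<List<T>> ChunkEagerly<T>(IReadOnlyList<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var chunks = new List<List<T>>();
        for (int i = 0; i < source.Count; i += size)
        {
            int count = Math.Min(size, source.Count - i);
            var chunk = new List<T>(count);
            for (int j = 0; j < count; j++)
            {
                chunk.Add(source[i + j]);
            }
            chunks.Add(chunk);
        }
        return chunks;
    }
}
```

In the loop above, personIds.Batch(100) would become Chunking.ChunkEagerly(personIds, 100). If that version behaves correctly in Azure Batch, the problem is more likely tied to how MoreLinq is loaded or evaluated there than to the batching logic itself.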