
I have the following code in an Orchestrator:

        var parallelTasks = new List<Task>();

        // Get Records
        List<Record> records = await context.CallActivityAsync<List<Record>>("GetRecords", orchestrationContext);

        // Write Records
        foreach (Record record in records)
        {
            parallelTasks.Add(context.CallActivityAsync<int>("WriteRecord", record));
        }

        await Task.WhenAll(parallelTasks);

This fails because GetRecords returns too much data (60,000 records) and the orchestrator does not continue: CallActivityAsync cannot return more than 8 MB of data.

It may also fail because it essentially attempts to start 60,000 activities, one per record.

I am doing it like this so that Azure will write to Azure Data Lake (ADL) using several threads. At first I tried semaphores, but multiple sources online said that one shouldn't use semaphores in an orchestrator but CallActivityAsync instead, which lets Azure manage its own threads.

How can I solve this and still achieve multi-threaded writes to ADL?

For the record, I am using a library that can only write a single file at a time (I know the newer library from Microsoft includes a bulk write, but I am unable to use that for various reasons).

  • Seems to me that your GetRecords should return the data in chunks (see the sketch after these comments). – Thiago Custodio Jan 07 '20 at 16:57
  • Not sure whether Azure Functions is a fit for your requirements. What does the function do? Transferring big amounts of data could be done more efficiently by other services. – Peter Bons Jan 07 '20 at 17:09
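
A minimal sketch of that chunking idea, assuming Durable Functions v2 and two hypothetical activities, GetRecordsPage and WriteRecordBatch, which are not in the original post:

        using System.Collections.Generic;
        using System.Threading.Tasks;
        using Microsoft.Azure.WebJobs;
        using Microsoft.Azure.WebJobs.Extensions.DurableTask;

        public static class ChunkedOrchestrator
        {
            [FunctionName("ChunkedOrchestrator")]
            public static async Task Run(
                [OrchestrationTrigger] IDurableOrchestrationContext context)
            {
                const int pageSize = 1000; // keep each page well under the 8 MB payload limit
                var parallelTasks = new List<Task>();

                for (int page = 0; ; page++)
                {
                    // Hypothetical activity that returns one page of records at a time.
                    List<Record> records = await context.CallActivityAsync<List<Record>>(
                        "GetRecordsPage", new { page, pageSize });
                    if (records.Count == 0)
                        break;

                    // Fan out one activity per page instead of one per record.
                    parallelTasks.Add(context.CallActivityAsync<int>("WriteRecordBatch", records));
                }

                await Task.WhenAll(parallelTasks);
            }
        }

With a page size of 1,000, each activity payload stays under the 8 MB limit and the fan-out drops from 60,000 activities to 60.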

1 Answer


Is there a reason for GetRecords and WriteRecord to be in a Durable Functions setup? If not, GetRecords can drop each Record object (serialized to JSON) onto an Azure Queue or Event Hub instead of returning a huge list. WriteRecord can then be triggered from that queue or Event Hub to process each message.
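
A minimal sketch of the queue variant, assuming a storage queue named records, a timer trigger to kick off the producer, and placeholder LoadRecordsAsync/WriteToDataLakeAsync helpers standing in for the real data access and the single-file ADL writer (none of these names come from the original post):

        using System.Collections.Generic;
        using System.Threading.Tasks;
        using Microsoft.Azure.WebJobs;
        using Newtonsoft.Json;

        public static class RecordFunctions
        {
            // Producer: enqueue one message per record instead of returning a 60,000-item list.
            [FunctionName("GetRecords")]
            public static async Task GetRecords(
                [TimerTrigger("0 0 * * * *")] TimerInfo timer, // illustrative schedule
                [Queue("records")] IAsyncCollector<string> outputQueue)
            {
                List<Record> records = await LoadRecordsAsync();
                foreach (Record record in records)
                {
                    await outputQueue.AddAsync(JsonConvert.SerializeObject(record));
                }
            }

            // Consumer: the runtime dequeues messages in parallel and scales out instances,
            // so Azure manages the concurrency instead of manual threads or semaphores.
            [FunctionName("WriteRecord")]
            public static async Task WriteRecord([QueueTrigger("records")] string message)
            {
                Record record = JsonConvert.DeserializeObject<Record>(message);
                await WriteToDataLakeAsync(record);
            }

            // Placeholders for the real data access and the single-file ADL write.
            private static Task<List<Record>> LoadRecordsAsync() =>
                Task.FromResult(new List<Record>());
            private static Task WriteToDataLakeAsync(Record record) => Task.CompletedTask;
        }

If the write side needs a concurrency cap, the storage queue batchSize setting in host.json controls how many messages each function instance processes at once.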
