
Imagine that I have a storage account with a blob container that receives file uploads from time to time. I want to process each file that arrives in blob storage: open it, extract information, and store it. This is definitely an expensive operation, so it seems like a good fit for Durable Functions.

Here's the trigger:

        [FunctionName("PayrollFileTrigger")]
        public static async Task Start(
            [BlobTrigger("files/{name}", Connection = "AzureWebJobsStorage")] Stream myBlob, string name,
            [DurableClient] IDurableOrchestrationClient starter,
            ILogger log)
        {
            string instanceId = await starter.StartNewAsync("PayrollFile_StartFunction", "payroll_file", name);
        }

...which calls the orchestration:


        [FunctionName("PayrollFile_StartFunction")]
        public static async Task<IActionResult> Run(
            [OrchestrationTrigger] IDurableOrchestrationContext context, string blobName,
            ExecutionContext executionContext, ILogger log)
        {
            // Download the blob
            string filePath =
                await context.CallActivityWithRetryAsync<string>("DownloadPayrollBlob", options, blobName);

            if (filePath == null) return ErrorResult(ERROR_MSG_1, log);

            // Extract the data
            var payroll =
                await context.CallActivityWithRetryAsync<Payroll>("ExtractBlobData", options, filePath);

            // ... and so on (just a sample here) ...
        }

But there is a problem. While testing, the following error occurs, which I think means that I can't start another orchestration with the same ID:

An Orchestration instance with the status Pending already exists.



1 - So if I push many files in a short period of time to the container the trigger is "listening" to, will the orchestration get busy with one of them and ignore the subsequent events?

2 - When does the orchestration leave the Pending status? Does that happen automatically?

3 - Should I create a new orchestration instance for each file to be processed? I know you can omit the instanceId parameter, so that it is generated randomly and never conflicts with an instance that has already started. But is that safe to do? How do I manage the instances and ensure they eventually finish?

Ramon Dias

2 Answers

        string instanceId = await starter.StartNewAsync("PayrollFile_StartFunction", "payroll_file", name);

The second argument is the instanceId, which is required to be unique.

Instead, try:

        string instanceId = await starter.StartNewAsync("PayrollFile_StartFunction", input: name);
Rob Reagan
  • Thanks for the answer. Yes, I know this. But is it safe to do? Should I use `TerminateAsync()` to kill the orchestration manually when the blob processing finishes? Or do I not need to manage the started orchestrations at all? – Ramon Dias Apr 01 '20 at 15:42
  • You don't need to worry about killing it. The error message you were getting was due to identical ids. If they are not identical, the function should finish and you won't get those collisions. – Rob Reagan Apr 01 '20 at 17:13

Depending on your requirements, you might want only one durable instance per file. Microsoft's guidance is:

Use a random identifier for the instance ID. Random instance IDs help ensure an equal load distribution when you're scaling orchestrator functions across multiple VMs. The proper time to use non-random instance IDs is when the ID must come from an external source, or when you're implementing the singleton orchestrator pattern.

https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-instance-management?tabs=csharp#start-instances
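If one instance per file is what you need, the singleton pattern mentioned in the quote could look roughly like the sketch below. It is only an illustration under some assumptions: the `payroll_file_{name}` ID scheme and the class name are hypothetical, and blob names are assumed to contain only characters that are acceptable in an instance ID. It checks the status of any existing instance with `GetStatusAsync` before starting a new one.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class PayrollFileSingletonTrigger
{
    [FunctionName("PayrollFileTrigger")]
    public static async Task Start(
        [BlobTrigger("files/{name}", Connection = "AzureWebJobsStorage")] Stream myBlob,
        string name,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        // Hypothetical ID scheme: one deterministic instance ID per blob name.
        // Assumes the blob name is a valid instance ID; sanitize it otherwise.
        string instanceId = $"payroll_file_{name}";

        // GetStatusAsync returns null when no instance with this ID exists.
        DurableOrchestrationStatus existing = await starter.GetStatusAsync(instanceId);

        bool isActive = existing != null &&
            (existing.RuntimeStatus == OrchestrationRuntimeStatus.Pending ||
             existing.RuntimeStatus == OrchestrationRuntimeStatus.Running ||
             existing.RuntimeStatus == OrchestrationRuntimeStatus.ContinuedAsNew);

        if (isActive)
        {
            // An orchestration for this file is already in flight; skip it.
            log.LogInformation(
                "Orchestration {InstanceId} is already {Status}; not starting another.",
                instanceId, existing.RuntimeStatus);
            return;
        }

        // Three-argument overload: orchestrator name, instance ID, input.
        await starter.StartNewAsync("PayrollFile_StartFunction", instanceId, name);
    }
}
```

Note that the check and the start are two separate calls, so two triggers firing for the same blob at nearly the same time could still race each other.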

In your specific case I'd say you can go without supplying the instanceId yourself, and perhaps log the generated instanceId or store it somewhere alongside information about the file that started the orchestration.
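As a rough sketch of that suggestion, the trigger could let the runtime generate the ID and log it together with the blob name. The class name and log message are made up for illustration; where you persist the pair (a table, a queue, etc.) is up to you and not shown here.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class PayrollFileTriggerWithRandomId
{
    [FunctionName("PayrollFileTrigger")]
    public static async Task Start(
        [BlobTrigger("files/{name}", Connection = "AzureWebJobsStorage")] Stream myBlob,
        string name,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        // No instance ID supplied, so one is generated for each file.
        // The named argument makes sure 'name' is passed as the input
        // rather than being treated as the instance ID.
        string instanceId = await starter.StartNewAsync("PayrollFile_StartFunction", input: name);

        // Record which orchestration belongs to which file so it can be
        // looked up later.
        log.LogInformation(
            "Started orchestration {InstanceId} for blob {BlobName}",
            instanceId, name);
    }
}
```

With the ID recorded, you can later call `GetStatusAsync(instanceId)` on the client to see whether the processing of a given file completed or failed.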

underscoreHao