Let's say I have the following orchestration:

[FunctionName("Orchestration")]
public static async Task Orchestration_Start([OrchestrationTrigger] DurableOrchestrationContext ctx)
{
    await ctx.CallActivityAsync("Foo", null);
    await ctx.CallActivityAsync("Bar", null);
    await Task.WhenAll(ctx.CallActivityAsync("Baz", null), ctx.CallActivityAsync("Baz", null));
}

All my activities use an Azure SQL database, and if any of the calls fails, I want to undo all the changes made by previous activities. For example, if the second call to Baz throws an exception, I want to undo everything done by Foo and Bar, and if the first Baz has completed, I want to undo its modifications too.

In a non-Functions application, I'd be able to just wrap the entire body of the orchestration in a using scope = new TransactionScope() block.

Will this work for a potentially distributed orchestration, and if not, is there any analogous mechanism in the Azure Functions framework? Or am I required to write a rollback implementation for each of the activities and commit the changes to the database after completing each of them?

Jayendran
Maciej Stachowski

1 Answer

Durable Functions give you eventual consistency. This is quite different from other kinds of consistency (e.g. strong consistency): it only guarantees that a transaction will be completed eventually. What does that mean?

With TransactionScope you can ensure that if anything goes wrong within a transaction, a rollback is performed automatically. That is not the case in Durable Functions: a TransactionScope opened in the orchestrator does not flow into the activities, because each activity runs as a separate function invocation, possibly on a different machine, and there is no built-in feature that gives you the same behavior. In fact, if the second activity in your example fails, you will end up with inconsistent data in the database.

To get transaction-like behavior in such a scenario, you have to catch the failure yourself and run logic that compensates for it:

[FunctionName("Orchestration")]
public static async Task Orchestration_Start([OrchestrationTrigger] DurableOrchestrationContext ctx)
{
    try
    {
        await ctx.CallActivityAsync("Foo", null);
        await ctx.CallActivityAsync("Bar", null);
        await Task.WhenAll(ctx.CallActivityAsync("Baz", null), ctx.CallActivityAsync("Baz", null));
    }
    catch (Exception)
    {
        // Do something, e.g. call activities that undo the changes made so far...
    }
}
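
What goes into the catch block could look roughly like the sketch below. It is only an illustration under assumptions: the undo activities "UndoFoo", "UndoBar" and "UndoBaz" are hypothetical functions you would have to write yourself, and the code assumes the Durable Functions 1.x API used elsewhere in this answer. It records which steps completed and runs the matching undo activities in reverse order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class CompensatingOrchestration
{
    [FunctionName("OrchestrationWithCompensation")]
    public static async Task Run([OrchestrationTrigger] DurableOrchestrationContext ctx)
    {
        // Undo activities for the steps that have completed so far.
        // "UndoFoo", "UndoBar" and "UndoBaz" are hypothetical names.
        var undo = new Stack<string>();

        try
        {
            await ctx.CallActivityAsync("Foo", null);
            undo.Push("UndoFoo");

            await ctx.CallActivityAsync("Bar", null);
            undo.Push("UndoBar");

            Task[] bazTasks =
            {
                ctx.CallActivityAsync("Baz", null),
                ctx.CallActivityAsync("Baz", null)
            };
            try
            {
                // WhenAll finishes only after both tasks have finished,
                // even if one of them fails.
                await Task.WhenAll(bazTasks);
            }
            finally
            {
                // Register an undo for every Baz call that actually succeeded.
                foreach (Task baz in bazTasks)
                {
                    if (baz.Status == TaskStatus.RanToCompletion)
                    {
                        undo.Push("UndoBaz");
                    }
                }
            }
        }
        catch (Exception)
        {
            // Compensate in reverse order of completion.
            while (undo.Count > 0)
            {
                await ctx.CallActivityAsync(undo.Pop(), null);
            }

            throw; // rethrow so the orchestration instance is reported as failed
        }
    }
}
```

Each undo activity has to commit its own database changes, so this is a compensation scheme rather than a real transaction: other readers can observe the intermediate state before the undo runs.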

You can also apply a retry policy to ride out transient errors:

public static async Task Run([OrchestrationTrigger] DurableOrchestrationContext ctx)
{
    // Retry up to 3 attempts, waiting 5 seconds before the first retry.
    var retryOptions = new RetryOptions(
        firstRetryInterval: TimeSpan.FromSeconds(5),
        maxNumberOfAttempts: 3);

    await ctx.CallActivityWithRetryAsync("FlakyFunction", retryOptions, null);

    // ...
}

However, the important thing is to understand how the Durable Functions runtime actually handles a failure. Let us assume that the following code fails:

[FunctionName("Orchestration")]
public static async Task Orchestration_Start([OrchestrationTrigger] DurableOrchestrationContext ctx)
{
    await ctx.CallActivityAsync("Foo", null);
    await ctx.CallActivityAsync("Bar", null); // THROWS!
    await Task.WhenAll(ctx.CallActivityAsync("Baz", null), ctx.CallActivityAsync("Baz", null));
}

If the orchestration is replayed, the first activity ("Foo") will not be executed again: its result is already stored in the orchestration history, so it is available immediately. The runtime checkpoints after each activity, so the state is preserved and the runtime knows where it left off.
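
One way to observe this replay behavior is to log only when the orchestrator is not replaying. The sketch below is illustrative and assumes the `IsReplaying` property of the 1.x `DurableOrchestrationContext` and an `ILogger` parameter; the function name is made up.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ReplayAwareOrchestration
{
    [FunctionName("ReplayAwareOrchestration")] // hypothetical name
    public static async Task Run(
        [OrchestrationTrigger] DurableOrchestrationContext ctx,
        ILogger log)
    {
        // The orchestrator body re-runs from the top on every replay.
        // IsReplaying is true while the runtime is replaying recorded history.
        if (!ctx.IsReplaying)
        {
            log.LogInformation("Scheduling Foo");
        }

        // On replay this await completes from the stored result;
        // the Foo activity itself is not invoked a second time.
        await ctx.CallActivityAsync("Foo", null);

        if (!ctx.IsReplaying)
        {
            log.LogInformation("Scheduling Bar");
        }

        await ctx.CallActivityAsync("Bar", null);
    }
}
```

Without the `IsReplaying` guard, each message would be written again on every replay, which is an easy way to see how often the orchestrator code is re-executed.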

To handle this situation properly, you have to implement the following algorithm:

  • perform a manual rollback when an exception is caught
  • if that fails, push a message to e.g. a queue, to be handled manually by someone who understands how the process works
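
The two steps above might be sketched as follows. This is an assumption-laden example, not a definitive implementation: "Rollback" and "Escalate" are hypothetical activities, the queue name "manual-intervention" is made up, and the queue output binding uses the function app's default storage connection.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class RollbackWithEscalation
{
    [FunctionName("OrchestrationWithEscalation")]
    public static async Task Run([OrchestrationTrigger] DurableOrchestrationContext ctx)
    {
        try
        {
            await ctx.CallActivityAsync("Foo", null);
            await ctx.CallActivityAsync("Bar", null);
        }
        catch (Exception)
        {
            try
            {
                // Step 1: manual rollback, retried to ride out transient errors.
                // "Rollback" is a hypothetical activity that undoes Foo and Bar.
                var retryOptions = new RetryOptions(
                    firstRetryInterval: TimeSpan.FromSeconds(5),
                    maxNumberOfAttempts: 3);
                await ctx.CallActivityWithRetryAsync("Rollback", retryOptions, null);
            }
            catch (Exception rollbackError)
            {
                // Step 2: the rollback itself failed, so hand over to a human.
                await ctx.CallActivityAsync(
                    "Escalate",
                    $"Rollback failed for instance {ctx.InstanceId}: {rollbackError.Message}");
            }

            throw; // keep the original failure visible in the orchestration status
        }
    }

    // Orchestrators must not do I/O directly, so the queue write lives in an activity.
    [FunctionName("Escalate")]
    public static void Escalate(
        [ActivityTrigger] string message,
        [Queue("manual-intervention")] out string queueMessage)
    {
        queueMessage = message;
    }
}
```

Including the instance ID in the message lets whoever picks it up look up the orchestration history and see which activities had already completed.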

While this may initially look like a big flaw, it is in fact a perfectly fine solution. Errors do occur, so it is always a good idea to guard against transient ones (using retries), but if the rollback itself fails, that clearly indicates something is wrong in your system.

The choice is yours: either you keep strong consistency and deal with the scalability problems, or you use a looser model that scales better but is more difficult to work with.

kamil-mrzyglod