Rollback database changes in a durable function

Question

Let's say I have the following orchestration:

[FunctionName("Orchestration")]
public static async Task Orchestration_Start([OrchestrationTrigger] DurableOrchestrationContext ctx)
{
    // CallActivityAsync takes the activity input as a second argument; null means "no input".
    await ctx.CallActivityAsync("Foo", null);
    await ctx.CallActivityAsync("Bar", null);
    await Task.WhenAll(ctx.CallActivityAsync("Baz", null), ctx.CallActivityAsync("Baz", null));
}

All my activities use an Azure SQL database, and if any of the calls fails, I want to undo all the changes made by previous activities. For example, if the second call to Baz throws an exception, I want to undo everything done by Foo and Bar, and if the first Baz has completed, I want to undo its modifications too.

In a non-Functions application, I'd be able to just wrap the entire body of the orchestration in a using (var scope = new TransactionScope()) block.

Will this work for a potentially distributed orchestration, and if not, is there any analogous mechanism in the Azure Functions framework? Or am I required to write a rollback implementation for each of the activities and commit the changes to the database after completing each of them?

score 0 · Answer 1 · answered Sep 06 '18 at 09:11

Durable Functions give you eventual consistency. This is quite different from other kinds of consistency (e.g. strong consistency): it guarantees only that a transaction will eventually complete. What does that mean?

With TransactionScope, you can ensure that if anything goes wrong within a transaction, a rollback is performed automatically. Durable Functions offer no such automatic feature. Each activity runs as a separate function invocation, possibly on a different machine, so an ambient transaction opened in the orchestrator does not span the activities. In fact, if the second activity in your example fails, you will end up with inconsistent data in the database.

To get transaction-like behavior in this scenario, you have to catch the failure with try/catch and run logic that mitigates the error:

[FunctionName("Orchestration")]
public static async Task Orchestration_Start([OrchestrationTrigger] DurableOrchestrationContext ctx)
{
    try
    {
        // The second argument is the activity input; null means "no input".
        await ctx.CallActivityAsync("Foo", null);
        await ctx.CallActivityAsync("Bar", null);
        await Task.WhenAll(ctx.CallActivityAsync("Baz", null), ctx.CallActivityAsync("Baz", null));
    }
    catch (Exception)
    {
        // Undo the work of the activities that already completed...
    }
}

You can also add a retry policy to ride out transient errors:

public static async Task Run(DurableOrchestrationContext context)
{
    // Retry up to 3 attempts in total, waiting 5 seconds before the first retry.
    var retryOptions = new RetryOptions(
        firstRetryInterval: TimeSpan.FromSeconds(5),
        maxNumberOfAttempts: 3);

    await context.CallActivityWithRetryAsync("FlakyFunction", retryOptions, null);

    // ...
}

However, it is important to understand how the Durable Functions runtime actually handles a failure. Let us assume that the following code fails:

[FunctionName("Orchestration")]
public static async Task Orchestration_Start([OrchestrationTrigger] DurableOrchestrationContext ctx)
{
    await ctx.CallActivityAsync("Foo", null);
    await ctx.CallActivityAsync("Bar", null); // THROWS!
    await Task.WhenAll(ctx.CallActivityAsync("Baz", null), ctx.CallActivityAsync("Baz", null));
}

If the orchestration is replayed, the first activity (the call to "Foo") will not be executed again: its result is already saved in storage, so it is available immediately. The runtime checkpoints after each activity, so the state is preserved and the runtime knows where it previously left off.

To handle this situation properly, implement the following algorithm:

1. Perform a manual rollback when an exception is caught.
2. If that fails, push a message to e.g. a queue, to be handled manually by someone who understands how the process works.
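
The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: the activity names UndoFoo, UndoBar, UndoBaz and NotifyOperator are hypothetical, and you would have to write those activities yourself. The callActivity delegate stands in for the orchestration context so that the logic uses nothing beyond the base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class CompensatingOrchestration
{
    // callActivity stands in for the orchestration context's activity call.
    // All "Undo..." and "NotifyOperator" names are hypothetical activities.
    public static async Task RunAsync(Func<string, Task> callActivity)
    {
        // Undo activities for the steps that completed, most recent on top.
        var undo = new Stack<string>();
        try
        {
            await callActivity("Foo");
            undo.Push("UndoFoo");

            await callActivity("Bar");
            undo.Push("UndoBar");

            Task baz1 = callActivity("Baz");
            Task baz2 = callActivity("Baz");
            try
            {
                await Task.WhenAll(baz1, baz2);
            }
            finally
            {
                // WhenAll finishes only after both tasks have finished,
                // so each task's status is final at this point.
                foreach (Task t in new[] { baz1, baz2 })
                {
                    if (t.Status == TaskStatus.RanToCompletion)
                    {
                        undo.Push("UndoBaz");
                    }
                }
            }
        }
        catch (Exception)
        {
            try
            {
                // Step 1: manual rollback, in reverse order of completion.
                while (undo.Count > 0)
                {
                    await callActivity(undo.Pop());
                }
            }
            catch (Exception)
            {
                // Step 2: the rollback itself failed, so hand over to a human,
                // e.g. through an activity that writes to a queue.
                await callActivity("NotifyOperator");
            }
            throw; // rethrow the original failure so the orchestration is marked as failed
        }
    }
}
```

Inside an orchestrator you would pass something like name => ctx.CallActivityAsync(name, null) as callActivity. Each undo activity should be safe to run more than once, because it may be retried.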

While this may initially look like a big flaw, it is in fact a perfectly fine solution. Errors do occur, so it is always a good idea to guard against transient ones (using retries), but if a rollback fails, that clearly indicates something is wrong in your system.

The choice is yours: either you keep strong consistency and deal with the scalability problems, or you use a looser model that scales better but is more difficult to work with.
