
I want to ensure that when I do a context.SaveChanges(), this is retried because the database might be temporarily down.

So far all I've found involves writing a lot of code that I'd then need to maintain, so is there something ready, an out-of-the-box tool, that I can use for resiliency?

Panagiotis Kanavos
Ph3n0m
  • You may not have to write anything, depending on what you mean by resiliency. This could mean a *lot* of things. A concurrency violation shouldn't be retried. Connection issues are usually handled by the ADO.NET provider. The SQL Server client already retries for example. Even if that isn't possible, a connection failure can easily be retried without a lot of code but could result in a concurrency error if the data is modified at the database level in the meantime. That's not a resiliency issue though, that's a business issue – Panagiotis Kanavos Jan 11 '23 at 12:47
  • You can use Polly to specify and execute retry policies. Different operations require different policies. In some cases you may need to retry more than just the call to `SaveChanges`, and probably have to reload eg inventory or availability data that may have changed. – Panagiotis Kanavos Jan 11 '23 at 12:48
  • EF Core also [offers connection resiliency strategies](https://learn.microsoft.com/en-us/ef/core/miscellaneous/connection-resiliency) at least since EF Core 6. In databases that don't support connection resiliency out of the box you can create a custom one. – Panagiotis Kanavos Jan 11 '23 at 12:54
  • 1
    Which database are you using? If you use SQL Server, all you have to do is add `options => options.EnableRetryOnFailure()` in your `UseSqlServer` call – Panagiotis Kanavos Jan 11 '23 at 13:30
  • @PanagiotisKanavos MySQL using Pomelo. When I tried that, it did not retry anything. – Mark Cilia Vincenti Jan 11 '23 at 13:43
  • @PanagiotisKanavos I'm on MySQL too. – Ph3n0m Jan 11 '23 at 13:49
  • The Pomelo provider has `EnableRetryOnFailure` too – Panagiotis Kanavos Jan 11 '23 at 13:50
  • @PanagiotisKanavos do you have an example of how to use it, working with SaveChanges, and only retrying on transient errors? – Ph3n0m Jan 11 '23 at 13:56

3 Answers


I've created a small library called ResilientSaveChanges.EFCore that allows resilient `context.SaveChanges()` / `SaveChangesAsync()` calls in Entity Framework Core, logs long-running transactions, and can limit concurrent `SaveChanges` calls. It's straight to the point.

Available on GitHub and NuGet. Tried and tested in production on multiple private projects.
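Usage is meant to be a drop-in replacement for `SaveChanges`. A rough sketch (the exact method names and configuration are in the library's README; `AppDbContext` and the entity here are placeholders, and the async variant's name is an assumption based on the library's naming):

```csharp
// Hypothetical usage sketch of ResilientSaveChanges.EFCore.
// Instead of calling context.SaveChangesAsync() directly, the library's
// extension method wraps the save in a retrying execution strategy.
await using var context = new AppDbContext();
context.Devices.Add(new Device { Name = "sensor-01" });
await context.ResilientSaveChangesAsync(); // retried on transient failures
```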

Mark Cilia Vincenti
  • Connection resiliency and strategies are available out of the box and don't require using custom methods. Long transactions - a DbContext is a disconnected Unit-of-Work so there shouldn't be *any* explicit transactions. The same with limiting concurrent `SaveChanges` : why? If there are no explicit long-lived transactions, limiting saves doesn't help. Transaction throttling and limiting refers to the entire service operation, not the database calls. – Panagiotis Kanavos Jan 11 '23 at 13:06
  • So I'd still need to create an execution strategy, but at least it's straightforward. I love how you can tell which errors are transient and which aren't so you could handle them differently! Thanks for this, looks just like what I need! – Ph3n0m Jan 11 '23 at 13:08
  • Besides, using a Semaphore doesn't limit `SaveChanges` except for small single-server sites. Most applications will run on two or more servers. It does introduce an expensive block, wasting a ThreadPool thread that could be used to serve other requests while waiting. – Panagiotis Kanavos Jan 11 '23 at 13:09
  • @PanagiotisKanavos, throttling (limiting concurrent SaveChanges) is optional. In some cases, if you do not throttle them, the threads are still wasted waiting for the database server to respond, perhaps making the database so slow that even reads are struggling. One should use it with care. Also, as for out-of-the-box connection resiliency, this library simplifies it a bit. – Mark Cilia Vincenti Jan 11 '23 at 13:17
  • That's not why databases are slowed down. A big SELECT will block even a single small UPDATE. Multiple small UPDATEs won't. The throttling you tried to implement is simply not what the services architecture books talk about. – Panagiotis Kanavos Jan 11 '23 at 13:21
  • As for simplifying, I only see code that uses the built-in retry feature but instead of allowing the *driver* to work, it tries to use explicit transactions. That *may* be necessary because/if MySQL doesn't have connection resiliency yet. It's a serious problem if the driver *does* have connection resiliency though – Panagiotis Kanavos Jan 11 '23 at 13:26
  • @PanagiotisKanavos I implemented the throttling because I had a project running on a severely underpowered server which was killing the machine to the point where either the application crashed or else the server became non-responsive to the point I couldn't even SSH in. The throttling solved this issue. As I said, it's purely optional. – Mark Cilia Vincenti Jan 11 '23 at 13:30
  • @PanagiotisKanavos the code is adapted from https://learn.microsoft.com/en-us/dotnet/architecture/microservices/ using Microsoft's own recommendations. I admit that I am not an expert on this area though: could you please explain a bit better with regards to the driver? – Mark Cilia Vincenti Jan 11 '23 at 13:37
  • You should have throttled the *controller actions*, not the database. Or not use an ORM like EF Core in the first place. An `UPDATE SomeTable Set Count=Count+1 Where Id=@id` is far more scalable than loading a row, modifying it and then saving it. Big reads will lock all the rows they return until the connection is closed too, so they're as important, or even more, when it comes to scaling. – Panagiotis Kanavos Jan 11 '23 at 13:37
  • What I should have or should not have done depends on the context of what I was coding, which I have not specified earlier. There's no one size fits all. In my case, I am receiving data from IoT devices and logging such data. I know exactly how many I have, and I know that if I try to do too many concurrent writes (which can sometimes happen, especially upon startup) that can be enough to kill the server/service. – Mark Cilia Vincenti Jan 11 '23 at 13:41
  • Pomelo has [RetryOnFailure](https://github.com/PomeloFoundation/Pomelo.EntityFrameworkCore.MySql/blame/9f6bd5709493f42a7f9a883a6178dbdb0d125cf7/src/EFCore.MySql/Infrastructure/MySqlDbContextOptionsBuilder.cs#L28). One of the overloads allows specifying the errors to retry on. – Panagiotis Kanavos Jan 11 '23 at 13:43
  • For IoT ORMs are a hindrance, at least for ingestion. There are no objects to work on, only measurements/events to record. You can have a *single* writer running in the background, batching and inserting the batches directly to the database. You can use TPL Dataflow or channels to create a pipeline that receives the events from controllers (perhaps even Minimal APIs), processes and batches them in separate steps, before finally inserting them into the database. Intermediate steps can use multiple worker tasks too by changing eg a `MaxDegreeOfParallelism` option in a TransformBlock – Panagiotis Kanavos Jan 11 '23 at 13:44
  • There's just 6 devices sending data every 10 seconds each, even that is enough to kill it sometimes :) What you're suggesting would work if I had more memory and more data to process, but with my current load it's actually an extra overhead. And now I don't know why I'm talking about the past as if it was the present. I migrated to different hosting and I don't have these problems anymore. – Mark Cilia Vincenti Jan 11 '23 at 13:47
  • 1
    I tried this library now. Copied that execution strategy, did a find&replace from `SaveChanges` to `ResilientSaveChanges`, and it works like a charm! Thank you – Ph3n0m Jan 11 '23 at 13:50

Yes, connection resiliency is available in EF Core. For MySQL, it's available through the Pomelo provider's `EnableRetryOnFailure()` option. The GitHub blame shows this was added 5 years ago, which is a bit surprising. An overload added 3 years ago allows specifying extra error numbers to retry on.

This code taken from one of the integration tests shows how it's used:

    services.AddDbContextPool<AppDb>(
        options => options.UseMySql(
            GetConnectionString(),
            AppConfig.ServerVersion,
            mysqlOptions =>
            {
                mysqlOptions.MaxBatchSize(AppConfig.EfBatchSize);
                mysqlOptions.UseNewtonsoftJson();

                if (AppConfig.EfRetryOnFailure > 0)
                {
                    mysqlOptions.EnableRetryOnFailure(AppConfig.EfRetryOnFailure, TimeSpan.FromSeconds(5), null);
                }
            }));

Without parameters, `EnableRetryOnFailure()` uses the default maximum retry count and maximum delay, which are 6 retries and 30 seconds.

The third parameter is an `ICollection<int>` of additional error numbers to retry on.

By default, the `MySqlTransientExceptionDetector` class specifies that only transient exceptions are retried, i.e. those that have the `IsTransient` property set, or timeout exceptions:

    public static bool ShouldRetryOn([NotNull] Exception ex)
        => ex is MySqlException mySqlException
            ? mySqlException.IsTransient
            : ex is TimeoutException;
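
For example, to also retry on specific server error numbers that the detector doesn't treat as transient, you can pass them through the third parameter (a sketch; the named parameters match the overload shown above, and 1205/1213 are MySQL's lock-wait-timeout and deadlock error codes):

```csharp
mysqlOptions.EnableRetryOnFailure(
    maxRetryCount: 6,
    maxRetryDelay: TimeSpan.FromSeconds(30),
    // Additional MySQL error numbers to treat as retriable:
    // 1205 = lock wait timeout exceeded, 1213 = deadlock found
    errorNumbersToAdd: new[] { 1205, 1213 });
```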

Panagiotis Kanavos
  • How does this work? Does it still use transactions internally? If it starts doing a SaveChanges which involves writing to multiple tables, and the connection is lost halfway through, would I be left with half the data inserted? Also, where is it specified that only transient errors are retried? Lastly, can I log when retries need to be done? – Ph3n0m Jan 11 '23 at 14:04
  • @Ph3n0m `SaveChanges` itself uses a transaction internally to persist all changes in an atomic manner. There's no need to use explicit transactions. The [Execution strategies and transactions](https://learn.microsoft.com/en-us/ef/core/miscellaneous/connection-resiliency#execution-strategies-and-transactions) section explains you don't need to do extra work if you let EF handle transactions. What the Pomelo retry strategy does is specify which errors can be retried, so the base `ExecutionStrategy` can retry the operation – Panagiotis Kanavos Jan 11 '23 at 14:18
  • @Ph3n0m that's still not as lightweight as connection resiliency at the *driver* level. – Panagiotis Kanavos Jan 11 '23 at 14:19
  • As for how things are actually executed, check the [MySqlExecutionStrategy](https://github.com/PomeloFoundation/Pomelo.EntityFrameworkCore.MySql/blob/f4b1ab9497743db5a395a36340515b3d3fc28e3c/src/EFCore.MySql/Storage/Internal/MySqlExecutionStrategy.cs) class. EF passes the entire operation as a `Func` to the strategy. If that fails, retries are handled by EF itself – Panagiotis Kanavos Jan 11 '23 at 14:21
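
Following the comments above, the pattern from the EF Core docs for combining an execution strategy with an explicit transaction can be sketched like this (`context` and the `Blog` entity are placeholders):

```csharp
// When you DO need an explicit transaction, the whole unit of work must be
// passed to the execution strategy so it can be retried as one operation.
var strategy = context.Database.CreateExecutionStrategy();

await strategy.ExecuteAsync(async () =>
{
    await using var transaction = await context.Database.BeginTransactionAsync();
    context.Blogs.Add(new Blog { Url = "http://example.com" });
    await context.SaveChangesAsync();
    await transaction.CommitAsync();
});
```

If you let EF Core manage the transaction (a plain `SaveChanges` call), none of this is needed; the retry happens inside `SaveChanges` itself.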

As @PanagiotisKanavos already pointed out, Pomelo already has connection resiliency support.

The simplest way to use it is to enable the default strategy:

    dbContextOptions.UseMySql(
        connectionString,
        serverVersion,
        mySqlOptions => mySqlOptions.EnableRetryOnFailure())

It will retry up to six times, waiting incrementally longer between retries (but no longer than 30 seconds).


If you want to configure the retry strategy, use the following overload instead:

    dbContextOptions.UseMySql(
        connectionString,
        serverVersion,
        mySqlOptions => mySqlOptions
            .EnableRetryOnFailure(
                maxRetryCount: 3,
                maxRetryDelay: TimeSpan.FromSeconds(15),
                errorNumbersToAdd: null))

If you want full control over all aspects of the execution strategy, you can inject your own implementation (that either inherits from `MySqlExecutionStrategy` or directly implements `IExecutionStrategy`):

    dbContextOptions.UseMySql(
        connectionString,
        serverVersion,
        mySqlOptions => mySqlOptions
            .ExecutionStrategy(dependencies => new YourExecutionStrategy(dependencies)))
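
A minimal custom strategy might look like this (a sketch, not from the Pomelo docs: `DeadlockRetryStrategy` and the retry settings are assumptions; `ExecutionStrategy` and its constructor are EF Core's base class, and `MySqlException.Number` is the MySqlConnector error-code property):

```csharp
// Hypothetical strategy that retries only on MySQL deadlocks and timeouts.
public class DeadlockRetryStrategy : ExecutionStrategy
{
    public DeadlockRetryStrategy(ExecutionStrategyDependencies dependencies)
        : base(dependencies, maxRetryCount: 5, maxRetryDelay: TimeSpan.FromSeconds(10))
    {
    }

    // 1213 = "Deadlock found when trying to get lock"
    protected override bool ShouldRetryOn(Exception exception)
        => exception is MySqlException { Number: 1213 } or TimeoutException;
}
```

It would then be plugged in via `mySqlOptions.ExecutionStrategy(dependencies => new DeadlockRetryStrategy(dependencies))` as shown above.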
lauxjpn