4

I am using Azure SQL Single database DTU purchase model. The average load seems to be under 10%. From time to time I am receiving an error, now it's happening almost each day. I am using EF Core 3.1 for database access. The database is accessed from an API built with ASP.NET core 3.1 and deployed on a Linux Azure App service.
The error is:
Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (258): Unknown error 258

Stack trace:

An exception occurred in the database while saving changes for context type 'MTP.Api.Persistence.MTPDbContext'.
Microsoft.EntityFrameworkCore.DbUpdateException: An error occurred while updating the entries. See the inner exception for details.
 ---> Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): Unknown error 258
   at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__164_0(Task`1 result)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken)
ClientConnectionId:16f899d4-cfc9-4401-b631-1b4d547c4c19
Error Number:-2,State:0,Class:11
ClientConnectionId before routing:02e37a1e-981c-4ff6-9437-cade8b401cc5
Routing Destination:c71faab34237.tr1.francecentral1-a.worker.database.windows.net,11018
   --- End of inner exception stack trace ---
   at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable`1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable`1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(IList`1 entriesToSave, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(DbContext _, Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)

Any idea what could cause this ? Or how I can find what is causing the problem ?

Emanuel Paul
  • 81
  • 1
  • 8
  • Turn on the query store and monitor the query execution time and wait stats. https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-ver15 This is a client-side timeout which you can extend: https://learn.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.relationaldatabasefacadeextensions.setcommandtimeout?view=efcore-3.1 – David Browne - Microsoft Jul 24 '20 at 15:59
  • Thanks David, I'll turn on the query store. The timeout period is already 30 seconds. – Emanuel Paul Jul 24 '20 at 16:13
  • You may need to extend the timeout to capture the wait stats as I believe they may only get captured for queries that complete. Query store is on by default in sql azure. Query plan choice may be a factor so please post that if you are stuck – Conor Cunningham MSFT Jul 24 '20 at 23:30
  • I'll increase the timeout. This error seems to occur only on Azure SQL it seems to work fine on SQL Server 2017 Express – Emanuel Paul Jul 25 '20 at 10:24
  • How large are the tables involved? If large then one possibility for getting this sort of sporadic error may be statistics update (if they aren't set to update async) - another possibility for any size of table could be blocking from other transactions – Martin Smith Jul 25 '20 at 15:29
  • The tables are really small, the largest used for reading and writing is about 2-3 MB. There is one that's 117 MB but that one is used just for reading data. – Emanuel Paul Jul 26 '20 at 10:59

2 Answers2

4

I updated the nuget package Microsoft.Data.SqlClient to version 2.0. This seems to solve the problem.

The API is using EF Core 3.1. EF Core is using MARS(MultipleActiveResultSets). There seems to be a problem with Microsoft.Data.SqlClient version 1.0 when deployed on Linux and using MARS. EF Core is using by default v1.

Emanuel Paul
  • 81
  • 1
  • 8
0

This might be related to EF Core exception when i send multiple requests #13452:

I updated my project to ASP.NET Core 2.1 RTM. When I send multiple requests, i get this exception:

Exception has occurred: CLR/System.Data.SqlClient.SqlException
An exception of type 'System.Data.SqlClient.SqlException' occurred in Microsoft.EntityFrameworkCore.dll but was not handled in user code: 'Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.'
 Inner exceptions found, see $exception in variables window for more details.
 Innermost exception   System.ComponentModel.Win32Exception : Unknown error 258

@ajcvickers I found problem.The SQL Server host has received two updates: KB4338815, KB4338824.

Known issues in this update Restarting the SQL Server service may fail occasionally with the error, “Tcp port is already in use”.

When I removed these updates, my project worked without problem.


Of course, this could also just be a deadlock scenario. To track this more easily, you might want to log both: very slow but successful queries and failing queries.

In a deadlock scenario, at least two queries are involved. At least one would timeout, but the other could still be under the timeout threshold and then succeed once the first query got cancelled. So if you log both, you can trace the reason for the deadlock.


In any case, if you want to deal with issues like this in general, you might want to setup Connection Resiliency by using options.EnableRetryOnFailure() and probably lowering the command timeout value. This will retry failing operations.

lauxjpn
  • 4,749
  • 1
  • 20
  • 40
  • It seems to work fine on SQL Server 2017 express. The error occurs only on SQL Azure. So probably this is not related to the error that I am receiving. – Emanuel Paul Jul 25 '20 at 10:27
  • I updated my answer with another reason and some general advice. – lauxjpn Jul 25 '20 at 13:25
  • Connection Resiliency is already used. I noticed that all table is blocked so no actions can be executed on that table until the service plan is changed. – Emanuel Paul Jul 26 '20 at 11:03
  • Check the server-side database log at the time to see what is going on. Also, what do you mean by "until the service plan is changed"? – lauxjpn Jul 26 '20 at 11:08
  • When using SQL Azure there are multiple pricing tiers and purchase modes. The table remains locked(the timeout error occurs) until the pricing tier is changed from 20 DTUs to 50 DTUs for example. – Emanuel Paul Jul 26 '20 at 17:37
  • How many concurrent connections do you have at your peak and are those timeouts happening at your peak time? Assuming there is a deadlock, that could lead to all other connections locking up as well while waiting for one of the locked tables. This would exhaust your DTUs. Other requests might still queue up behind and while the first request will eventually timeout, the following requests might continue to lockup the tables. If a timeout occurs, you should log how many DTUs are currently used and if any tables are locked/deadlocked. – lauxjpn Jul 26 '20 at 18:43