6

Two days ago, with no code changes or changes to the DB, I am not getting a lot (every 5 minutes or so) of errors with The wait operation timed out error with two different underlining full errors on about the pre-login and the other about the post:

System.Data.Entity.Core.EntityException: The underlying provider failed on Open. ---> System.Data.SqlClient.SqlException: Connection Timeout Expired. The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement. This could be because the pre-login handshake failed or the server was unable to respond back in time. The duration spent while attempting to connect to this server was - [Pre-Login] initialization=21; handshake=14988; ---> System.ComponentModel.Win32Exception: The wait operation timed out

System.Data.Entity.Core.EntityException: The underlying provider failed on Open. ---> System.Data.SqlClient.SqlException: Connection Timeout Expired. The timeout period elapsed during the post-login phase. The connection could have timed out while waiting for server to complete the login process and respond; Or it could have timed out while attempting to create multiple active connections. This failure occurred while attempting to connect to the routing destination. The duration spent while attempting to connect to the original server was - [Pre-Login] initialization=5; handshake=3098; [Login] initialization=0; authentication=0; [Post-Login] complete=7; The duration spent while attempting to connect to this server was - [Pre-Login] initialization=20; handshake=5; [Login] initialization=0; authentication=0; [Post-Login] complete=11003; ---> System.ComponentModel.Win32Exception: The wait operation timed out

I am using Entity Framework and my web site is hosted on an Azure Web App. I have done some digging and most SO questions I find about this are NOT related to Entity Framework but ADO.Net the few posts I found lead me updated from a Basic to Standard (S0) service for the DB and creating a GlobalDBConfig with

public class GlobalDBConfig : DbConfiguration
{
    public GlobalDBConfig()
    {
        SetExecutionStrategy("System.Data.SqlClient", () => new SqlAzureExecutionStrategy(2, TimeSpan.FromSeconds(30)));
    }
}

How can I figure out what else is going wrong and fix it? This is a very simple DB with simple queries and very little traffic to the site (less then 1000 visits a DAY)

Daniel A. White
  • 187,200
  • 47
  • 362
  • 445
Matthew Verstraete
  • 6,335
  • 22
  • 67
  • 123
  • 2
    Open an incident with Azure SQL DB – Remus Rusanu Aug 09 '15 at 15:31
  • I'm experiencing the same issue. – Augusto Barreto Aug 09 '15 at 17:28
  • 1
    @RemusRusanu I am hoping to try and exhaust all other options before I have to pay for support. I am unemployed so I don't really have the money for it. – Matthew Verstraete Aug 09 '15 at 18:39
  • Are you able to connect to the master / user database using SSMS / some other tools? If you can this must be some problem with the app / middle tier.. – Sirisha Chamarthi Aug 09 '15 at 19:42
  • @sirishachamarthi on my side, the issue is intermittent, and I also got it many times from SSMS. Something is not working well on the database engine side. – Augusto Barreto Aug 09 '15 at 19:47
  • @RemusRusanu How can I test if it is the app? SSMS is REALLY slow connecting and getting data but never times out or has login issues. If it is the Middle tier what can I do for mitigation? – Matthew Verstraete Aug 09 '15 at 19:57
  • Which data center your database is in? You can get it from doing nsloookup We can see if there is any issue going there and will request specific information if needed. – Satya_MSFT Aug 09 '15 at 20:26
  • @SatyanarayanaNarlapuram NSLookup says eastus1-a.control.database.windows.net (if I did it right) and I know I am in Default-SQL-EastUS – Matthew Verstraete Aug 09 '15 at 20:38
  • @Matthew, is this resolved now or are you still seeing this? – Satya_MSFT Aug 10 '15 at 16:26
  • @SatyanarayanaNarlapuram The last error I got was form 8-9-15 at 11:04 PM EST so it looks to be fixed. Looks like it was a network issue? Since I don't have paid tech support is there a better way to report these types of problems in the feature? Thanks – Matthew Verstraete Aug 10 '15 at 20:56
  • @SatyanarayanaNarlapuram I just got another set of them – Matthew Verstraete Aug 11 '15 at 01:13
  • This could be a backend issue. I have a couple work arounds for you to try, please see answers section as I couldn't type write here in a good format – Satya_MSFT Aug 11 '15 at 03:56
  • We're experiencing the same issue. Started around 2015-08-08 10:44 (UTC) and is still on-going. We experience multiple 40613 errors/hour. – michael.aird Aug 11 '15 at 14:23
  • Hi, was this issued fixed? I have it constantly. – JCS Feb 10 '16 at 16:28
  • For me the problem went away by it's self. While I still get them randomly I am in the process of rebuilding the app in .Net Core as well as moving the Database to a VM running SQL Server. I was once on a VM for SQL and NEVER had the problem so I am moving back in hopes that will permanently negate the issue. – Matthew Verstraete Feb 10 '16 at 16:30

3 Answers3

3

We resolved this issue, along with other types of random timeouts on SQL Azure by switching to "contained users". Using server-level logins on SQL Azure can cause issues:

This is not very efficient as in SQL DB master and user can sit on two different SQL servers potentially in two different machines. Also when a server has multiple user databases then master will be the bottleneck in the login process, and under load this may result in high response time for logins. If Microsoft is updating the software on the machine / server then master will be unavailable for a few seconds and all the logins to the user database can fail too at this time (http://www.sqlindepth.com/contained-users-in-sql-azure-db-v12/)

As in your case, I had my doubts because my database was not under heavy load, but switching to contained users made a tremendous difference anyway.

The SQL to create these users is as follows (run this on the database itself, not on the master database as you would for creating server-level logins):

Create user ContainedUser with password = 'Password'
ALTER AUTHORIZATION ON SCHEMA::[db_owner] TO [ContainedUser]
ALTER ROLE [db_owner] ADD MEMBER [ContainedUser]
Loren Paulsen
  • 8,960
  • 1
  • 28
  • 38
  • We also experienced DB connection issues with all DB metrics being <5%. Trying this now, and it seems to help, possibly performance is also somewhat better. – Piedone Jan 11 '18 at 14:07
  • After a few days I can say that yes, perf is marginally better (average page load time went down about 5%) but transient connection errors are more frequent. – Piedone Jan 12 '18 at 13:34
  • It's incredible but this indeed seems to solved the issue for good, after about two weeks I think I can say. Thank you Loren! – Piedone Jan 21 '18 at 12:43
0

Here are a few options to try: I strongly recommend going with (1) and (3) if possible

  1. User database firewall rules and contained user authentication
  2. Increase connection timeout to a large value (60-120 seconds?)
  3. If possible update your client drivers to latest version (7.4 and above)
Satya_MSFT
  • 1,024
  • 6
  • 10
  • Can you please expand? What am I doing to DB firewall rules and how since SQL Azure DBs have no firewall under "all settings". Update what client drivers? This is a Azure Web App (not VM) using Entity Framework. – Matthew Verstraete Aug 11 '15 at 15:39
  • Using contained users avoids round trip to the master database of the server which could be on a different SQL server running on a different machine. Please see the article below on how to minimize login time: http://www.sqlindepth.com/2015/07/minimize-login-time-in-sql-db-v12/ – Sirisha Chamarthi Aug 26 '15 at 07:33
0

We had similar problems and please note that there is no such thing as AUTO scaling on standalone databases on Azure and since you are using Entity Framework, here are some suggestions below

  1. If you are calling Web-API to fetch and transact with your database on Azure SQL, make sure on Azure portal you set the "ALLWAYS ON" option for the Web-API.

  2. Then your client app should probably retry if it fails to connect in the first attempt.

  3. If the database queries are resulting in timeouts due to volume of the data and the indexes not able to catch up with that, you will need to increase the time out of the command executions a bit and most importantly you will need to update the stats on the database and recompile all the objects in the database.

Kaushik Ghosh
  • 11,912
  • 1
  • 15
  • 9
  • Thanks for the post, hopefully it will help others but for me I am not using an API the site calls directly to the DB and I do have a retry plan as outlined in my OP. While I never pinpointed the problem the DB at the time was extremely small so I don't think it was a timeout of the query but that is something I will keep in mind if it happens again. – Matthew Verstraete May 16 '16 at 18:41