10

I'm setting up DB auto failover in C# with SQL Server 2008 and I have a 'high safety with automatic failover mirror' using a witness setup and my connection string looks like

"Server=tcp:DC01; Failover Partner=tcp:DC02; database=dbname; uid=sewebsite;pwd=somerndpwd;Connect Timeout=10;Pooling=True;"

During testing, when I turn off the SQL Server service on the principal server the auto failover works like a charm, but if I take the principal server offline (by shutting down the server or killing the network card) auto failover does not work and my website just times out.

I found this article where the second last post suggests that its because we are using named pipes which does not work when the principal goes offline, but we force TCP in our connection string.

What am I missing to get this DB auto failover working?

Ben Pilbrow
  • 12,041
  • 5
  • 36
  • 57
user62521
  • 121
  • 2
  • 4

2 Answers2

7

After working with MS for a week, we've worked out why this happens.

Essentially, the application is not failing over because it needs to be sure that the database has failed over - and the sql connection is timing out before the connection has determined that the db has failed over.

The process to confirm that the database has failed over (with all the default tcp registry settings) is to:

  1. try to communicate with the principal, see that it is not the principal anymore
  2. communicate with the failover to make sure that it has failed over and that it is now the new principal.

When the principal is down, this communication takes about 21 seconds because it will:

  1. try to communicate with the principal, wait 3 seconds, timeout
  2. try to communicate with the principal again, wait 6 seconds, timeout
  3. try to communicate with the principal again, wait 12 seconds, timeout
  4. try to communicate with the failover partner, see that it has failed over, so fail over in the application.

So if your sql connection isn't waiting 21 seconds (probably more in reality) then its going to timeout before it finishes this dance and its not going to fail over at all.

Solution is to set the timeout in your connection string to large value, we use 60 seconds just to be safe.

Cheers

Trev
  • 111
  • 1
  • 5
0

I’m wondering if the conditions of automatic failover are not being met at the time of your tests? Specifically - if the database is not synchronized with the mirror (check mirroring state from sys.database_mirroring) at the time of failure AND/OR if the witness and mirror are not connected at that time (test via pings between participating roles).

You could also have a situation where your Partner and Mirror are not connected to each other - but the partner and mirror databases are still connected to the witness independently. In that case the witness sees nothing wrong (and hence no failover). But you mentioned you shut down the server itself, so this sounds less likely.

Or are you saying that the failover eventually happens but your re-connection fails? In this case, detection and failover time vary depending on how the principal failed and total time to recover the mirror db.

Joe Sack
  • 321
  • 1
  • 5