
I am having an issue with SQL Server 2012 Availability Groups: the group does not fail over when I shut down the SQL Server service on a node. After the service is shut down, connections fail with this error:

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections.

It appears that the listener is still trying to listen on the node that was just shut down.

If I fail over manually in SSMS, the failover succeeds.

I am not sure what the root cause is, since I was previously able to shut down the service and the group would fail over. I thought the problem had been resolved.

I have tried setting the maximum failures in the specified period to 25, with the period set to one hour.

leeman24
  • What version of Windows are you running under the 2012 AG? Availability Groups are based on Windows clustering, so that's going to make a difference in the troubleshooting. – Zypher Jul 10 '15 at 01:16
  • Dumb question, but is it set for automatic failover? Check availability_mode and failover_mode in sys.availability_replicas, and failure_condition_level in sys.availability_groups. – Ben Thul Jul 10 '15 at 03:30
  • I am running Windows Server 2008 R2 Enterprise. Yes, it is set to automatic failover. Failover condition level is set to 3. – leeman24 Jul 17 '15 at 02:06
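
The configuration check suggested in the comments can be sketched as follows. This is a minimal sketch, not anyone's actual script: the T-SQL is held as a plain string to run on the primary (for example in SSMS), and the Python helper only interprets the rows it returns. The helper name `auto_failover_partners` and the sample rows (NODE1 to NODE3) are hypothetical. The sketch relies on automatic failover needing at least two replicas that are both synchronous-commit and set to automatic failover mode.

```python
# T-SQL to run on the primary replica; kept here only as a string.
AG_CONFIG_QUERY = """
SELECT ag.name,
       ag.failure_condition_level,
       ar.replica_server_name,
       ar.availability_mode_desc,
       ar.failover_mode_desc
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
  ON ar.group_id = ag.group_id;
"""


def auto_failover_partners(rows):
    """Return the replicas configured as automatic failover partners.

    Each row is a dict keyed by the column names selected above.
    """
    return [
        row["replica_server_name"]
        for row in rows
        if row["availability_mode_desc"] == "SYNCHRONOUS_COMMIT"
        and row["failover_mode_desc"] == "AUTOMATIC"
    ]


# Hypothetical result set: two automatic partners and one manual DR replica.
sample_rows = [
    {"replica_server_name": "NODE1",
     "availability_mode_desc": "SYNCHRONOUS_COMMIT",
     "failover_mode_desc": "AUTOMATIC"},
    {"replica_server_name": "NODE2",
     "availability_mode_desc": "SYNCHRONOUS_COMMIT",
     "failover_mode_desc": "AUTOMATIC"},
    {"replica_server_name": "NODE3",
     "availability_mode_desc": "ASYNCHRONOUS_COMMIT",
     "failover_mode_desc": "MANUAL"},
]

partners = auto_failover_partners(sample_rows)
print("automatic failover partners:", partners)
print("configured for automatic failover:", len(partners) >= 2)
```

A configuration that passes this check can still fail to fail over if the secondary is not synchronized or if the cluster resource itself is misconfigured, so it only rules out one class of causes.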

2 Answers


Feel free to add this to comments, but you need to provide more details: what OS are you on, how is your listener configured, how many nodes are in your cluster, how many replicas do you have, how is your quorum configured, and are all nodes/replicas in the same subnet? I can throw out a handful of reasons you might not be failing over automatically:

  • If you are on an OS earlier than 2012, you could be suffering from last man standing issues
  • If you have an even number of nodes/replicas, you need to configure an odd man for quorum
  • If you have nodes/replicas in different subnets, you need to make sure the listener is listening on both
  • Are your nodes running dual NICs to include a heartbeat? If not, you may need to add hardware and configure accordingly

And the list goes on. More info would be helpful.
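
The "odd man for quorum" point comes down to majority arithmetic. Here is a minimal sketch, assuming a static quorum model in which every node and the witness hold exactly one vote; the function names are hypothetical and nothing here queries a real cluster.

```python
def votes_needed(total_votes):
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def failures_tolerated(total_votes):
    """How many voters can be lost while the rest still hold a majority."""
    return total_votes - votes_needed(total_votes)


# Compare an even number of voters with the same cluster plus one witness vote.
for label, total in [("2 nodes", 2), ("2 nodes + witness", 3), ("4 nodes", 4)]:
    print(label, "| votes needed:", votes_needed(total),
          "| failures tolerated:", failures_tolerated(total))
```

With two voters, losing either one loses quorum; adding a third vote lets the cluster survive a single failure. Four voters tolerate no more failures than three, which is why an even vote count is usually rounded up to an odd one.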

  • I am on Windows Server 2008 R2 Enterprise. I am not sure what you mean by how the listener is configured, but I only have one set up, with a static IP and DNS name on port 1433. I currently have 3 nodes in my cluster, and two are set to automatic failover mode. The quorum file share is on the 3rd node (not sure if I answered that correctly). All nodes are on the same subnet. Could you expand on the last man standing issues -- I guess stuff they never patched? The nodes are currently running one NIC. – leeman24 Jul 17 '15 at 02:18
  • My failover test also only included shutting down the SQL Server service, not the whole system. – leeman24 Jul 17 '15 at 02:18
  • Windows Server 2008 R2 uses a static quorum model vs the dynamic model used in 2012. Read about it [here](http://blogs.technet.com/b/scottschnoll/archive/2014/02/25/database-availability-groups-and-windows-server-2012-r2.aspx), but this doesn't sound like your problem. You could have an issue with the lack of a secondary NIC and a [heartbeat network](http://blogs.technet.com/b/askcore/archive/2010/02/12/windows-server-2008-failover-clusters-networking-part-1.aspx). I've not seen a cluster build without a heartbeat network. – Steve Mangiameli Jul 17 '15 at 14:08
  • Though not your issue, yet, you need to [add and configure your listener](https://msdn.microsoft.com/en-us/library/hh213080.aspx). It's kind of the cornerstone for external connection to an AG. Additionally, find another place to host your quorum file share. Doesn't do any good if it's on a server in the cluster and that server goes down ;-) – Steve Mangiameli Jul 17 '15 at 14:10
  • Thanks. I will look into the heartbeat network. Hopefully this won't cause too many issues. Do you happen to know of a good full setup guide? From what I read, it looks like I have configured my listener correctly. I have been using and testing with that address, and my application is deployed using the listener's hostname. As for the quorum file share, I initially had it on a standalone server, but we wanted to test with 3 nodes: two set to automatic failover and one set to manual failover (which would be used for disaster recovery on a different server/network). – leeman24 Jul 17 '15 at 17:34
  • Sorry, I do not. I don't have any personal experience actually setting up the heartbeat network, but any network admin worth his salt will be able to do it without much issue. The link I provided above should get you started if you have to do it yourself. – Steve Mangiameli Jul 17 '15 at 21:27

Here is the resolution to the issue where my Availability Group would stay offline after I shut down the SQL Server service on one of the nodes. I had no idea why this was happening, since it had worked before. I will just say it was human error.

What I needed to do:

  • Administrative Tools --> Failover Cluster Manager
  • --> Services and applications --> --> Other Resources --> --> Click Enable auto-start.

I must have accidentally clicked Disable, because I don't remember setting it. Now when I shut down the service, the group comes back up successfully, as expected.

leeman24