
I'm trying to set up a SQL Server 2012 SP1 "AlwaysOn Failover Cluster Instance" on top of a Windows Server 2012 R2 cluster (cl01) consisting of two 2012 R2 nodes (sql1 and sql2). Both nodes are hosted on VMware ESXi 5.5, each with three VMXNET3 adapters (LAN, iSCSI and private heartbeat).

Both Windows servers have Failover Clustering enabled, the iSCSI LUNs mapped and a quorum assigned. I created an MSDTC clustered role and validated the cluster with the wizard (no warnings), so all seems good.

I install the primary (sql1) SQL 2012 SP1 node with the database engine, Reporting Services and Analysis Services, each set up with its own network account as per best practice.

I then add the second Windows server (sql2) via "Add node to existing SQL cluster". Somewhere during the installation of this node into the cluster, the primary SQL node (sql1) always BSODs: IRQL_NOT_LESS_OR_EQUAL (tcpip.sys). It then BSODs on every reboot and ends up in a boot loop.

I figured the OS on sql1 must be at fault, so I uninstalled all components on both servers and instead set up sql2 as primary and sql1 as secondary. This time sql2 (now primary) goes down with the exact same BSOD and shows the same behaviour as in the first scenario.

  • Is SQL clustering broken in 2012 SP1 or Server 2012 R2?
  • Does it have anything to do with the machines using VMXNET3 adapters (I have tried E1000E with the same results), for example network interruptions causing a "purposeful" BSOD as a cluster eviction?
  • This is clearly network related (tcpip.sys); however, Event Viewer shows nothing untoward.
Myles Gray
  • Well, as a comment: one thing it is NOT is SQL Server related. SQL Server is purely a user-level application and thus CANNOT cause a BSOD by itself. I'm not saying it does not trigger it, but the core problem HAS to be in the kernel, most likely a driver issue, as you have identified. – TomTom Jan 15 '14 at 15:17
  • Update: from event log: `The computer has rebooted from a bugcheck. The bugcheck was: 0x000000d1 (0x0000000000000000, 0x0000000000000002, 0x0000000000000001, 0xfffff800010d1e98). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 011514-25765-01.` – Myles Gray Jan 15 '14 at 17:01
  • PSS. Microsoft Support Services. They will want a copy of the dump. I would check drivers first though, and disable hardware acceleration on the NIC as a test. This really looks like a driver issue. Open a VMware support ticket. But absolutely make sure things are patched - that gets embarrassing otherwise ;) – TomTom Jan 15 '14 at 17:57
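
For anyone reading the bugcheck line from the event log above, here is a minimal Python sketch that splits it into its parts. It is not from the original thread: the function name `decode_bugcheck` and the returned field names are made up for illustration. The parameter meanings follow Microsoft's documentation for bug check 0xD1, which is listed there as DRIVER_IRQL_NOT_LESS_OR_EQUAL (parameter 1 is the memory address referenced, 2 the IRQL at the time, 3 the access type, 4 the address of the instruction that made the reference).

```python
import re

# Access-type values for parameter 3 of bug check 0xD1, per Microsoft's docs.
ACCESS_TYPES = {0: "read", 1: "write", 8: "execute"}


def decode_bugcheck(message):
    """Parse a 'rebooted from a bugcheck' event message (hypothetical helper)."""
    match = re.search(r"bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", message)
    if match is None:
        raise ValueError("no bugcheck found in message")

    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    result = {"code": code, "params": params}

    # Only 0xD1 is interpreted here; other codes use different parameter layouts.
    if code == 0xD1 and len(params) == 4:
        result.update(
            name="DRIVER_IRQL_NOT_LESS_OR_EQUAL",
            referenced_address=params[0],
            irql=params[1],
            access=ACCESS_TYPES.get(params[2], "unknown"),
            faulting_instruction=params[3],
        )
    return result


if __name__ == "__main__":
    event = (
        "The computer has rebooted from a bugcheck. The bugcheck was: "
        "0x000000d1 (0x0000000000000000, 0x0000000000000002, "
        "0x0000000000000001, 0xfffff800010d1e98)."
    )
    for key, value in decode_bugcheck(event).items():
        print(key, value)
```

Read this way, the logged values describe a write to address 0 at IRQL 2 (DISPATCH_LEVEL) by kernel-mode code at 0xfffff800010d1e98, which fits the driver theory in the comments. Identifying which driver owns that address still requires opening MEMORY.DMP in a debugger.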
