We have a two-node Windows Server 2008 R2 failover cluster running highly available Hyper-V and DHCP. Storage is a back-end Dell MD3000i iSCSI SAN. All of the networking goes through redundant switches with MPIO drivers. The data network is on a different VLAN than the primary network.
Here is the scenario we keep encountering:
We occasionally have power outages. The cabinet has dual UPS devices that last about 15 minutes, but if power is not restored within that time everything goes down: cluster nodes, SAN, and all.
Eventually the power comes back, and all of the devices are configured to boot when AC returns. However, after a complete outage like this the cluster never comes back online properly. We get the usual errors, such as the quorum disk being unavailable. In addition, our two primary domain controllers are virtual machines running on the VM cluster. We do have a physical server running as a third domain controller, on the assumption that this would help when things come back online.
What we don't understand is why the system cannot recover on its own when it boots. A DC is available for authentication, eventually, and the iSCSI network comes back online. Is there something else we are missing?
I suspect the iSCSI Initiator service does not start quickly enough, so it is not yet ready when the cluster service starts.
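To test that theory, something like the following could hold the cluster service back until the SAN answers on the iSCSI port. This is only a rough sketch, not something we run today. It assumes the cluster service (ClusSvc) has been set to manual start, that the MD3000i portal listens on the default iSCSI port 3260, and that Python is available on the nodes; the function names and timeouts are made up for illustration. A successful TCP connection only shows the portal is reachable, not that the initiator has reconnected the disks.

```python
import socket
import subprocess
import time


def wait_for_port(host, port, timeout_s=600.0, interval_s=5.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A completed connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass  # refused, unreachable, or timed out; retry below
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def start_cluster_when_san_ready(san_ip, timeout_s=600.0):
    """Start the cluster service only after the SAN portal is reachable.

    Assumes ClusSvc is set to manual start and that this runs elevated
    on a cluster node. Port 3260 is the default iSCSI port.
    """
    if not wait_for_port(san_ip, 3260, timeout_s):
        return False
    subprocess.check_call(["net", "start", "ClusSvc"])
    return True
```

Run at boot on each node (for example from a startup scheduled task) with the SAN portal address passed as san_ip, this would show whether delaying the cluster service is enough for the quorum disk to come up.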
Any ideas, or is there anything I can post that would help?
Thanks, Brent