
We have a Supermicro blade enclosure with 2 nodes. The 2 nodes are identical: exactly the same hardware.

We are using those nodes as a Hyper-V cluster. They are built with Windows Server 2012 R2.

The first node runs really well; all of our VMs can run on it without any problem. The second node is driving us totally crazy. When we power it on without load (I mean running Windows only), the node is fine and runs for days without error. But when we put a load on it (a VM, yes, even a single VM), the system hangs after about 2 hours. By hang I mean there is only a black screen, as if it were asleep. We can't wake it with the mouse or keyboard, and the system no longer responds to ping. The only way to get back into Windows is to reboot. We didn't find any log entry about the freeze in Windows, nor in the blade interface. All of the temperature sensors look fine.

So, we tried many things. First we reinstalled Windows and Hyper-V (many times). We swapped the processor from the first node into the second node; the second node still does the same thing. We swapped the memory from the first node into the second node; the second node still hangs.

We changed the hard drive (SATADOM). We removed all other hardware (two external NICs). We changed the node's position in the blade enclosure. We changed the BIOS, IPMI, firmware...

Still the same thing.

We asked for a board replacement from Supermicro.

And!!!! We still have that hang, only on the second node of the blade, and only when we put a VM on it in Hyper-V. The only thing we have not changed is the enclosure.

Our first node can run about 30 VMs without any issues, but the second node can't run even one without hanging. Does anyone have an idea that could help us? (It would be great if it doesn't involve setting the second node on fire; my boss doesn't like the idea.)

Hardware spec: Enclosure: Supermicro SBE-710Q. Node: B9DRG (SBI-7127-RG).

Thanks

mr.sedam
  • Try to eliminate Hyper-V as the issue by running memtest and benchmarking tools on it for a while - it may crash under high load even without any VMs – Grant Sep 11 '15 at 14:48
  • Good idea, but we already tried memtest and CPU benchmarks, and we were never able to crash the node that way. – mr.sedam Sep 11 '15 at 15:56
