Windows Node HNS Network Restart Error 0xc0000005

Question

I have a Kubernetes cluster with 2 Linux nodes and 2 Windows Server 2019 nodes, running Kubernetes v1.26.0 and containerd v1.6.20, with VMware Tools version 10.3.2.

The Linux nodes are running fine, but the Windows nodes are misbehaving.

This is a new cluster, but a Windows node runs for less than 4 hours before Pods on it start complaining about missing endpoints.

I checked the Windows node and found that, at the time of the event, error 0xc0000005 was recorded in the Windows log.

I can recover by restarting HNS, containerd, and kubelet and then redeploying, but the issue recurs after several hours.
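For reference, the recovery sequence above could be scripted roughly as follows. This is only a sketch: the service names `hns`, `containerd`, and `kubelet` are assumptions that depend on how the services were registered on the node, and the helper names are hypothetical. By default it only builds the commands (dry run); actually running them needs an elevated session on the Windows node.

```python
import subprocess

# Assumed Windows service names; adjust to match how they are registered
# on the node (check with Get-Service).
SERVICES = ["hns", "containerd", "kubelet"]


def restart_command(service):
    """Build the PowerShell invocation that restarts one Windows service."""
    return [
        "powershell.exe",
        "-NoProfile",
        "-Command",
        f"Restart-Service -Name {service} -Force",
    ]


def restart_all(services=SERVICES, dry_run=True):
    """Restart the services in order; with dry_run=True only return the commands."""
    commands = [restart_command(s) for s in services]
    if not dry_run:
        for cmd in commands:
            # Raises CalledProcessError if a restart fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in restart_all():
        print(" ".join(cmd))
```

Restarting HNS first and kubelet last mirrors the order listed above; whether that order matters on a given node is not something this sketch verifies.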

The cluster uses Calico networking, and I have enabled Hyper-V on the Windows servers. I have also disabled Windows Update.

How can I stop this HNS auto-restart?

score 0 · Answer 1 · answered Aug 23 '23 at 13:13

Error 0xc0000005 is an "Access Violation" within svchost, the host network filter driver in this case. An application in your stack (most likely Calico, but without more details this is just a best guess) is misbehaving and trying to access protected memory.
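To illustrate how the code reads, here is a small sketch that splits a 32-bit NTSTATUS value into its standard fields (severity in the top two bits, then the customer flag, facility, and a 16-bit code). The function name `decode_ntstatus` is hypothetical and exists only for this example.

```python
def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its documented bit fields."""
    severity_names = ["success", "informational", "warning", "error"]
    return {
        "severity": severity_names[(value >> 30) & 0x3],  # bits 31-30
        "customer": bool((value >> 29) & 0x1),            # bit 29
        "facility": (value >> 16) & 0xFFF,                # bits 27-16
        "code": value & 0xFFFF,                           # bits 15-0
    }


if __name__ == "__main__":
    print(decode_ntstatus(0xC0000005))
    # {'severity': 'error', 'customer': False, 'facility': 0, 'code': 5}
```

Severity "error" with facility 0 and code 5 is the value Windows defines as STATUS_ACCESS_VIOLATION.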

score 0 · Answer 2 · answered Aug 31 '23 at 06:16


I resolved the issue by disabling Control Flow Guard (CFG) for the programs svchost, vmcompute, and vmwp in the Windows Exploit protection settings.

Also, upgrading VMware Tools to version 11.3+ is recommended.
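The Exploit protection change can also be made from the command line instead of the settings UI. The sketch below is not the exact procedure used; it assumes the `Set-ProcessMitigation` cmdlet (ProcessMitigations module) is available on the node and that the per-program entries are named `svchost.exe`, `vmcompute.exe`, and `vmwp.exe`. The helper names are hypothetical. It defaults to a dry run that only builds the commands; applying them needs an elevated session.

```python
import subprocess

# Executable names assumed from the programs listed above.
PROGRAMS = ["svchost.exe", "vmcompute.exe", "vmwp.exe"]


def cfg_disable_command(program):
    """Build the PowerShell invocation that disables CFG for one program."""
    return [
        "powershell.exe",
        "-NoProfile",
        "-Command",
        f"Set-ProcessMitigation -Name {program} -Disable CFG",
    ]


def disable_cfg(programs=PROGRAMS, dry_run=True):
    """Disable CFG per program; with dry_run=True only return the commands."""
    commands = [cfg_disable_command(p) for p in programs]
    if not dry_run:
        for cmd in commands:
            # Raises CalledProcessError if the cmdlet fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in disable_cfg():
        print(" ".join(cmd))
```

Mitigation settings are generally picked up when a process next starts, so the affected services would likely need a restart (or the node a reboot) before the change takes effect. Disabling CFG weakens exploit protection for those processes, so it is worth limiting to the three programs named.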

answered Aug 31 '23 at 06:16

Kafiti
