I have a Kubernetes cluster with 2 Linux nodes and 2 Windows Server 2019 nodes, running Kubernetes v1.26.0 and containerd v1.6.20, with VMware Tools version 10.3.2.

The Linux nodes are running just fine, but the Windows nodes are misbehaving.

This is a new cluster, but a Windows node runs for less than 4 hours before the Pods on it start complaining about missing endpoints.

I checked the Windows node and found that, at the time of the event, error 0xc0000005 was recorded in the Windows log.

I can recover from this by restarting HNS, containerd, and kubelet and then redeploying, but the issue recurs after several hours.
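For reference, the recovery step can be scripted. This is only a minimal sketch: the service names `hns`, `containerd`, and `kubelet` are assumptions about how the services are registered on the node (check with `Get-Service`), and the helper names are hypothetical. It shells out to the PowerShell `Restart-Service` cmdlet and has to run from an elevated session.

```python
import subprocess

# Assumed service names, in the restart order described above.
# Confirm the real names on the node with `Get-Service`.
SERVICES = ["hns", "containerd", "kubelet"]


def restart_command(service):
    """Build the PowerShell invocation that restarts one Windows service."""
    return [
        "powershell.exe",
        "-NoProfile",
        "-Command",
        f"Restart-Service -Name {service} -Force",
    ]


def restart_all(services=SERVICES, run=subprocess.run):
    """Restart each service in order; raise if any restart fails."""
    for service in services:
        run(restart_command(service), check=True)
```

Calling `restart_all()` on the Windows node performs the three restarts in sequence. The redeployment still has to be done afterwards.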

The cluster uses Calico for networking, and I have enabled Hyper-V on the Windows Server nodes. I have also disabled Windows Update.

How can I stop HNS from restarting automatically?

[Error image]

Kafiti

2 Answers

Error 0xc0000005 is an "Access Violation" within svchost, the host network filter driver in this case. An application in your stack (most likely Calico, but without more details this is just an educated guess) is misbehaving and trying to access protected memory.
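To see why this value reads as an error, it helps to split it into the standard NTSTATUS fields. The sketch below is illustrative only, and `decode_ntstatus` is a hypothetical helper name. It also computes the signed 32-bit form, since some tools report the same code as a negative number.

```python
def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its severity, facility, and code fields."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return {
        "severity": value >> 30,            # top 2 bits; 3 means error
        "facility": (value >> 16) & 0xFFF,  # 12-bit facility field
        "code": value & 0xFFFF,             # low 16 bits
        # The same value reinterpreted as a signed 32-bit integer.
        "signed": value - (1 << 32) if value & 0x80000000 else value,
    }


print(decode_ntstatus(0xC0000005))
# {'severity': 3, 'facility': 0, 'code': 5, 'signed': -1073741819}
```

Severity 3 marks an error, and code 5 with facility 0 is the access violation status.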

bjoster

I resolved the issue by disabling Control Flow Guard (CFG) for the specific programs svchost, vmcompute, and vmwp in the Windows Exploit protection settings.

Also, upgrading VMware Tools to version 11.3+ is recommended.
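I made the Control Flow Guard change through the Exploit protection settings UI, but it should also be scriptable with the `Set-ProcessMitigation` PowerShell cmdlet. The following is a minimal sketch under some assumptions: the executable names `svchost.exe`, `vmcompute.exe`, and `vmwp.exe` are my mapping of the three programs above, the helper names are hypothetical, and the script has to run from an elevated session.

```python
import subprocess

# Assumed executable names for the three programs named above.
PROGRAMS = ["svchost.exe", "vmcompute.exe", "vmwp.exe"]


def disable_cfg_command(program):
    """Build the PowerShell invocation that disables CFG for one executable."""
    return [
        "powershell.exe",
        "-NoProfile",
        "-Command",
        f"Set-ProcessMitigation -Name {program} -Disable CFG",
    ]


def disable_cfg(programs=PROGRAMS, run=subprocess.run):
    """Disable CFG for each listed executable; raise if any command fails."""
    for program in programs:
        run(disable_cfg_command(program), check=True)
```

Calling `disable_cfg()` applies the per-program override. As far as I know, the setting only takes effect for processes started afterwards, so the affected services (or the node) likely need a restart.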

Kafiti