
I'm using Kubernetes on GKE (a single node), which is very nice to use. However, I'm experiencing multiple errors that make all the pods unresponsive:

  • kubectl exec command: Error from server: error dialing backend: ssh: rejected: connect failed (Connection refused)
  • logs from the nginx-ingress controller: service staging/myservice does not have any active endpoints
  • kubectl top nodes: Error from server (InternalError): an error on the server ("unknown") has prevented the request from succeeding (get services http:heapster:)

It happens when CPU usage is high (at or near 100%, in my case due to parallel Jenkins builds).

I do set resource requests and limits (sometimes both) for a few pods, but even those pods become unreachable and eventually restart. The termination reason is almost always "Completed" with exit code 0, and occasionally "Error" with various exit codes (2, 137, 255, for example).
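
For reference, this is roughly how the requests and limits are declared in the pod spec (a minimal sketch; the names and values are illustrative, not my actual configuration):

    apiVersion: v1
    kind: Pod
    metadata:
      name: myservice              # illustrative name
    spec:
      containers:
        - name: app
          image: myservice:latest  # illustrative image
          resources:
            requests:
              cpu: "250m"          # amount the scheduler reserves on the node
              memory: "256Mi"
            limits:
              cpu: "500m"          # container is throttled above this
              memory: "512Mi"      # container is OOM-killed above this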

I've also noticed this error from the replication controllers: Error syncing pod, skipping: network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]

Kubernetes is normally supposed to keep services available across the cluster.

How can this behavior be explained? What's the recommended way to prevent it?

Neko
  • If CPU requests are less than limits, then the overall CPU on the node may still be overcommitted. Can you try setting just CPU limits for your pods? I'd also recommend aiming for 70% utilization since the Kubernetes components running on the node need some CPU as well. – Vishnu Kannan Aug 07 '17 at 23:06
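
A sketch of the change suggested in the comment above (the value is illustrative): when only a CPU limit is specified and no request, Kubernetes defaults the request to the same value, so the scheduler accounts for the pod's full CPU usage and the node is not overcommitted by these pods:

    resources:
      limits:
        cpu: "500m"    # with no explicit request, the request defaults to this limit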

0 Answers