Our GKE cluster has become partly unusable: kubectl logs and kubectl exec fail, and new pods cannot be scheduled. These are the error messages we encountered:

  1. Running kubectl logs -f pod_name fails with: "Error from server: Get 'https://x.x.x.x:10250/containerLogs/default/xxx': tunnel closed".

  2. Similarly, running a command inside the pod with kubectl exec -it pod_name -- /bin/bash fails with: "Error from server: error dialing backend: tunnel closed".

All nodes appear healthy and the kubelet is running, but the Google Metrics Agent and the Autoscaler Agent report the following errors:

  • In the Prometheus discovery module (github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:469), we encountered the error: "Failed to watch *v1.Pod: failed to list *v1.Pod: the server was unable to return a response in the time allotted, but may still be processing the request (get pods)."

  • Additionally, in the node collector (collectors/node.go:159), we received the error: "Failed to query API server for node data. Kind: receiver, Name: kubenode, Error: Get 'https://x.x.x.x:443/api/v1/nodes/gke-xxx-xx-pool-xxx?timeout=4.5s': net/http: request canceled (Client.Timeout exceeded while awaiting headers)."

  • The autoscaler is also encountering an issue: "Error while getting cluster status: timed out waiting for the condition."

Furthermore, the control plane logs in the Google Cloud Console show the message "Too Many Requests" (the HTTP 429 status text) with resourceName: "apiextensions.k8s.io/v1/customresourcedefinitions".

We are also unable to schedule new pods. Even a deployment installed with Helm remains stuck at 0/1.

We kindly request any assistance you can provide in resolving these issues.

Thank you.