15

I'm currently getting errors when trying to exec into or get logs from my pods on my GKE cluster.

$ kubectl logs <POD-NAME>
Error from server: Get "https://<NODE-PRIVATE-IP>:10250/containerLogs/default/<POD-NAME>/<DEPLOYMENT-NAME>": remote error: tls: internal error
$ kubectl exec -it <POD-NAME> -- sh
Error from server: error dialing backend: remote error: tls: internal error

One suspicious thing I found while troubleshooting is that all CSRs are getting denied...

$ kubectl get csr
NAME        AGE     SIGNERNAME                      REQUESTOR                 CONDITION
csr-79zkn   4m16s   kubernetes.io/kubelet-serving   system:node:<NODE-NAME>   Denied
csr-7b5sx   91m     kubernetes.io/kubelet-serving   system:node:<NODE-NAME>   Denied
csr-7fzjh   103m    kubernetes.io/kubelet-serving   system:node:<NODE-NAME>   Denied
csr-7gstl   19m     kubernetes.io/kubelet-serving   system:node:<NODE-NAME>   Denied
csr-7hrvm   11m     kubernetes.io/kubelet-serving   system:node:<NODE-NAME>   Denied
csr-7mn6h   87m     kubernetes.io/kubelet-serving   system:node:<NODE-NAME>   Denied
csr-7nd7h   4m57s   kubernetes.io/kubelet-serving   system:node:<NODE-NAME>   Denied
...
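
For completeness, the denial condition on an individual CSR can be inspected directly (csr-79zkn is just the first name from the list above):

$ kubectl describe csr csr-79zkn
$ kubectl get csr csr-79zkn -o yaml   # full object, including spec.signerName and status.conditions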

Any idea why this is happening? Maybe a firewall issue?

Thanks in advance!

Update 1

Here are the same commands with verbose output (--v=8), without the goroutine stack traces:

$ kubectl logs --v=8 <POD-NAME>

I0527 09:27:59.624843   10407 loader.go:375] Config loaded from file:  /home/kevin/.kube/config
I0527 09:27:59.628621   10407 round_trippers.go:420] GET https://<PUBLIC-IP>/api/v1/namespaces/default/pods/<POD-NAME>
I0527 09:27:59.628635   10407 round_trippers.go:427] Request Headers:
I0527 09:27:59.628644   10407 round_trippers.go:431]     Accept: application/json, */*
I0527 09:27:59.628649   10407 round_trippers.go:431]     User-Agent: kubectl/v1.19.3 (linux/amd64) kubernetes/1e11e4a
I0527 09:27:59.727411   10407 round_trippers.go:446] Response Status: 200 OK in 98 milliseconds
I0527 09:27:59.727461   10407 round_trippers.go:449] Response Headers:
I0527 09:27:59.727480   10407 round_trippers.go:452]     Audit-Id: ...
I0527 09:27:59.727496   10407 round_trippers.go:452]     Cache-Control: no-cache, private
I0527 09:27:59.727512   10407 round_trippers.go:452]     Content-Type: application/json
I0527 09:27:59.727528   10407 round_trippers.go:452]     Date: Thu, 27 May 2021 07:27:59 GMT
I0527 09:27:59.727756   10407 request.go:1097] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"<POD-NAME>","generateName":"<POD-BASE-NAME>","namespace":"default","selfLink":"/api/v1/namespaces/default/pods/<POD-NAME>","uid":"...","resourceVersion":"6764210","creationTimestamp":"2021-05-19T10:33:28Z","labels":{"app":"<NAME>","pod-template-hash":"..."},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"<POD-BASE-NAME>","uid":"...","controller":true,"blockOwnerDeletion":true}],"managedFields":[{"manager":"kube-controller-manager","operation":"Update","apiVersion":"v1","time":"2021-05-19T10:33:28Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:app":{},"f:pod-template-hash":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"...\"}":{".":{},"f:apiVersion":{},"f:blockOwnerDeletion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}},"f:spec":{"f:c [truncated 3250 chars]
I0527 09:27:59.745985   10407 round_trippers.go:420] GET https://<PUBLIC-IP>/api/v1/namespaces/default/pods/<POD-NAME>/log
I0527 09:27:59.746035   10407 round_trippers.go:427] Request Headers:
I0527 09:27:59.746055   10407 round_trippers.go:431]     Accept: application/json, */*
I0527 09:27:59.746071   10407 round_trippers.go:431]     User-Agent: kubectl/v1.19.3 (linux/amd64) kubernetes/1e11e4a
I0527 09:27:59.800586   10407 round_trippers.go:446] Response Status: 500 Internal Server Error in 54 milliseconds
I0527 09:27:59.800638   10407 round_trippers.go:449] Response Headers:
I0527 09:27:59.800654   10407 round_trippers.go:452]     Audit-Id: ...
I0527 09:27:59.800668   10407 round_trippers.go:452]     Cache-Control: no-cache, private
I0527 09:27:59.800680   10407 round_trippers.go:452]     Content-Type: application/json
I0527 09:27:59.800693   10407 round_trippers.go:452]     Content-Length: 217
I0527 09:27:59.800712   10407 round_trippers.go:452]     Date: Thu, 27 May 2021 07:27:59 GMT
I0527 09:27:59.800772   10407 request.go:1097] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get \"https://10.156.0.8:10250/containerLogs/default/<POD-NAME>/<SERVICE-NAME>\": remote error: tls: internal error","code":500}
I0527 09:27:59.801848   10407 helpers.go:216] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "Get \"https://10.156.0.8:10250/containerLogs/default/<POD-NAME>/<SERVICE-NAME>\": remote error: tls: internal error",
  "code": 500
}]
F0527 09:27:59.801944   10407 helpers.go:115] Error from server: Get "https://10.156.0.8:10250/containerLogs/default/<POD-NAME>/<SERVICE-NAME>": remote error: tls: internal error

$ kubectl exec --v=8 -it <POD-NAME> -- sh

I0527 09:44:48.673774   11157 loader.go:375] Config loaded from file:  /home/kevin/.kube/config
I0527 09:44:48.678514   11157 round_trippers.go:420] GET https://<PUBLIC-IP>/api/v1/namespaces/default/pods/<POD-NAME>
I0527 09:44:48.678528   11157 round_trippers.go:427] Request Headers:
I0527 09:44:48.678535   11157 round_trippers.go:431]     Accept: application/json, */*
I0527 09:44:48.678543   11157 round_trippers.go:431]     User-Agent: kubectl/v1.19.3 (linux/amd64) kubernetes/1e11e4a
I0527 09:44:48.795864   11157 round_trippers.go:446] Response Status: 200 OK in 117 milliseconds
I0527 09:44:48.795920   11157 round_trippers.go:449] Response Headers:
I0527 09:44:48.795963   11157 round_trippers.go:452]     Audit-Id: ...
I0527 09:44:48.795995   11157 round_trippers.go:452]     Cache-Control: no-cache, private
I0527 09:44:48.796019   11157 round_trippers.go:452]     Content-Type: application/json
I0527 09:44:48.796037   11157 round_trippers.go:452]     Date: Thu, 27 May 2021 07:44:48 GMT
I0527 09:44:48.796644   11157 request.go:1097] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"<POD-NAME>","generateName":"","namespace":"default","selfLink":"/api/v1/namespaces/default/pods/<POD-NAME>","uid":"","resourceVersion":"6764210","creationTimestamp":"2021-05-19T10:33:28Z","labels":{"app":"...","pod-template-hash":"..."},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"<POD-BASE-NAME>","uid":"...","controller":true,"blockOwnerDeletion":true}],"managedFields":[{"manager":"kube-controller-manager","operation":"Update","apiVersion":"v1","time":"2021-05-19T10:33:28Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:app":{},"f:pod-template-hash":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"...\"}":{".":{},"f:apiVersion":{},"f:blockOwnerDeletion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}},"f:spec":{"f:c [truncated 3250 chars]
I0527 09:44:48.814315   11157 round_trippers.go:420] POST https://<PUBLIC-IP>/api/v1/namespaces/default/pods/<POD-NAME>/exec?command=sh&container=<SERVICE-NAME>&stdin=true&stdout=true&tty=true
I0527 09:44:48.814372   11157 round_trippers.go:427] Request Headers:
I0527 09:44:48.814391   11157 round_trippers.go:431]     User-Agent: kubectl/v1.19.3 (linux/amd64) kubernetes/1e11e4a
I0527 09:44:48.814406   11157 round_trippers.go:431]     X-Stream-Protocol-Version: v4.channel.k8s.io
I0527 09:44:48.814420   11157 round_trippers.go:431]     X-Stream-Protocol-Version: v3.channel.k8s.io
I0527 09:44:48.814445   11157 round_trippers.go:431]     X-Stream-Protocol-Version: v2.channel.k8s.io
I0527 09:44:48.814471   11157 round_trippers.go:431]     X-Stream-Protocol-Version: channel.k8s.io
I0527 09:44:48.913928   11157 round_trippers.go:446] Response Status: 500 Internal Server Error in 99 milliseconds
I0527 09:44:48.913977   11157 round_trippers.go:449] Response Headers:
I0527 09:44:48.914005   11157 round_trippers.go:452]     Audit-Id: ...
I0527 09:44:48.914029   11157 round_trippers.go:452]     Cache-Control: no-cache, private
I0527 09:44:48.914054   11157 round_trippers.go:452]     Content-Type: application/json
I0527 09:44:48.914077   11157 round_trippers.go:452]     Date: Thu, 27 May 2021 07:44:48 GMT
I0527 09:44:48.914099   11157 round_trippers.go:452]     Content-Length: 149
I0527 09:44:48.915741   11157 helpers.go:216] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "error dialing backend: remote error: tls: internal error",
  "code": 500
}]
F0527 09:44:48.915837   11157 helpers.go:115] Error from server: error dialing backend: remote error: tls: internal error
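
As a quick sanity check, the kubelet-side TLS failure can also be reproduced directly from a VM in the same VPC (assuming TCP 10250 is reachable from it), since the handshake fails even with certificate verification disabled:

$ curl -vk https://<NODE-PRIVATE-IP>:10250/healthz   # expect a TLS handshake error instead of an HTTP response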

Update 2

After connecting to one of the GKE worker nodes and checking the kubelet logs, I found these weird lines:

May 27 09:30:11 gke-<CLUSTER-NAME>-default-pool-<NODE-UID> kubelet[1272]: I0527 09:30:11.271022    1272 log.go:181] http: TLS handshake error from 10.156.0.9:54672: no serving certificate available for the kubelet
May 27 09:30:11 gke-<CLUSTER-NAME>-default-pool-<NODE-UID> kubelet[1272]: I0527 09:30:11.305628    1272 log.go:181] http: TLS handshake error from 10.156.0.9:54674: no serving certificate available for the kubelet
May 27 09:30:12 gke-<CLUSTER-NAME>-default-pool-<NODE-UID> kubelet[1272]: I0527 09:30:12.067998    1272 log.go:181] http: TLS handshake error from 10.156.0.11:57610: no serving certificate available for the kubelet
May 27 09:30:14 gke-<CLUSTER-NAME>-default-pool-<NODE-UID> kubelet[1272]: I0527 09:30:14.144826    1272 certificate_manager.go:412] Rotating certificates
May 27 09:30:14 gke-<CLUSTER-NAME>-default-pool-<NODE-UID> kubelet[1272]: I0527 09:30:14.154322    1272 reflector.go:207] Starting reflector *v1.CertificateSigningRequest (0s) from k8s.io/client-go/tools/watch/informerwatcher.go:146
May 27 09:30:14 gke-<CLUSTER-NAME>-default-pool-<NODE-UID> kubelet[1272]: I0527 09:30:14.448976    1272 reflector.go:213] Stopping reflector *v1.CertificateSigningRequest (0s) from k8s.io/client-go/tools/watch/informerwatcher.go:146
May 27 09:30:14 gke-<CLUSTER-NAME>-default-pool-<NODE-UID> kubelet[1272]: E0527 09:30:14.449045    1272 certificate_manager.go:454] certificate request was not signed: cannot watch on the certificate signing request: certificate signing request is denied, reason: AutoDenied, message:
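
The AutoDenied reason suggests the cluster's automatic CSR approver is rejecting the kubelet-serving requests, so the kubelet never obtains a serving certificate. On the node this can be cross-checked in the kubelet's certificate directory (/var/lib/kubelet/pki is the usual location, it may differ per image):

$ sudo ls -l /var/lib/kubelet/pki/                  # a missing or stale kubelet-server-current.pem would confirm it
$ sudo journalctl -u kubelet | grep -i certificate  # rotation attempts and denials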

Update 3

I've updated the cluster version from 1.19.9-gke.1400 to 1.19.9-gke.1900. That didn't solve the problem...

I also made a credentials rotation on the cluster, but that didn't solve it either...
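
For reference, a credential rotation on GKE is the usual two-step gcloud flow, roughly (cluster name and zone are placeholders):

$ gcloud container clusters update <CLUSTER-NAME> --zone <ZONE> --start-credential-rotation
$ gcloud container clusters update <CLUSTER-NAME> --zone <ZONE> --complete-credential-rotation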

Final

After trying a lot of changes on the cluster:

  • Restarting kubelet on nodes
  • Restarting nodes
  • Upscaling/Downscaling node pool size
  • Upgrading cluster version
  • Rotating cluster certificates

Even creating a new cluster (in the same project, with the same VPC, etc.) didn't solve the issue...

This problem might be related to changes made to the firewall rules.
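
If that's the case, the control plane must still be able to reach the kubelets on TCP 10250; something like this can verify the GKE-created rules (the rule names below are the typical defaults and may differ):

$ gcloud compute firewall-rules list --filter="name~gke-<CLUSTER-NAME>"
$ gcloud compute firewall-rules describe gke-<CLUSTER-NAME>-<HASH>-master   # should allow tcp:443,10250 from the control-plane range on private clusters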

The only solution I found was creating a new GKE cluster in a new GCP project and migrating the workloads using Velero.
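
A minimal sketch of that kind of migration with Velero, assuming it is installed in both clusters with a shared object-storage backend (backup name and namespaces are illustrative):

# on the old cluster
$ velero backup create gke-migration --include-namespaces default
# on the new cluster
$ velero restore create --from-backup gke-migration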

  • this answer might help you: https://stackoverflow.com/questions/51154911/kubectl-exec-results-in-error-unable-to-upgrade-connection-pod-does-not-exi – Ivonet May 20 '21 at 13:44
  • same issue with me... – Hamza Mirchi May 20 '21 at 15:56
  • @Kevin can you post the output of the commands you shared but adding the `--v=8` parameter? (please redact any personal information from the output) – Vi Pau May 21 '21 at 10:33
  • @AlbertoPau verbose output added to initial question –  May 27 '21 at 07:52
  • I found a couple issues related to this that have been fixed in the past, can you confirm you upgraded your cluster to the latest GKE version available in your release channel? Please also mention your GKE version. – Vi Pau May 27 '21 at 11:41
  • The cluster is running version `1.19.9-gke.1400` on release channel `Regular` –  May 27 '21 at 12:37
  • I found an issue related to this which concerns Windows images, are you using Windows in GKE? – Vi Pau May 28 '21 at 14:21
  • @AlbertoPau No Windows images used at all on the cluster. I didn't find the real problem; starting a new cluster in a new project was the solution... Thanks for your help –  Jun 03 '21 at 07:37
  • Probably a firewall issue then in your project's VPC. I'm glad you solved your issue! – Vi Pau Jun 03 '21 at 08:06

3 Answers

7

I assume this answer may help.

This issue is due to pending Certificate Signing Requests for the nodes, made by the kubelet running on each node.

To check the CSRs pending for the nodes:

kubectl get csr  --sort-by=.metadata.creationTimestamp

Then approve the CSR for each node:

kubectl certificate approve <csr-id>
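
If there are many of them, the pending CSRs can also be approved in bulk (after checking that they are all expected kubelet-serving requests), for example:

kubectl get csr | awk '/Pending/ {print $1}' | xargs kubectl certificate approve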

For further details, refer to this section of the Kubernetes documentation:

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#kubelet-serving-certs

One known limitation is that the CSRs (Certificate Signing Requests) for these certificates cannot be automatically approved by the default signer in the kube-controller-manager - kubernetes.io/kubelet-serving. This will require action from the user or a third party controller.

1

Please approve your certificates. This applies to both Kubernetes and OpenShift; the OpenShift way to approve the certificates is:

1- Check the CSRs and verify that their condition is Pending:

$ oc get csr

2- Approve the certificates:

$ oc get csr -o name | xargs oc adm certificate approve
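
On a plain Kubernetes cluster such as GKE, the kubectl equivalents would be:

$ kubectl get csr
$ kubectl get csr -o name | xargs kubectl certificate approve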
majid asad
-1

This is a known issue that occurs when the cluster gets upgraded, but the error has been fixed in some versions. The fixed GKE versions are:

1.19.11-gke.1700+
1.20.7-gke.1800+
1.21.1-gke.1800+
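
Upgrading an existing cluster to one of those versions would look roughly like this (cluster name, zone and node pool are placeholders):

gcloud container clusters upgrade <CLUSTER-NAME> --zone <ZONE> --master --cluster-version 1.19.11-gke.1700
gcloud container clusters upgrade <CLUSTER-NAME> --zone <ZONE> --node-pool default-pool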

Until you can upgrade to one of those versions, the recommended workaround is creating a new cluster in a new project.

Bakul Mitra