
Currently, I am facing an issue with my application: the pod never becomes healthy because the kubelet cannot perform a successful health check.

From pod describe:

  Warning  Unhealthy       84s                kubelet            Startup probe failed: Get "http://10.128.0.208:7777/healthz/start": dial tcp 10.128.0.208:7777: connect: connection refused
  Warning  Unhealthy       68s (x3 over 78s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500

Now, I find this strange, as I can run the health check fine from the worker node where the kubelet is running. So I am wondering: what is the difference between running the health check via curl from the worker node and the kubelet doing it?

Example:

From the worker node where the kubelet is running:
sh-4.4# curl -v http://10.128.0.208:7777/healthz/readiness
*   Trying 10.128.0.208...
* TCP_NODELAY set
* Connected to 10.128.0.208 (10.128.0.208) port 7777 (#0)
> GET /healthz/readiness HTTP/1.1
> Host: 10.128.0.208:7777
> User-Agent: curl/7.61.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Length: 0
< Connection: close
< 
* Closing connection 0
sh-4.4#

Can I somehow trace when the kubelet sends the health probe? Or maybe get into the kubelet and send it myself from there?
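
One thing I might try is to capture the probe traffic on the worker node, something like the following (pod IP and port taken from the events above; interface names and tooling may of course differ on your nodes):

# On the worker node that runs the kubelet for this pod: capture any traffic
# towards the pod's probe endpoint.
tcpdump -i any -nn host 10.128.0.208 and tcp port 7777

# The kubelet's HTTP probes also carry a distinctive User-Agent header
# (kube-probe/<version>), so grepping for that in the application or
# istio-proxy access logs can tell kubelet probes apart from manual curls.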

One more thing worth mentioning: my pod has an istio-proxy sidecar container inside. It looks like the traffic from the kubelet gets blocked by this istio-proxy.

Setting the following annotation in my deployment:

 "rewriteAppHTTPProbe": true

does not help for the kubelet. It did, however, help to get a 200 OK when running the curl command from the worker node.
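
For completeness, this is roughly how I applied the annotation to the pod template (deployment name my-app is a placeholder; the annotation key assumes the standard sidecar.istio.io/ prefix from the Istio docs). It has to go on the pod template rather than on the Deployment metadata itself, since the injector only looks at pod annotations:

kubectl patch deployment my-app --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/rewriteAppHTTPProbe":"true"}}}}}'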

Maybe also worth noting: we are using the istio-cni plugin to inject the Istio sidecar. Not sure whether that makes a difference compared to the former approach of injecting with istio-init ...

Any suggestions are welcome :). Thanks.

opstalj
  • How do you define your liveness probe? You must have specified the wrong path, as the logs show it's trying to access `/healthz/start` instead of `/healthz/readiness`. Do share your k8s manifest file. – Huu Phuong Vu Mar 14 '22 at 09:45
  • Hi, actually I have got 3x probes: startup, readiness and liveness ... maybe the title of my request is not so well defined :( ... in any case: all 3x types of probes are failing. Note: from the worker node: curl -v http://10.128.0.208:7777/healthz/start -> gives a 200 OK fine. – opstalj Mar 14 '22 at 09:49
  • As per istio docs, the health check requests to the liveness-http service are sent by Kubelet. This becomes a problem when mutual TLS is enabled, because the Kubelet does not have an Istio issued certificate. Therefore the health check requests will fail. To get around this, you can change `rewriteAppHTTPProbe` to `false`. – Huu Phuong Vu Mar 14 '22 at 09:58
  • refer to https://istio.io/latest/docs/ops/configuration/mesh/app-health-check/#disable-the-http-probe-rewrite-for-a-pod – Huu Phuong Vu Mar 14 '22 at 09:58
  • hi Huu, thanks for your help ... but I think the flag should be set to 'true', no? If set to "false", then the health check from the worker node fails too. If I set it to "true", then at least I can get a 200 OK from the worker node... – opstalj Mar 14 '22 at 10:19
  • Please update your question with your deployment YAML file (remove all sensitive information) so we can troubleshoot this further. – Wojtek_B Mar 14 '22 at 19:29

1 Answer


The issue looks to be that the istio-cni plugin changes the iptables rules, and a redirect of the health probe happens towards the application. However, the redirect goes to localhost:<port>, and the application is not listening there for the health probes ... only on <pod IP>:<port> ...

After changing the iptables rules to a more appropriate redirect, the health probe got answered with a 200 OK and the pod became healthy.
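
For anyone who wants to check this on their own cluster, this is roughly how the rules that istio-cni installed inside the pod's network namespace can be inspected (a sketch assuming crictl and nsenter are available on the node; the container ID is a placeholder):

# Find the istio-proxy container of the affected pod
crictl ps --name istio-proxy

# Get the PID of that container (replace <container-id> with the ID from above)
PID=$(crictl inspect --output go-template --template '{{.info.pid}}' <container-id>)

# Dump the NAT rules inside the pod's network namespace; the ISTIO_INBOUND and
# ISTIO_IN_REDIRECT chains show where inbound traffic (including the kubelet
# probes) is being redirected.
nsenter -t "$PID" -n iptables -t nat -L -n -v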

opstalj