
We are using Istio 1.8.1 and have started using a headless service to get direct pod-to-pod communication working with Istio mTLS. This all works fine, but we have recently noticed that sometimes, after killing one of our pods, we get "503 no healthy upstream" errors for a very long time afterwards (many minutes). If we go back to a 'normal' (ClusterIP) service we get a few 503 errors and then the problem clears up very quickly (but then we can't direct requests to a specific pod, which we need to do).
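For context, this is the kind of headless service we mean (a sketch; the names, namespace, and port are illustrative, not our real manifest):

```yaml
# A headless Service (clusterIP: None) skips the virtual cluster IP and
# instead publishes a DNS record per backing pod, which is what enables
# the direct pod-to-pod addressing described above.
apiVersion: v1
kind: Service
metadata:
  name: my-svc        # hypothetical name
  namespace: my-ns    # hypothetical namespace
spec:
  clusterIP: None     # this is what makes the service headless
  selector:
    app: my-app
  ports:
    - name: http
      port: 8080
```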

We have traced the traffic of the Envoy sidecar container using `kubectl sniff` and can see that existing connections to the killed pod's IP are maintained for a long period after the pod is killed, and that new connections are even attempted to that IP.
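For reference, this is roughly how we capture the sidecar traffic (assuming the ksniff krew plugin is installed via `kubectl krew install sniff`; the pod and namespace names are illustrative):

```shell
# Capture packets from the istio-proxy (Envoy) sidecar of one pod
# and stream them into Wireshark for inspection.
kubectl sniff my-pod-7d4f9 -n my-ns -c istio-proxy
```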

We have circuit-breaker configuration on a DestinationRule for the service in question, and that doesn't seem to have helped either. We have also tried setting `PILOT_ENABLE_EDS_FOR_HEADLESS_SERVICES`, which seemed to reduce the 503 errors, but strangely it interfered with addressing pods directly by IP.
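To illustrate, our DestinationRule looks roughly like this (a sketch only; the host, names, and threshold values here are illustrative, not our exact settings):

```yaml
# Outlier detection is Istio's circuit-breaker mechanism: pods that
# return consecutive 5xx errors are ejected from the load-balancing pool.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-svc-dr          # hypothetical name
  namespace: my-ns         # hypothetical namespace
spec:
  host: my-svc.my-ns.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
```

And we enabled the pilot flag on istiod along these lines (again, shown only to clarify what we tried):

```shell
istioctl install --set values.pilot.env.PILOT_ENABLE_EDS_FOR_HEADLESS_SERVICES=true
```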

Does anyone have any suggestions on why we are receiving the 503 errors, or how to avoid them?

  • Did you check if these headless services work correctly without Istio? From what I know, Istio is by default incompatible with headless services. There is a [github issue](https://github.com/istio/istio/issues/7495) about that, and [another one](https://github.com/istio/istio/issues/12551) about a headless-services issue with mTLS. There is also a post about this on [stackrox](https://www.stackrox.com/post/2019/11/how-to-make-istio-work-with-your-apps/); search for "Headless Services". – Jakub Jan 11 '21 at 13:44
  • Thanks @Jakub, I am not even sure how I would go about finding out whether they are supported or not. I have tried raising a GitHub issue and a discussion forum post but received no replies. Today I am trying the Slack channel. – Ryan Harley Jan 11 '21 at 22:36

0 Answers