I have an interesting problem. I think I've found an infinite request loop that's causing my istio-proxy to crash with an OOM error in a specific circumstance.

When I submit the request to the app directly from inside the application container, it seems to work fine, and in the istio-proxy logs I see that the upstream_cluster is PassthroughCluster.

However, when I submit the request through the envoy proxy I'm using for SSL termination (not the istio-proxy for the app/mesh), the request seems to be retried/repeated until the istio-proxy crashes with an OOM; the only upstream_cluster I see in those logs is inbound|80|myapp.

Has anyone seen something like this before? It seems like the request is being routed to the wrong listener, or the envoy proxy we have sitting in front of the app that's in the mesh is somehow messing up the request.


A few more details about the symptom:

I have just one app in the service mesh, and the traffic coming to that app comes from an envoy proxy that's handling SSL termination:

Envoy -> (istio-proxy / app ) -> other services

When I send the request directly to localhost from within the app, it works fine. When I port forward directly to the (istio-proxy / app) pod and issue the request, it also appears to work fine. When I send the request through Envoy, I start to see thousands of identical requests in the (istio-proxy / app) logs, and eventually the istio-proxy crashes with an OOM.

blankenshipz
  • Could you try to enable trace logs with `kubectl exec -it -c istio-proxy -- curl -X POST http://localhost:15000/logging?level=trace`, check the istio-proxy logs with `kubectl logs -c istio-proxy -f`, and provide some logs? If I understand correctly, you have ingress-gateway -> envoy -> app with an envoy sidecar; why not do TLS termination on the gateway? If you use gRPC, take a look at this [github issue](https://github.com/envoyproxy/envoy/issues/8857). – Jakub Aug 27 '20 at 14:49
  • @Jakub I'd rather not have to change how I'm doing SSL if possible but I'll think about that; as far as the logs go they're very verbose and the only thing that stood out to me was that when I don't go Envoy -> (istio-proxy / app) the `upstream_cluster` is `PassthroughCluster` instead of `inbound|80|myapp`. Is there something more specific I should look for in the logs? – blankenshipz Aug 27 '20 at 18:39
  • @blankenshipz I think this is really hard to answer without your envoy configs. e.g. the output of the `/config_dump` endpoint from the admin server. That would help us understand what the change in cluster truly means. That being said, I have seen something like this before, and it was caused by circular logic in the routing rules; so a route that does a prefix_rewrite or a redirect that causes the request to loop back through all the rules and just continue looping forever. Still could help more with the configs, but that's where I'd look first. – justincely Sep 03 '20 at 18:34
  • Any luck here? I'm stuck on the same issue. – Vaibhav Jain Oct 18 '20 at 19:33
  • @VaibhavJain Unfortunately not; eventually I was able to open an issue with Istio, but there has been no response. I moved to using an Istio IngressGateway to avoid the Envoy proxy. Here's that issue: https://github.com/istio/istio/issues/26946 – blankenshipz Oct 19 '20 at 20:06

1 Answer

It turns out that this wasn't actually an issue with Istio or Envoy.

My application was acting as a reverse proxy and forwarding inbound requests.

When the request came in through Envoy, Envoy changed the request's Host header to match the k8s service.

When our service received the request, it forwarded it with that Host header intact, and the istio sidecar then tried to use this header for the outbound request; this is what caused the issue.

Updating or removing the host header in the reverse proxy app fixes the issue.

This GitHub issue has more details:

https://github.com/istio/istio/issues/26946#issuecomment-715365117

blankenshipz
  • Thanks for sharing the findings. This is helpful. A general note for all: if you are using a reverse proxy with Istio, remove the Host header. – Vaibhav Jain Oct 23 '20 at 16:37