We have an Istio deployment that has been running successfully for some time, but when the Ingress Gateway pods are restarted we get intermittent 504s mixed in with successful traffic. I suspect it's due to keepalive connections between the ELB and the IGW pods: there are no 504s in the gateway logs, and the LB metrics report "ELB 5xx" errors rather than "HTTP 5xx" (backend) errors. We're on Istio 1.12.6.
I've tried configuring the terminationDrainDuration to be longer than the LB's idle timeout, but with no success (though I do see the pods take the appropriate amount of time to shut down). We can reproduce this by sending traffic to the app and restarting the IGW pods via the kubectl rollout restart command.
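For reference, the repro loop looks roughly like this (hostname and names are placeholders matching the config below):

```shell
# Roll the ingress gateway pods in the background.
kubectl -n my-ns rollout restart deployment/my-ingressgateway &

# Meanwhile, send steady traffic through the LB and log any 5xx responses.
for i in $(seq 1 1000); do
  code=$(curl -s -o /dev/null -w '%{http_code}' https://my-cool-hostname.example.com/invoke)
  [ "$code" -ge 500 ] && echo "request $i -> HTTP $code"
done
```

Most requests succeed, but a handful come back as 504 while the old pods are terminating.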
Config:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: my-iop
  namespace: my-ns
spec:
  meshConfig:
    defaultConfig:
      # For some reason this has no effect on the IGW pods?
      terminationDrainDuration: 70s
      holdApplicationUntilProxyStarts: true
      proxyMetadata:
        ISTIO_META_EXIT_ON_ZERO_ACTIVE_CONNECTIONS: "true"
  components:
    cni:
      enabled: true
    pilot:
      enabled: true
    ingressGateways:
    - enabled: true
      name: my-ingressgateway
      k8s:
        overlays:
        - kind: Deployment
          name: my-ingressgateway
          patches:
          - path: spec.template.spec.terminationGracePeriodSeconds
            value: 120
        podAnnotations:
          # setting this does actually affect the IGW pods; the logs print the configuration correctly
          proxy.istio.io/config: |
            terminationDrainDuration: 70s
            proxyMetadata:
              ISTIO_META_EXIT_ON_ZERO_ACTIVE_CONNECTIONS: "true"
        hpaSpec:
          minReplicas: 5
          maxReplicas: 10
        serviceAnnotations:
          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: https
          service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: TCP
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-target: TCP
          service.beta.kubernetes.io/aws-load-balancer-internal: "true"
          service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
          service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "<CERT-ARN>" # TLS termination at the LB, re-encrypt between LB and IGW pods
          service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
          service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60" # This is also the default
  profile: minimal
  tag: 1.12.6-distroless
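One mitigation I've seen suggested but haven't verified is adding a preStop sleep to the gateway container via an overlay, so a terminating pod keeps serving keepalive connections while the LB drains it. A rough sketch against the same IstioOperator (the container name istio-proxy and the sleep length are assumptions on my part):

```
        overlays:
        - kind: Deployment
          name: my-ingressgateway
          patches:
          - path: spec.template.spec.containers.[name:istio-proxy].lifecycle
            value:
              preStop:
                exec:
                  # sleep longer than the LB's 60s idle timeout;
                  # note: the distroless image may not ship a sleep binary
                  command: ["sleep", "65"]
```

I'm not sure whether this is redundant with terminationDrainDuration, which is part of why I'm asking.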
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
  namespace: my-ns
spec:
  selector:
    istio: my-ingressgateway
  servers:
  - hosts:
    - "*"
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      credentialName: the-credential
      mode: SIMPLE
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-vs
  namespace: app-ns
spec:
  gateways:
  - my-ns/my-gateway
  hosts:
  - my-cool-hostname.example.com
  http:
  - match:
    - uri:
        regex: .*/invoke
    rewrite:
      uri: /invoke
    route:
    - destination:
        host: the-app
        port:
          number: 9000
Any help is appreciated!