
We have an Istio deployment that has been running successfully for some time, but when the Ingress Gateway pods are restarted we get intermittent/random 504s mixed in with successful traffic. I suspect it's due to keepalive connections between the ELB and the IGW pods, since there are no 504s in the gateway logs and the LB metrics report "ELB 5xx" errors rather than "HTTP 5xx" errors. We're on Istio 1.12.6.
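
For context, this is how we compare the two 5xx counters in CloudWatch (a rough sketch for a Classic ELB, which is what the annotations below configure; <LB-NAME> and the time window are placeholders):

# Sum the LB-generated 5xx responses over the restart window.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ELB \
  --metric-name HTTPCode_ELB_5XX \
  --dimensions Name=LoadBalancerName,Value=<LB-NAME> \
  --statistics Sum --period 60 \
  --start-time <START> --end-time <END>
# Running the same query for HTTPCode_Backend_5XX shows it staying flat
# during the restarts, which is why we believe the 504s originate at the
# ELB rather than at the gateway.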

I've tried configuring terminationDrainDuration to be longer than the LB's idle timeout, but with no success (though I do see the pods take the appropriate amount of time to shut down). We can replicate this by sending traffic to the app and restarting the IGW pods with kubectl rollout restart.
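
Roughly, the reproduction looks like this (a sketch using the hey load generator; any client that reuses keepalive connections should work, which matters here since the problem seems keepalive-related, and the hostname comes from the VirtualService below):

# Generate sustained traffic over reused keepalive connections.
hey -z 5m https://my-cool-hostname.example.com/invoke &

# While traffic is flowing, restart the gateway pods; intermittent 504s
# show up in the responses until the rollout settles.
kubectl -n my-ns rollout restart deployment/my-ingressgateway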

Config:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: my-iop
  namespace: my-ns
spec:
  meshConfig:
    defaultConfig:
      # For some reason this has no effect on the IGW pods?
      terminationDrainDuration: 70s
      holdApplicationUntilProxyStarts: true
      proxyMetadata:
        ISTIO_META_EXIT_ON_ZERO_ACTIVE_CONNECTIONS: "true"
  components:
    cni:
      enabled: true
    pilot:
      enabled: true
    ingressGateways:
      - enabled: true
        k8s:
          overlays:
            - kind: Deployment
              name: my-ingressgateway
              patches:
                - path: spec.template.spec.terminationGracePeriodSeconds
                  value: 120
          podAnnotations:
            # setting this does actually affect the IGW pods; the logs print the configuration correctly
            proxy.istio.io/config: |
              terminationDrainDuration: 70s
              proxyMetadata:
                ISTIO_META_EXIT_ON_ZERO_ACTIVE_CONNECTIONS: "true"
          hpaSpec:
            minReplicas: 5
            maxReplicas: 10
          serviceAnnotations:
            service.beta.kubernetes.io/aws-load-balancer-backend-protocol: https
            service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
            service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: TCP
            service.beta.kubernetes.io/aws-load-balancer-healthcheck-target: TCP
            service.beta.kubernetes.io/aws-load-balancer-internal: "true"
            service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
            service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "<CERT-ARN>" # TLS termination at the LB, re-encrypt between LB and IGW pods
            service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
            service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60" # This is also the default
        name: my-ingressgateway
  profile: minimal
  tag: 1.12.6-distroless
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: my-gateway
  namespace: my-ns
spec:
  selector:
    istio: my-ingressgateway
  servers:
    - hosts:
        - "*"
      port:
        name: https
        number: 443
        protocol: HTTPS
      tls:
        credentialName: the-credential
        mode: SIMPLE
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-vs
  namespace: app-ns
spec:
  gateways:
    - my-ns/my-gateway
  hosts:
    - my-cool-hostname.example.com
  http:
    - match:
        - uri:
            regex: .*/invoke
      rewrite:
        uri: /invoke
      route:
        - destination:
            host: the-app
            port:
              number: 9000
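
For completeness, this is how we verified that the settings actually land on the gateway pods (a sketch; the names match the config above):

# pilot-agent logs its effective proxy config at startup; the value from
# the pod annotation shows up here, which is how we know it was applied.
kubectl -n my-ns logs deploy/my-ingressgateway -c istio-proxy | grep -i drain

# Watch the pods during a rollout; they take the full drain window (~70s)
# to terminate, so the drain duration itself is being honored.
kubectl -n my-ns get pods -l istio=my-ingressgateway -w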

Any help is appreciated!
