I am trying to deploy a gRPC server on GKE. The request flow is client => gce-ingress => envoy => grpc-server. I am attaching my deployments for all the components.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: envoy-deployment
  labels:
    app: envoy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: envoy
  template:
    metadata:
      labels:
        app: envoy
    spec:
      containers:
      - name: envoy
        image: envoyproxy/envoy:v1.22.5
        ports:
        - containerPort: 9901
        readinessProbe:
          httpGet:
            port: 9901
            httpHeaders:
            - name: x-envoy-livenessprobe
              value: healthz
            path: /healthz
            scheme: HTTPS
        livenessProbe:
          httpGet:
            port: 9901
            httpHeaders:
            - name: x-envoy-livenessprobe
              value: healthz
            path: /healthz
            scheme: HTTPS
        volumeMounts:
        - name: config
          mountPath: /etc/envoy
        - name: certs
          mountPath: /etc/ssl/envoy
      volumes:
      - name: config
        configMap:
          name: envoy-conf
      - name: certs
        secret:
          secretName: secret-tls
---
apiVersion: v1
kind: Service
metadata:
  name: envoy-deployment-service
  annotations:
    cloud.google.com/backend-config: '{"ports": {"443":"envoy-app-backend-config"}}'
    cloud.google.com/neg: '{"ingress": true}'
spec:
  ports:
  - protocol: TCP
    port: 443
    targetPort: 9901
  selector:
    app: envoy
  type: NodePort
  externalTrafficPolicy: Local
# loadBalancerIP: 35.225.129.124
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: envoy-app-backend-config
spec:
  timeoutSec: 30
  connectionDraining:
    drainingTimeoutSec: 30
  healthCheck:
    checkIntervalSec: 5
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
    type: HTTP2
    requestPath: /healthz
    port: 9901
  customRequestHeaders:
    headers:
    - "TE:trailers"
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: envoy-ingress-prod
  annotations:
    # kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: static-ip
    kubernetes.io/ingress.allow-http: "false"
    cert-manager.io/issuer: issuer
    cloud.google.com/backend-config: '{"default": "envoy-app-backend-config"}'
    # cert-manager.io/cluster-issuer: letsencrypt-staging
    # acme.cert-manager.io/http01-edit-in-place: "true"
  labels:
    name: envoy-ingress-app
spec:
  tls:
  - hosts:
    - domain.com
    secretName: secret-tls
  rules:
  - host: domain.com
    http:
      paths:
      - path: /*
        pathType: ImplementationSpecific
        backend:
          service:
            name: envoy-deployment-service
            port:
              number: 443
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-server
  labels:
    app: server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
    spec:
      containers:
      - name: server
        image: gcr.io/project/image:latest
        command: ["python3", "/var/app/api/main.py"]
        imagePullPolicy: Always
        volumeMounts:
        - mountPath: /secrets/gcloud-auth
          name: gcloud-auth
          readOnly: true
        ports:
        - containerPort: 8000
      volumes:
      - name: gcloud-auth
        secret:
          secretName: gcloud
---
apiVersion: v1
kind: Service
metadata:
  name: app-server-headless
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: server
  ports:
  - name: grpc
    port: 8000
    targetPort: 8000
    protocol: TCP
My envoy config looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-conf
data:
  envoy.yaml: |
    admin:
      access_log_path: /tmp/admin_access.log
      address:
        socket_address: { address: 127.0.0.1, port_value: 9902 }
    static_resources:
      listeners:
      - name: listener_0
        address:
          socket_address: { address: 0.0.0.0, port_value: 9901 }
        filter_chains:
        - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              codec_type: auto
              access_log:
              - name: envoy.access_loggers.file
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                  path: "/dev/stdout"
              stat_prefix: ingress_https
              route_config:
                name: local_route
                virtual_hosts:
                - name: envoy_service
                  domains: ["*"]
                  routes:
                  #- match:
                  #    prefix: "/healthz"
                  #  direct_response: { status: 200, body: { inline_string: "ok it is working now" } }
                  - match:
                      prefix: "/heal"
                    direct_response: { status: 200, body: { inline_string: "ok heal is working now" } }
                  - match:
                      prefix: "/envoy/"
                    route:
                      prefix_rewrite: "/"
                      cluster: envoy_service
                  - match:
                      prefix: "/"
                    route:
                      prefix_rewrite: "/"
                      cluster: envoy_service
                  cors:
                    allow_origin_string_match:
                    - prefix: "*"
                    allow_methods: GET, PUT, DELETE, POST, OPTIONS
                    allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,custom-header-1,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout
                    max_age: "1728000"
                    expose_headers: custom-header-1,grpc-status,grpc-message
              http_filters:
              - name: envoy.filters.http.cors
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
              - name: envoy.filters.http.grpc_web
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              require_client_certificate: false
              common_tls_context:
                tls_certificates:
                - certificate_chain:
                    filename: /etc/ssl/envoy/tls.crt
                  private_key:
                    filename: /etc/ssl/envoy/tls.key
                alpn_protocols: ["h2", "http/1.1"]
      clusters:
      - name: envoy_service
        connect_timeout: 0.50s
        type: strict_dns
        http2_protocol_options: {}
        lb_policy: round_robin
        load_assignment:
          cluster_name: envoy_service
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: app-server-headless
                    port_value: 8000
        health_checks:
        - timeout: 1s
          interval: 10s
          unhealthy_threshold: 2
          healthy_threshold: 2
          grpc_health_check: {}
From the GKE dashboard, the Deployments, Services, and the Ingress all appear to be running without error (each has a green check mark). But when I make a request through my Python client to the gRPC server, I receive the following error in the ingress access log:

jsonPayload: {
  statusDetails: "backend_connection_closed_before_data_sent_to_client"
  remoteIp: "49.49.199.6"
  @type: "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"
}
From the GCP documentation, the underlying cause is described as:
backend_connection_closed_before_data_sent_to_client
The backend unexpectedly closed its connection to the load balancer before the response was proxied to the client. This can happen if the load balancer is sending traffic to another entity. The other entity might be a third-party load balancer that has a TCP timeout that is shorter than the external HTTP(S) load balancer's 10-minute (600-second) timeout. The third-party load balancer might be running on a VM instance. Manually setting the TCP timeout (keepalive) on the target service to greater than 600 seconds might resolve the issue.
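If "the target service" means the gRPC server itself, I believe the Python grpc library exposes keepalive and idle timeouts as server options. The option names below are the GRPC_ARG_* channel arguments from gRPC core; the values are my guesses, not tested in my setup:

```python
from concurrent import futures

import grpc

# Guess: raise the server's idle window above the load balancer's 600 s
# timeout and send HTTP/2 keepalive pings so the connection is not closed
# before the LB's timeout fires. Values here are placeholders.
options = [
    ("grpc.max_connection_idle_ms", 650_000),  # don't close idle connections before the LB does
    ("grpc.keepalive_time_ms", 300_000),       # send an HTTP/2 keepalive ping every 5 minutes
    ("grpc.keepalive_timeout_ms", 20_000),     # drop the connection if no ping ack within 20 s
]

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10), options=options)
server.add_insecure_port("[::]:8000")
```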
Now, where (on Envoy or on the gRPC server) and how do I set the TCP timeout (keepalive) on the target service to greater than 600 seconds?
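On the Envoy side, my best guess from reading the docs is either raising the HTTP idle timeout in the connection manager above 600 s, or setting TCP keepalive socket options on the listener. Something like this (untested; the numeric socket levels/names are the Linux constants):

```yaml
# Guess 1: inside the HttpConnectionManager typed_config, raise the
# connection idle timeout above the load balancer's 600 s:
common_http_protocol_options:
  idle_timeout: 620s

# Guess 2: TCP keepalive on the listener socket (Linux constants:
# SOL_SOCKET=1, SO_KEEPALIVE=9, IPPROTO_TCP=6, TCP_KEEPIDLE=4):
listeners:
- name: listener_0
  socket_options:
  - { level: 1, name: 9, int_value: 1 }    # enable SO_KEEPALIVE
  - { level: 6, name: 4, int_value: 620 }  # start keepalive probes after 620 s idle
```

Is either of these the right direction, or does the timeout belong somewhere else entirely?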