I am trying to deploy a gRPC server on GKE. The request flow is client => gce-ingress => envoy => grpc-server. I am attaching my deployments for all the components.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: envoy-deployment
  labels:
    app: envoy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: envoy
  template:
    metadata:
      labels:
        app: envoy
    spec:
      containers:
      - name: envoy
        image: envoyproxy/envoy:v1.22.5
        ports:
        - containerPort: 9901
        readinessProbe:
          httpGet:
            port: 9901
            httpHeaders:
            - name: x-envoy-livenessprobe
              value: healthz
            path: /healthz
            scheme: HTTPS
        livenessProbe:
          httpGet:
            port: 9901
            httpHeaders:
            - name: x-envoy-livenessprobe
              value: healthz
            path: /healthz
            scheme: HTTPS
        volumeMounts:
        - name: config
          mountPath: /etc/envoy
        - name: certs
          mountPath: /etc/ssl/envoy
      volumes:
      - name: config
        configMap:
          name: envoy-conf
      - name: certs
        secret:
          secretName: secret-tls
---
apiVersion: v1
kind: Service
metadata:
  name: envoy-deployment-service
  annotations:
    cloud.google.com/backend-config: '{"ports": {"443":"envoy-app-backend-config"}}'
    cloud.google.com/neg: '{"ingress": true}'
spec:
  ports:
  - protocol: TCP
    port: 443
    targetPort: 9901
  selector:
    app: envoy
  type: NodePort
  externalTrafficPolicy: Local
# loadBalancerIP: 35.225.129.124
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: envoy-app-backend-config
spec:
  timeoutSec: 30
  connectionDraining:
    drainingTimeoutSec: 30
  healthCheck:
    checkIntervalSec: 5
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
    type: HTTP2
    requestPath: /healthz
    port: 9901
  customRequestHeaders:
    headers:
    - "TE:trailers"
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: envoy-ingress-prod
  annotations:
    # kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: static-ip
    kubernetes.io/ingress.allow-http: "false"
    cert-manager.io/issuer: issuer
    cloud.google.com/backend-config: '{"default": "envoy-app-backend-config"}'
    # cert-manager.io/cluster-issuer: letsencrypt-staging
    # acme.cert-manager.io/http01-edit-in-place: "true"
  labels:
    name: envoy-ingress-app
spec:
  tls:
  - hosts:
    - domain.com
    secretName: secret-tls
  rules:
  - host: domain.com
    http:
      paths:
      - path: /*
        pathType: ImplementationSpecific
        backend:
          service:
            name: envoy-deployment-service
            port:
              number: 443
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-server
  labels:
    app: server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
    spec:
      containers:
      - name: server
        image: gcr.io/project/image:latest
        command: ["python3", "/var/app/api/main.py"]
        imagePullPolicy: Always
        volumeMounts:
        - mountPath: /secrets/gcloud-auth
          name: gcloud-auth
          readOnly: true
        ports:
        - containerPort: 8000
      volumes:
      - name: gcloud-auth
        secret:
          secretName: gcloud
---
apiVersion: v1
kind: Service
metadata:
  name: app-server-headless
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: server
  ports:
  - name: grpc
    port: 8000
    targetPort: 8000
    protocol: TCP
My envoy config looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-conf
data:
  envoy.yaml: |
    admin:
      access_log_path: /tmp/admin_access.log
      address:
        socket_address: { address: 127.0.0.1, port_value: 9902 }
    static_resources:
      listeners:
      - name: listener_0
        address:
          socket_address: { address: 0.0.0.0, port_value: 9901 }
        filter_chains:
        - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              codec_type: auto
              access_log:
              - name: envoy.access_loggers.file
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                  path: "/dev/stdout"
              stat_prefix: ingress_https
              route_config:
                name: local_route
                virtual_hosts:
                - name: envoy_service
                  domains: ["*"]
                  routes:
                  #- match:
                  #    prefix: "/healthz"
                  #  direct_response: { status: 200, body: { inline_string: "ok it is working now" } }
                  - match:
                      prefix: "/heal"
                    direct_response: { status: 200, body: { inline_string: "ok heal is working now" } }
                  - match:
                      prefix: "/envoy/"
                    route:
                      prefix_rewrite: "/"
                      cluster: envoy_service
                  - match:
                      prefix: "/"
                    route:
                      prefix_rewrite: "/"
                      cluster: envoy_service
                  cors:
                    allow_origin_string_match:
                    - prefix: "*"
                    allow_methods: GET, PUT, DELETE, POST, OPTIONS
                    allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,custom-header-1,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout
                    max_age: "1728000"
                    expose_headers: custom-header-1,grpc-status,grpc-message
              http_filters:
              - name: envoy.filters.http.cors
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
              - name: envoy.filters.http.grpc_web
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_web.v3.GrpcWeb
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              require_client_certificate: false
              common_tls_context:
                tls_certificates:
                - certificate_chain:
                    filename: /etc/ssl/envoy/tls.crt
                  private_key:
                    filename: /etc/ssl/envoy/tls.key
                alpn_protocols: ["h2", "http/1.1"]
      clusters:
      - name: envoy_service
        connect_timeout: 0.50s
        type: strict_dns
        http2_protocol_options: {}
        lb_policy: round_robin
        load_assignment:
          cluster_name: envoy_service
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: app-server-headless
                    port_value: 8000
        health_checks:
        - timeout: 1s
          interval: 10s
          unhealthy_threshold: 2
          healthy_threshold: 2
          grpc_health_check: {}
From the GKE dashboard, the Deployments, Services, and the Ingress all appear to be running without error (each has a green check mark). But when I make a request through my Python client to the gRPC server, I receive the following error in the ingress access log:

jsonPayload: {
  statusDetails: "backend_connection_closed_before_data_sent_to_client"
  remoteIp: "49.49.199.6"
  @type: "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"
}
From the GCP documentation, the underlying cause is described as:
backend_connection_closed_before_data_sent_to_client
The backend unexpectedly closed its connection to the load balancer before the response was proxied to the client. This can happen if the load balancer is sending traffic to another entity. The other entity might be a third-party load balancer that has a TCP timeout that is shorter than the external HTTP(S) load balancer's 10-minute (600-second) timeout. The third-party load balancer might be running on a VM instance. Manually setting the TCP timeout (keepalive) on the target service to greater than 600 seconds might resolve the issue.
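If "the target service" means the gRPC server itself, I believe the Python grpc library exposes keepalive and idle timeouts as server options. The option names below are the GRPC_ARG_* channel arguments from gRPC core; the values are my guesses, not tested in my setup:

```python
from concurrent import futures

import grpc

# Guess: raise the server's idle window above the load balancer's 600 s
# timeout and send HTTP/2 keepalive pings so the connection is not closed
# before the LB's timeout fires. Values here are placeholders.
options = [
    ("grpc.max_connection_idle_ms", 650_000),  # don't close idle connections before the LB does
    ("grpc.keepalive_time_ms", 300_000),       # send an HTTP/2 keepalive ping every 5 minutes
    ("grpc.keepalive_timeout_ms", 20_000),     # drop the connection if no ping ack within 20 s
]

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10), options=options)
server.add_insecure_port("[::]:8000")
```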
Now, where (on Envoy or on the gRPC server) and how do I set the TCP timeout (keepalive) on the target service to greater than 600 seconds?
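On the Envoy side, my best guess from reading the docs is either raising the HTTP idle timeout in the connection manager above 600 s, or setting TCP keepalive socket options on the listener. Something like this (untested; the numeric socket levels/names are the Linux constants):

```yaml
# Guess 1: inside the HttpConnectionManager typed_config, raise the
# connection idle timeout above the load balancer's 600 s:
common_http_protocol_options:
  idle_timeout: 620s

# Guess 2: TCP keepalive on the listener socket (Linux constants:
# SOL_SOCKET=1, SO_KEEPALIVE=9, IPPROTO_TCP=6, TCP_KEEPIDLE=4):
listeners:
- name: listener_0
  socket_options:
  - { level: 1, name: 9, int_value: 1 }    # enable SO_KEEPALIVE
  - { level: 6, name: 4, int_value: 620 }  # start keepalive probes after 620 s idle
```

Is either of these the right direction, or does the timeout belong somewhere else entirely?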