We have an HTTP(S) load balancer created by a Kubernetes Ingress, which points to a backend formed by a set of pods running nginx and Ruby on Rails.
Looking at the load balancer logs, we have detected an increasing number of requests with a response code of 0 and statusDetails = client_disconnected_before_any_response.
We're trying to understand why this is happening, but we haven't found anything relevant. There is nothing in the nginx access or error logs.
This is happening for multiple kinds of requests, from GET to POST.
We also suspect that sometimes, despite the request being logged with that error, the request is actually passed to the backend. For instance, we're seeing PG::UniqueViolation errors caused by identical sign-up requests being sent twice to the backend on our sign-up endpoint.
Any kind of help would be appreciated. Thanks!
UPDATE 1
As requested, here is the YAML file for the Ingress resource:
UPDATE 2
I've created a log-based Stackdriver metric to count the number of requests that show this behavior. Here is the chart:
The big peaks approximately match the timestamps of these Kubernetes events:
Full error: Readiness probe failed: Get http://10.48.1.28:80/health_check: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
So it seems that the readiness probe for the pods behind the backend sometimes fails, but not always.
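To compare the peaks with the probe failures more precisely than by eye, the matching log entries could be bucketed per minute. The following is only a minimal sketch, not what the metric itself does. It assumes the load balancer logs have been exported as a list of JSON objects, that each entry carries an RFC 3339 `timestamp` string, and that the status detail is stored under `jsonPayload.statusDetails`. The function name `count_per_minute` is made up for illustration.

```python
from collections import Counter

# Status detail we are interested in, as it appears in the load balancer logs.
TARGET = "client_disconnected_before_any_response"


def count_per_minute(entries):
    """Count matching log entries, keyed by minute (e.g. '2019-01-01T12:34').

    `entries` is assumed to be a list of dicts in the exported log format.
    Entries without a jsonPayload or with another statusDetails are skipped.
    """
    buckets = Counter()
    for entry in entries:
        details = entry.get("jsonPayload", {}).get("statusDetails")
        if details != TARGET:
            continue
        # An RFC 3339 timestamp starts with 'YYYY-MM-DDTHH:MM', so the first
        # 16 characters identify the minute without any date parsing.
        minute = entry["timestamp"][:16]
        buckets[minute] += 1
    return buckets
```

Loading the exported file with `json.load` and printing `sorted(count_per_minute(entries).items())` would give one line per minute, which can then be placed next to the timestamps of the readiness probe events.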
Here is the definition of the readinessProbe:
    readinessProbe:
      failureThreshold: 3
      httpGet:
        httpHeaders:
        - name: X-Forwarded-Proto
          value: https
        - name: Host
          value: [redacted]
        path: /health_check
        port: 80
        scheme: HTTP
      initialDelaySeconds: 1
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 5
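One way we could check whether /health_check itself is slow is to repeat the probe by hand and time it. The following is a rough sketch under assumptions: it uses only the Python standard library, it would have to run from somewhere that can reach the pod IP, and the helper names `probe_once` and `probe_many` are made up for illustration. It sends the same kind of GET with a 5 second timeout and treats any status from 200 to 399 as success. Unlike the kubelet, it follows redirects.

```python
import time
import urllib.error
import urllib.request


def probe_once(url, headers=None, timeout=5.0):
    """Send one GET request and return (ok, elapsed_seconds, detail)."""
    request = urllib.request.Request(url, headers=headers or {})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            ok = 200 <= response.status < 400
            detail = f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # urllib raises HTTPError for 4xx and 5xx responses.
        ok, detail = False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers timeouts, refused connections and other URLError cases.
        ok, detail = False, f"error: {exc}"
    return ok, time.monotonic() - start, detail


def probe_many(url, headers=None, attempts=10, pause=1.0, timeout=5.0):
    """Run probe_once several times, sleeping `pause` seconds in between."""
    results = []
    for attempt in range(attempts):
        results.append(probe_once(url, headers=headers, timeout=timeout))
        if attempt < attempts - 1:
            time.sleep(pause)
    return results
```

For example, `probe_many("http://10.48.1.28:80/health_check", {"X-Forwarded-Proto": "https", "Host": ...})`, with the Host value set to whatever the probe uses, would return one tuple per attempt. Elapsed times close to 5 seconds would suggest that the endpoint is occasionally too slow rather than unreachable.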