
After almost 120 days during which the services were working well in the containers (running in Pods managed by a Deployment resource), I suddenly got a readinessProbe failed error, which blocked traffic from the Ingress to the Pods.

Honestly, I don't know why it happened; as I said, it worked for 120 days without an issue.

What I have tried

At first I thought it was a rate-limit issue, because in the error log I saw a 429 (Too Many Requests) error. So I flushed the rate-limit data and it worked again.

Then I completely excluded the /health endpoint from the rate-limiter middleware, and one day later it happened again anyway. This makes me think it's not actually the rate-limit middleware inside Express.js, but something else, maybe something managed by GKE.
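The exclusion looks roughly like this (a minimal sketch, assuming express-rate-limit is the middleware in use; the window and max values here are placeholders, not my real ones):

    const express = require('express');
    const rateLimit = require('express-rate-limit');

    const app = express();

    const limiter = rateLimit({
      windowMs: 60 * 1000, // 1-minute window (placeholder value)
      max: 100,            // max requests per window per IP (placeholder value)
      skip: (req) => req.path.startsWith('/health'), // never rate-limit probe traffic
    });

    app.use(limiter);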

Here are the probes in the YAML configuration:

              startupProbe:
                httpGet:
                  path: /health/?startup_probe=1
                  port: 3003
                initialDelaySeconds: 5
                periodSeconds: 5
                timeoutSeconds: 5
              readinessProbe:
                httpGet:
                  path: /health
                  port: 3003
                initialDelaySeconds: 3
                periodSeconds: 3
                timeoutSeconds: 3

Background

I just want to achieve a zero-downtime deployment for a Node.js app on K8s - so that K8s passes traffic to the container in the Pod only once the app in the container has connected to the DB and is ready to receive traffic. This check should actually be done only once - right after a new deployment rollout.

So in /health/?startup_probe=1 I check the DB connections, and this happens only once, as I need. The readinessProbe just checks that the Node app is up and receiving requests on the API endpoint (without DB checks).
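Roughly, the handler looks like this (a simplified sketch; `db` and `db.ping()` are placeholders for the app's real connection object and connectivity check):

    const express = require('express');
    const db = require('./db'); // hypothetical module exposing ping()

    const app = express();

    app.get('/health', async (req, res) => {
      if (req.query.startup_probe) {
        // startupProbe path: verify the DB connection right after a rollout
        try {
          await db.ping(); // placeholder for the real check (e.g. a trivial query)
        } catch (err) {
          return res.status(503).send('DB not ready');
        }
      }
      // readinessProbe path: just confirm the Node process is serving requests
      res.status(200).send('OK');
    });

    app.listen(3003);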

However, I found it useful to also add a readinessProbe, since without it I had issues in the past - Pods remained at READY 0/1 for too long.

Also, when I add logging on the /health path, I see that instead of 1 log every 3 seconds, as stated by periodSeconds: 3, it actually logs 3 times per second. Why is that happening?

What is the best configuration to achieve a zero-downtime Node.js deployment with K8s? I don't think I'm hitting some corner case - this is as basic as it gets.
