  1. First of all, I know Kubernetes and how it works.
  2. Second, I need to describe my app and how it runs on Kubernetes.
  3. Third, I'll ask my question.

This app consists of three pods, each with its own service:

1. nginx with PHP-FPM (called sample-app)

2. redis

3. queue (based on Node.js)

The app works in the following way:

When a request comes into sample-app, it stores the request as a job in Redis, while the queue watches Redis for unprocessed jobs and processes them (for example, sending an email).
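To make the flow concrete, here is a minimal sketch of the producer/worker pattern described above. A plain array stands in for the Redis list so the snippet is self-contained; in the real app these would be Redis calls such as LPUSH/BRPOP, and all names here are placeholders, not taken from my code:

```javascript
// A plain array stands in for the Redis list in this sketch.
const queue = [];

// sample-app side: store the incoming request as a job
// (in the real app: an LPUSH onto a Redis list).
function enqueue(job) {
  queue.push(JSON.stringify(job));
}

// queue-pod side: pull unprocessed jobs and process them
// (in the real app: a blocking BRPOP loop; processing might send an email).
function drain(process) {
  const results = [];
  while (queue.length > 0) {
    const job = JSON.parse(queue.shift());
    results.push(process(job));
  }
  return results;
}

enqueue({ type: 'email', to: 'user@example.com' });
const handled = drain(job => `processed ${job.type}`);
console.log(handled);
```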

Now, my issue:

The queue pod stops working with no apparent reason, at no particular time, and with no logs; to repeat, in the Kubernetes logs you can't see any error or warning. It mostly looks stuck, doing nothing. The app works fine on Minikube, and it worked fine on a VM before Kubernetes. Every time I delete the queue pod, the newly created one starts working normally and processes jobs from Redis.

I tested this app on Minikube with benchmark tools (sending requests from outside the cluster with the `ab` command) with no problems and no errors. But whenever I move to the real Kubernetes cluster and test the app under high request density, the queue pod stops working again; that is my second issue with this app.
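For reference, the load test was an `ab` run of this shape (the URL and request counts here are placeholders, not the exact values I used):

```shell
# 10,000 requests, 100 concurrent, against the cluster's external endpoint
ab -n 10000 -c 100 http://<external-ip>/<endpoint>
```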

*Extra information about my cluster: Kubernetes version 1.15, Docker version 19.3.1, the CNI plugin is Calico, and I use Helm.

Now, is there any way to troubleshoot this and figure out where the problem comes from?
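So far I have been checking things along these lines (pod names and labels below are placeholders, not output from my cluster):

```shell
# Pod state and recent events for the stuck pod
kubectl describe pod <queue-pod-name>

# Current logs, and logs from the previous container if it restarted
kubectl logs <queue-pod-name>
kubectl logs <queue-pod-name> --previous

# Resource usage (requires metrics-server)
kubectl top pod <queue-pod-name>

# Shell into the pod and test the Redis connection by hand
kubectl exec -it <queue-pod-name> -- sh
# inside the pod, e.g.: nc -zv redis 6379

# Calico node logs, in case it is a CNI issue
kubectl logs -n kube-system -l k8s-app=calico-node
```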

alireza71
  • Just curious – have you tried another CNI, e.g. Flannel? Do you experience the same? – Vit Sep 12 '19 at 14:21
  • @vkr No, would it help if I changed it? – alireza71 Sep 12 '19 at 14:24
  • This is only my experience: compared with Calico, I had no problems with Flannel at all. I'm not advocating, just suggesting you give another CNI a chance. – Vit Sep 12 '19 at 14:26
  • Do you have a health check for the stuck pod? Did you try a cloud environment, e.g. GKE? – Dagang Sep 12 '19 at 16:05
  • Can you perform a netcat against Redis to see if the queue pod can connect to it? – prometherion Sep 12 '19 at 16:56
  • @prometherion Yes, when the queue pod stopped working I used that and found it could connect fine. – alireza71 Sep 12 '19 at 17:13
  • @dagang The queue pod runs pm2, and pm2 restarts on any exception, so my pod would restart if pm2 restarted, but the restart count in the kubectl get output is zero! – alireza71 Sep 12 '19 at 17:15
  • What kind of Kubernetes object are you using for rolling out your queue application? – Bimal Sep 13 '19 at 02:01
  • @bimalvasan I'm using a Deployment. Do you have any suggestions? – alireza71 Sep 13 '19 at 03:45
  • This post is completely useless without any snippets of logs, manifests, Calico versions, calico-node logs, Calico etcd logs, kubectl outputs, and so on. Nobody even knows whether there is a Calico CNI binary in your CNI dir. – Konstantin Vustin Sep 13 '19 at 14:04
  • If it works fine in Minikube but does not work on GKE, then it's related to the environment. Could you compare your local k8s environment and the one in GKE (number of CPUs, amount of memory, etc.)? We had a similar issue in Hazelcast: everything worked well in Minikube but it failed on GKE, and the bug happened to be related to the fact that GKE by default creates each node with a single CPU (we created the number of threads == number of CPUs, so the application got stuck at some point because it had only one thread in the pool). – Rafał Leszko Sep 16 '19 at 07:43

0 Answers