
Our on-premises Kubernetes/Kubespray cluster has suddenly stopped routing traffic between the nginx-ingress controller and the NodePort services. All external requests to the ingress endpoint return a "504 Gateway Timeout" error.

How do I diagnose what has broken?

I've confirmed that the containers/pods are running and the node application has started; if I exec into the pod, I can run a local curl command and get a response from the app.

I've checked the logs on the ingress pods: traffic is arriving, and nginx is trying to forward it on to the service endpoint/node port, but it reports an error.

I've also tried to curl directly to the node via the node port but I get no response.

I've looked at the IPVS configuration and the settings look valid (e.g. there are rules for the node to forward traffic arriving on the node port to the service endpoint address/port).
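
For reference, the checks described above correspond roughly to these commands; the pod, namespace, and port values are placeholders, not taken from this cluster:

```sh
# Placeholder names and ports throughout.

# 1. Confirm the app answers locally inside the pod
kubectl exec -it <app-pod> -- curl -v http://localhost:3000/

# 2. Watch the ingress controller logs while sending a request
kubectl -n ingress-nginx logs -f <ingress-controller-pod>

# 3. Hit the NodePort directly on a node
curl -v http://<node-ip>:<node-port>/

# 4. Inspect the IPVS rules kube-proxy has programmed on the node
sudo ipvsadm -Ln
```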

Lee Sanderson
  • Here lies the answer: *I've also tried to curl directly to the node via the node port but I get no response*. Check your routing tables and pod config. – Yasen Sep 19 '19 at 18:48
  • Have you checked whether it happens after a specific amount of time? For example, your function might take more than 60 seconds to complete. You can check the ingress documentation: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-timeouts or https://www.scalescale.com/tips/nginx/504-gateway-time-out-using-nginx/# (a timeout-annotation sketch follows these comments) – aga Sep 20 '19 at 10:02
  • @Yasen - I checked the routing tables via ipvsadm and everything looks fine. – Lee Sanderson Sep 21 '19 at 06:51
  • @abielak - I don't think the problem is with the ingress controller. The logs show traffic being received by the ingress controller - it just can't forward the traffic on to the node. – Lee Sanderson Sep 21 '19 at 06:52
  • Could you provide yaml files (service, ingress, deployment)? – aga Sep 27 '19 at 12:00
  • I had this same issue. My environment setup has a proxy in it. I had to add a NO_PROXY environment variable with the domain that was being routed, as the proxy was intercepting the routing. – Simon Mbatia Mar 19 '20 at 11:55
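
If a slow upstream is the suspect (per the custom-timeouts link in the comments), a minimal sketch of raising the ingress-nginx proxy timeouts; the ingress name and the 300-second values are placeholders, not taken from this question:

```sh
# Placeholder ingress name and timeout values (in seconds).
kubectl annotate ingress my-app --overwrite \
  nginx.ingress.kubernetes.io/proxy-connect-timeout="300" \
  nginx.ingress.kubernetes.io/proxy-send-timeout="300" \
  nginx.ingress.kubernetes.io/proxy-read-timeout="300"
```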

3 Answers


We couldn't resolve this issue and, in the end, the only workaround was to uninstall and reinstall the cluster.
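
For reference, a rough sketch of what that tear-down and redeploy looks like with Kubespray's playbooks; the inventory path is a placeholder and is not taken from the answer:

```sh
# Placeholder inventory path. Run from a Kubespray checkout matching the
# version used to build the cluster. reset.yml asks for confirmation and then
# wipes the nodes; cluster.yml redeploys Kubernetes onto them.
ansible-playbook -i inventory/mycluster/hosts.yml --become reset.yml
ansible-playbook -i inventory/mycluster/hosts.yml --become cluster.yml
```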

Lee Sanderson

I was getting this because the nginx ingress controller pod was running out of memory; I increased the memory for the pod and it worked.
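
A minimal sketch of one way to do that, assuming the controller runs as a Deployment named nginx-ingress-controller in the ingress-nginx namespace (names and memory values are placeholders; adjust to your install):

```sh
# Placeholder deployment name, namespace, and memory values.
kubectl -n ingress-nginx set resources deployment/nginx-ingress-controller \
  --requests=memory=256Mi --limits=memory=512Mi
```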

gary69
  • This is the correct answer. @gary69, thanks for this. Anyone hitting this issue, use this answer to fix it. – rranj Dec 18 '20 at 06:24

I was facing a similar issue and the simple fix was to increase the values of K8S_CPU_LIMIT and K8S_MEMORY_LIMIT for the application pods running on the cluster.
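
K8S_CPU_LIMIT and K8S_MEMORY_LIMIT look like variables from the answerer's own deployment tooling rather than standard Kubernetes settings; in plain Kubernetes terms they map to the resources block on the application container. A hedged sketch of where those limits live (all names and values are placeholders):

```yaml
# Placeholder Deployment showing where CPU/memory limits live on the app container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest   # placeholder image
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
```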

rk17
  • 41
  • 2
  • That could be one of the possible reasons, but it is not an answer to the question. – mtk Oct 25 '21 at 01:28
  • If you have a new question, please ask it by clicking the [Ask Question](https://stackoverflow.com/questions/ask) button. Include a link to this question if it helps provide context. - [From Review](/review/late-answers/30166497) – Beso Oct 25 '21 at 10:44