
I have a Kubernetes cluster running a few different services: a Flask server, an nginx server for static content, Traefik, and a Metabase service.

Everything works fine most of the time. However, sometimes when I browse the Metabase website, it becomes really slow, and so does the main website. They go down for a few minutes.

I don't understand why the Metabase service affects access to the website. The cluster's memory and CPU don't seem to be overloaded. When I fetch the static content from the Flask service, it works fine. So the problem seems linked to the network or outside access.

I'm lost. What should I check?

fast_cen

1 Answer


One possible issue is that the resource limits on the Kubernetes pods are set too strictly. For instance, constraining Traefik to just 100 mCPU (a tenth of one core) can cause requests to slow down and eventually time out, even under moderate load.
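
To make the effect of a tight CPU limit concrete, here is a minimal Python sketch of the arithmetic. It assumes the default CFS period of 100 ms and that a CPU limit is turned into a CFS quota proportional to its millicore value; the function names are hypothetical and only for illustration.

```python
# Assumption: default CFS period of 100 ms (100,000 microseconds).
CFS_PERIOD_US = 100_000


def cfs_quota_us(limit_millicpu: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time in microseconds a container may consume per CFS period."""
    return limit_millicpu * period_us // 1000


def min_wall_time_ms(cpu_work_ms: float, limit_millicpu: int) -> float:
    """Lower bound on wall-clock time to finish `cpu_work_ms` of CPU work
    under sustained load, given the CPU limit in millicores."""
    return cpu_work_ms * 1000 / limit_millicpu


# A 100 mCPU limit allows 10 ms of CPU time in every 100 ms window.
print(cfs_quota_us(100))          # 10000
# 50 ms of CPU work then needs at least 500 ms of wall-clock time.
print(min_wall_time_ms(50, 100))  # 500.0
```

Under these assumptions, a proxy that briefly needs more CPU than its quota is paused until the next period starts, which shows up as latency for every service routed through it, not only the busy one.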

Timo Reimann
  • What I understand from the Kubernetes documentation is that `limits` are only used to kill or restart pods if they get out of control, not to throttle them. Can you confirm, or have you observed different behavior? – Nicolas Gaborel Jan 03 '20 at 09:30
  • @NicolasGaborel you are correct that limits are used to terminate pods. However, that pertains to memory consumption only. Applications exceeding their CPU limit are indeed throttled via CFS quotas. (In contrast, CPU requests define how CPU cycles should be allocated in overload situations among competing pods.) See also the documentation at https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#how-pods-with-resource-limits-are-run; for the gory CFS quota details, check out https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html – Timo Reimann Jan 06 '20 at 02:04
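
One way to check whether CFS throttling is actually happening is to read the `cpu.stat` file of the container's cgroup, which the kernel's bandwidth controller fills with `nr_periods` and `nr_throttled` counters. The Python sketch below is only an illustration: the two file paths are assumptions about where the cgroup is mounted inside the container (cgroup v2 and v1 respectively), and the helper names are hypothetical.

```python
from pathlib import Path


def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict[str, int]) -> float:
    """Fraction of CFS periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    # Assumed mount points; adjust to the layout of your nodes.
    for candidate in ("/sys/fs/cgroup/cpu.stat", "/sys/fs/cgroup/cpu/cpu.stat"):
        path = Path(candidate)
        if path.exists():
            stats = parse_cpu_stat(path.read_text())
            print(f"{candidate}: throttled in {throttled_ratio(stats):.1%} of periods")
            break
```

Run inside the Traefik container while the slowdown is occurring; a ratio that climbs during the incident would support the CPU limit explanation, while a ratio near zero suggests looking elsewhere.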