
I have a cluster which I recently upgraded from 1.22 to 1.23. On that cluster, I have a MongoDB deployed via a Helm chart (v13.1.3) and some other custom pods. About 2 days after upgrading to 1.23, my mongo started crashing, restarting every 5 minutes or so. The cluster has a single node, and I noticed that at the time the crashes started, the node was at around 100% utilization (it had apparently climbed there gradually over those 2 days). However, the autoscaler did not kick in.
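
In case it's relevant, this is roughly how I've been checking the node and the mongo pod (plain kubectl; the node and pod names are placeholders):

   # node-level CPU/memory usage (needs metrics-server)
   kubectl top node

   # allocatable vs. requested resources on the single node
   kubectl describe node <node-name>

   # restart reason for the mongo pod (OOMKilled, exit code, etc.)
   kubectl describe pod <mongodb-pod>
   kubectl logs <mongodb-pod> --previous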

My cluster autoscaler is set up as follows:

  • Node count: 1
  • Autoscaling: enabled
  • min: 0
  • max: 3
  • No taints

Why didn't the autoscaler work? I tried setting min=2, but that didn't help either; the cluster still shows 1 node.
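
In case it matters how I set it: assuming this is a GKE node pool (the event format below looks like GKE's autoscaler visibility logging), the change should be equivalent to something like the following, with placeholder names:

   gcloud container clusters update <cluster-name> \
       --node-pool <pool-name> \
       --enable-autoscaling --min-nodes 2 --max-nodes 3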

I checked the autoscaler logs, but the only errors there are noScaleDown events, which have been occurring for the past 2 months or so, long before the cluster upgrade and long before the crashes started, so I don't really know if they're related. I'll post the logged reason here anyway:

reason: {
   messageId: "no.scale.down.node.pod.kube.system.unmovable"
   parameters: [
      0: "xxxx"
   ]
}

There were no noScaleUp events or any other kind of event.
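
In case it helps narrow things down, these are the kinds of checks I can run and post output from, if useful (I believe the cluster-autoscaler-status configmap in kube-system is where the autoscaler reports its state, though I'm not sure it's exposed on every managed platform):

   # events emitted on pods/nodes by the autoscaler / scheduler
   kubectl get events -A | grep -i scale

   # the autoscaler's own status report, if it is exposed
   kubectl -n kube-system describe configmap cluster-autoscaler-status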

Any ideas what to look for?

  • What do the pod logs show? Perhaps the crash is due to "Out of Memory"? Also, what metric are you using for your HPAs? – Gari Singh Jan 18 '23 at 08:15
