
I have a cluster which I recently upgraded from 1.22 to 1.23. On that cluster, I have a MongoDB deployed via a Helm chart (v13.1.3) and some other custom pods. About 2 days after upgrading to 1.23, my mongo started crashing, restarting every 5 minutes or so. The cluster has a single node, and I noticed that at the time the crashes started, the node was at around 100% utilization (it had apparently climbed there gradually over those 2 days). However, the autoscaler did not kick in.
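
In case it's relevant, this is roughly how I've been checking the node and the mongo pod (plain kubectl; the node and pod names are placeholders):

   # node-level CPU/memory usage (needs metrics-server)
   kubectl top node

   # allocatable vs. requested resources on the single node
   kubectl describe node <node-name>

   # restart reason for the mongo pod (OOMKilled, exit code, etc.)
   kubectl describe pod <mongodb-pod>
   kubectl logs <mongodb-pod> --previous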

My cluster autoscaler is set up as follows:

  • Node count: 1
  • Autoscaling: enabled
  • min: 0
  • max: 3
  • No taints

Why didn't the autoscaler work? I tried setting min=2, but that didn't help either; the cluster still shows 1 node.
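
In case it matters how I set it: assuming this is a GKE node pool (the event format below looks like GKE's autoscaler visibility logging), the change should be equivalent to something like the following, with placeholder names:

   gcloud container clusters update <cluster-name> \
       --node-pool <pool-name> \
       --enable-autoscaling --min-nodes 2 --max-nodes 3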

I checked the autoscaler logs, but the only errors there are noScaleDown events, which have been occurring for the past 2 months or so, long before the cluster upgrade and long before the crashes started, so I don't really know if they're related. I'll post the logged reason here anyway:

reason: {
   messageId: "no.scale.down.node.pod.kube.system.unmovable"
   parameters: [
      0: "xxxx"
   ]
}

There were no noScaleUp events or any other kind of event.
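
In case it helps narrow things down, these are the kinds of checks I can run and post output from, if useful (I believe the cluster-autoscaler-status configmap in kube-system is where the autoscaler reports its state, though I'm not sure it's exposed on every managed platform):

   # events emitted on pods/nodes by the autoscaler / scheduler
   kubectl get events -A | grep -i scale

   # the autoscaler's own status report, if it is exposed
   kubectl -n kube-system describe configmap cluster-autoscaler-status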

Any ideas what to look for?

  • What do the pod logs show? Perhaps the crash is due to "Out of Memory"? Also, what metric are you using for your HPAs? – Gari Singh Jan 18 '23 at 08:15
