How best to increase the requested memory when a pod controlled by a job fails with an out-of-memory error, before the pod restarts?

Question

Is there a way to spin up another job, whenever a pod controlled by a job fails, that updates the memory request of the job controlling that pod (perhaps doubling it) before the pod restarts?

I’ve looked into PreStop container lifecycle hooks, operators, etc. Right now the best solution seems to be a custom controller that iterates over all jobs and, for any that are in a Reboot state (or something like that) and carry a certain label, doubles their memory request.
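To make the "double the memory request" step of that controller idea concrete, here is a minimal sketch in Python. The function name `double_memory` is hypothetical, and the parser only handles plain decimal quantities with the common suffixes (for example `512Mi` or `1.5Gi`). It is not a full implementation of the Kubernetes quantity grammar.

```python
import re
from decimal import Decimal

# Plain decimal number followed by an optional binary (Ki, Mi, ...) or
# decimal (k, M, ...) suffix. Exponent forms such as "1e3" and the "m"
# (milli) suffix are deliberately not handled in this sketch.
_QUANTITY = re.compile(r"^(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?$")


def double_memory(quantity: str) -> str:
    """Return the quantity string with its numeric part doubled."""
    match = _QUANTITY.match(quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    doubled = Decimal(number) * 2
    # Fixed-point formatting avoids exponent notation; trailing zeros
    # after the decimal point are trimmed ("3.0" becomes "3").
    text = format(doubled, "f")
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text + (suffix or "")


if __name__ == "__main__":
    for request in ("512Mi", "1.5Gi", "128974848"):
        print(request, "->", double_memory(request))
```

A controller built around this would still need to watch for failed pods and apply the new value. As far as I know, a Job's pod template generally cannot be changed after creation, so that step would likely mean recreating the Job rather than patching it.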

score 1 · Accepted Answer · answered Sep 14 '21 at 12:15

You could use the Vertical Pod Autoscaler in Auto or Recreate mode.
It supports Jobs and CronJobs as well as Deployments.

Please be aware of its limitations: it cannot be used together with the HPA on CPU or memory, and it can't evict pods that are not run under a controller.
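As a rough illustration of what such a VPA object could look like, the Python sketch below builds a `VerticalPodAutoscaler` manifest and prints it as JSON, which `kubectl apply -f -` accepts. The helper name `build_vpa_manifest`, the target name `nightly-report`, and the `4Gi` cap are assumptions made for the example. It also assumes the VPA components are installed in the cluster, since they are a separate add-on.

```python
import json


def build_vpa_manifest(target_kind, target_name, update_mode="Auto", max_memory="4Gi"):
    """Build a VerticalPodAutoscaler manifest for a Job, CronJob or Deployment."""
    # Jobs and CronJobs live in batch/v1, Deployments in apps/v1.
    api_versions = {"Job": "batch/v1", "CronJob": "batch/v1", "Deployment": "apps/v1"}
    if target_kind not in api_versions:
        raise ValueError(f"unsupported target kind: {target_kind!r}")
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{target_name}-vpa"},
        "spec": {
            "targetRef": {
                "apiVersion": api_versions[target_kind],
                "kind": target_kind,
                "name": target_name,
            },
            # "Auto" or "Recreate", as suggested in the answer.
            "updatePolicy": {"updateMode": update_mode},
            "resourcePolicy": {
                "containerPolicies": [
                    {
                        "containerName": "*",
                        # Only let the VPA adjust memory, up to a hypothetical cap.
                        "controlledResources": ["memory"],
                        "maxAllowed": {"memory": max_memory},
                    }
                ]
            },
        },
    }


if __name__ == "__main__":
    # "nightly-report" is a placeholder workload name.
    print(json.dumps(build_vpa_manifest("CronJob", "nightly-report"), indent=2))
```

Restricting `controlledResources` to memory keeps the VPA away from CPU requests, which seems closest to what the question asks for.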
