I was wondering if there is any way to spin up another job, in the event that a pod controlled by a Job fails, that would update the controlling Job's memory request (for example, double it) before the pod restarts?
I've looked into PreStop container lifecycle hooks, operators, etc. Right now the best solution seems to be a custom controller that watches all Jobs and, if they are in a failed or restarting state (or something similar) and carry a certain label, doubles their memory request.
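For illustration, here is a minimal sketch of the core logic such a controller might run, written as plain functions over Job manifests represented as dicts so it does not need a cluster. It assumes the Job is deleted and re-created rather than patched in place, because a Job's pod template generally cannot be changed after creation. The label name `memory-retry` is hypothetical, only simple integer quantities like `512Mi` are handled, and a real controller would likely also need to strip the Job's auto-generated selector fields before re-creating it.

```python
import copy
import re

# Hypothetical opt-in label; not a Kubernetes built-in.
RETRY_LABEL = "memory-retry"

# Integer quantities with an optional binary (Ki, Mi, ...) or decimal (k, M, ...) suffix.
_QUANTITY = re.compile(r"^(\d+)(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?$")


def double_memory(quantity: str) -> str:
    """Double a simple memory quantity, e.g. '512Mi' -> '1024Mi'."""
    match = _QUANTITY.match(quantity)
    if match is None:
        # Fractional or exponent forms such as '1.5Gi' are out of scope here.
        raise ValueError(f"unsupported quantity: {quantity!r}")
    value, suffix = match.groups()
    return f"{int(value) * 2}{suffix or ''}"


def should_retry(job: dict) -> bool:
    """True if the Job opted in via the label and has a Failed condition set to True."""
    labels = job.get("metadata", {}).get("labels", {})
    if labels.get(RETRY_LABEL) != "true":
        return False
    conditions = job.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Failed" and c.get("status") == "True" for c in conditions)


def next_job_manifest(job: dict) -> dict:
    """Return a copy of the Job manifest with every container's memory request doubled."""
    new_job = copy.deepcopy(job)
    metadata = new_job.setdefault("metadata", {})
    # Drop server-populated fields so the manifest can be submitted as a new object.
    for key in ("uid", "resourceVersion", "creationTimestamp"):
        metadata.pop(key, None)
    new_job.pop("status", None)
    for container in new_job["spec"]["template"]["spec"]["containers"]:
        requests = container.get("resources", {}).get("requests", {})
        if "memory" in requests:
            requests["memory"] = double_memory(requests["memory"])
    return new_job
```

A controller loop would then call `should_retry` on each watched Job and, when it returns `True`, delete the old Job and create the manifest returned by `next_job_manifest`. Capping the number of doublings (for example, with a counter annotation) would keep a Job that always runs out of memory from growing without bound.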