I'm running a Kubernetes cluster on GKE autopilot
I have pods that do the following - Wait for a job, run the job (This can take minutes or hours), Then go to Pod Succeeded State which will cause Kubernetes to restart the pod.
The number of pods I need is variable depending on how many users are on the platform. Each user can request a job that needs a pod to run.
I don't want users to have to wait for pods to scale up so I want to keep a number of extra pods ready and waiting to execute.
The application my pods are running can be in 3 states - { waiting for job
, running job
, completed job
}
Scaling up is fine as I can just use the scale API and always request to have a certain percentage of pods in waiting for job
state
When scaling down I want to ensure that Kubernetes doesn't kill any pods that are in the running job
state.
Should I implement a Custom Horizontal Pod Autoscaler?
Can I configure custom probes for my pod's application state?
I could use also use pod priority or a preStop hook