I am running Jenkins on Kubernetes, and sometimes the connection to the agent Pod is lost with a `ClosedChannelException`. In our experience, this happens for one of two reasons:
- A random loss of connection to the Pod, especially when it is running large, resource-intensive processes. The Pod still exists, but the connection is lost.
- Pod eviction caused by `kubectl drain`, run by the cluster owner to do maintenance on the node.
I'd like some ideas on how to prevent this. I am not a Jenkins expert and need some help understanding what is possible.
- For the first point, can I somehow configure the Jenkins master to "re-search" for the lost agent and reconnect to it?
- For the second point, I know that Kubernetes has a `preStop` hook that runs a command before a container is terminated. Is it possible to tell the control plane (maybe through this hook, or some other way) not to evict this Pod until it completes its operation?
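A possible direction for the second point, sketched under assumptions and not tested against this setup: `kubectl drain` evicts Pods through the Eviction API, which respects PodDisruptionBudgets (PDBs). The Python sketch below only builds a `policy/v1` PodDisruptionBudget manifest and prints it as JSON. The label `jenkins: agent`, the names `jenkins-agents-pdb` and `jenkins`, and the `MAX_AGENTS` value are hypothetical placeholders. It also assumes the agent Pods are bare Pods (not owned by a Deployment or StatefulSet), for which Kubernetes only supports an integer `minAvailable`.

```python
import json

# Hypothetical values for illustration only: use whatever label your Jenkins
# pod template actually puts on agent Pods, and set MAX_AGENTS to at least the
# largest number of agent Pods you expect to run at the same time.
AGENT_LABELS = {"jenkins": "agent"}
MAX_AGENTS = 50


def build_agent_pdb(name: str, namespace: str, labels: dict, min_available: int) -> dict:
    """Build a policy/v1 PodDisruptionBudget manifest for Pods matching `labels`.

    For bare Pods only an integer minAvailable is supported. If min_available
    is at least the number of matching Pods, zero voluntary disruptions are
    allowed, so evictions requested through the Eviction API are refused.
    """
    if min_available < 1:
        raise ValueError("min_available must be a positive integer")
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": dict(labels)},
        },
    }


def main() -> None:
    # Hypothetical PDB name and namespace.
    manifest = build_agent_pdb("jenkins-agents-pdb", "jenkins", AGENT_LABELS, MAX_AGENTS)
    # kubectl accepts JSON as well as YAML, so this output can be piped into
    # `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))


if __name__ == "__main__":
    main()
```

With such a budget in place, a drain should keep retrying the eviction (until its timeout, if one is set) while the agents finish and Jenkins removes them. It does not stop the cluster owner from deleting the Pods or the PDB directly, and a `preStop` hook on its own can only delay shutdown up to the Pod's `terminationGracePeriodSeconds`.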
For context, I am also deploying Jenkins using Jenkins Configuration as Code.