I currently have a 3-node swarm mode cluster: 1 manager and 2 workers. I have created a service with 20 replicas. When I run docker service ps <service>, I see that all replicas have been deployed evenly across the 3 nodes. I believe the default swarm placement strategy is spread rather than binpack. That's all good. The problem comes when I restart one of the workers after some OS maintenance. The node takes a while to reboot, and during that time I do not want its tasks to be rescheduled onto the other 2 nodes, because I know the restarted node will soon come back online. Is there a way to delay swarm's rescheduling of replicas after a node reboot or failure? I want to give it more time before deciding that the node has really failed, maybe 5 minutes or so.
Docker version 20.10.7
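
To make clear what I mean by spread versus binpack, here is a toy sketch in Python. It is only an illustration of the two ideas, not Swarm's actual scheduler, which also takes resources and constraints into account. The node names, the `place` function, and the per-node capacity of 10 are all hypothetical and chosen just for this example.

```python
def place(replicas, nodes, strategy, capacity=10):
    """Assign replicas to nodes one at a time under a toy placement strategy.

    `capacity` is a made-up per-node task limit, used only by "binpack".
    """
    counts = {node: 0 for node in nodes}
    for _ in range(replicas):
        if strategy == "spread":
            # Pick the node with the fewest tasks (ties go to the earliest node).
            target = min(nodes, key=lambda n: counts[n])
        elif strategy == "binpack":
            # Pick the fullest node that still has room (ties go to the earliest node).
            candidates = [n for n in nodes if counts[n] < capacity]
            if not candidates:
                raise ValueError("no node has spare capacity")
            target = max(candidates, key=lambda n: counts[n])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        counts[target] += 1
    return counts


nodes = ["manager", "worker1", "worker2"]  # hypothetical node names
spread_result = place(20, nodes, "spread")
binpack_result = place(20, nodes, "binpack")
print(spread_result)   # {'manager': 7, 'worker1': 7, 'worker2': 6}
print(binpack_result)  # {'manager': 10, 'worker1': 10, 'worker2': 0}
```

The even 7/7/6 split from the spread case is what I observe in my cluster, which is why I assume spread is in effect.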