We are using Docker Swarm in our production environment. Here is the output of the docker node ls command:
ID                            HOSTNAME                     STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5qpi2zmdonheusou7fgkh9m1g     ip-10-x-241-y.ec2.internal   Ready    Active         Leader           20.10.2
h5nway19ms4po91f0pjzar22b     ip-10-x-241-y.ec2.internal   Ready    Active                          20.10.2
79sikbrre17pf495vijjpydy0 *   ip-10-x-241-y.ec2.internal   Ready    Active         Reachable        20.10.2
u83yq5n5gi7rdkit5i3i6gj6i     ip-10-x-243-y.ec2.internal   Ready    Active                          20.10.2
o87buageysj1vbcefc9xz4wbe     ip-10-x-243-y.ec2.internal   Ready    Active         Reachable        20.10.2
And here is the docker service ls command output:
ID             NAME                                  MODE         REPLICAS   IMAGE                                                           PORTS
m21u7z06tzqw   portainer-app                         replicated   1/1        portainer/portainer:latest                                      *:9002->9000/tcp
jrk2trgqc2r1   aaaaaaaaaaaaaaaaaaaaa                 global       1/1        xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   *:9200->9200/tcp, *:9300->9300/tcp
3sevi4nv5lnj   bbbbbbbbbbbbbb                        global       1/1        xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx          *:5601->5601/tcp
vpij8elkdcqr   cccccccccccccccc                      global       1/1        xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx        *:5000->5000/tcp
etyu98fr7fc4   ddddddddddddddddddddddddddddddddddd   global       1/1        xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
6spidjk8e4dr   eeeeeeeeeeeeeeeeeeeeee                replicated   1/1        xxxxxxxxxxxxxxxxxxxxxxxxxxxx
v5h58ms3as3a   fffffffffffffffffffffffffffff         global       1/1        xxxxxxxxxxxxxxxxxxxxxxxxxxxx
qb56lj6bb8k6   gggggggggggggggggggggggggggggggg      global       1/1        xxxxxxxxxxxxxxxxxxxxxxxxxxxx
3wa4fmhtwxsr   hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh      global       1/1        xxxxxxxxxxxxxxxxxxxxxxxxxxxx
2kenua5sdrfa   iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii   global       1/1        xxxxxxxxxxxxxxxxxxxxxxxxxxxx
amq6qls538qy   jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj       global       1/1        xxxxxxxxxxxxxxxxxxxxxxxxxxxx
qude01eq2c5j   kkkkkkkkkkkkkkkkkkkkkkkkk             global       2/2        xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx       *:443->9000/tcp, *:9000->9000/tcp
uirjzopva1rq   llllllllllllllllllll                  global       2/2        xxxxxxxxxxxx
This configuration has been working properly for more than a year. But last weekend, the ops team applied security patches and rebooted the worker node machines. Since then, one of the worker nodes ("u83yq5n5gi7rdkit5i3i6gj6i") doesn't run any containers. I removed the node from the swarm and added it back as a worker, but nothing changed. I also did a service update, but it only restarted the container on one worker node. Because the services run in global mode, I couldn't scale them to 2 containers (it gives an error that scaling only works for services in replicated mode). The commands I ran are sketched below.

The expected behavior is that after adding a worker node, Swarm automatically deploys new containers to the new node, but it didn't.
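For reference, this is roughly what I ran; the join token, manager address, and service name are placeholders, not the real values:

# on the broken worker, to remove it from the swarm
docker swarm leave
# on a manager, to clear the old node entry
docker node rm u83yq5n5gi7rdkit5i3i6gj6i
# back on the worker, to re-join with the worker token
docker swarm join --token <worker-token> <manager-ip>:2377
# on a manager, to restart a service's tasks
docker service update --force <service-name>
# this is the scale attempt that fails for global services
docker service scale <service-name>=2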
I believe Docker Swarm logs the reason it couldn't deploy containers on the new worker node, but I couldn't find the correct location of that log.
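These are the places I know to look (assuming the hosts run systemd; <service-name> is again a placeholder), but I'm not sure which of them, if any, records why the scheduler skips a node:

# Docker daemon logs, on the manager and on the broken worker
journalctl -u docker.service
# per-task state and errors for a service, without truncation
docker service ps --no-trunc <service-name>
# the node's status and availability as the managers see it
docker node inspect u83yq5n5gi7rdkit5i3i6gj6i --pretty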
Since this is a production environment, I can't recreate the swarm from scratch. I need to find a way to make Docker Swarm deploy the services on the other worker node again.
Any ideas?