I have a swarm of 3 managers and 3 workers, as shown below:
ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
ocnuul8dcbrf4gjtdzv06t0yf *   manager1   Ready    Active         Leader           18.06.0-ce
z297dhtfon50pt4hllu4qfz6i     manager2   Ready    Active         Reachable        18.06.0-ce
ondpdzyq06pd3oysn34p4xi9o     manager3   Ready    Active         Reachable        18.06.0-ce
0bls0g65gee1wbv7wr6rwgbjk     worker1    Ready    Active                          18.06.0-ce
mxtg28slr5rvljrayaf4k1wkk     worker2    Ready    Active                          18.06.0-ce
hqu1436bvbar9srbr34er3fl4     worker3    Ready    Active                          18.06.0-ce
All managers are available.
However, when I deploy a service on the swarm, the task scheduled on manager3 is stuck in the Preparing state:
ID             NAME       IMAGE                                       NODE       DESIRED STATE   CURRENT STATE              ERROR   PORTS
lmhpsgeqax13   web-fe.1   nigelpoulton/pluralsight-docker-ci:latest   worker1    Running         Running 19 minutes ago
nivas3gkh0pa   web-fe.2   nigelpoulton/pluralsight-docker-ci:latest   worker3    Running         Running 19 minutes ago
5plwh46jri3t   web-fe.3   nigelpoulton/pluralsight-docker-ci:latest   worker2    Running         Running 19 minutes ago
l1ykqzgzbgmb   web-fe.4   nigelpoulton/pluralsight-docker-ci:latest   manager2   Running         Running 19 minutes ago
q788hrm6rba9   web-fe.5   nigelpoulton/pluralsight-docker-ci:latest   manager3   Running         Preparing 21 minutes ago
In /var/log/docker.log on manager3, I can see that it is failing while trying to establish a connection to manager2's address (192.168.99.105:2377):
7T00:10:54.230023789Z" level=warning msg="grpc: addrConn.createTransport failed to connect to {192.168.99.105:2377 0 <nil>}. Err :connection error: desc = \"transport: Err
7T00:10:54.230049538Z" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420a86940, TRANSIENT_FAILURE" module=grpc
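For reference, here is a minimal sketch of how the TCP connection that the log complains about could be checked from manager3. It assumes Python is available on the node. The helper name `can_connect` is hypothetical and is not part of Docker. It only tests whether the port accepts a TCP connection, not whether the swarm gRPC layer itself is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only; it does not speak the
    swarm protocol, it only checks that the port is reachable.
    """
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the "with" block closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_connect("192.168.99.105", 2377)` on manager3 should return False if the port is unreachable from that node, which would be consistent with the TRANSIENT_FAILURE in the log.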
Since manager1 is the leader, I was expecting manager3 to report its Preparing status to manager1, and I don't understand why it is trying to connect to manager2 instead. Could someone help me understand? Also, how do I recover from this and move the task on manager3 from the Preparing state to the Running state?
Regards