9

I am trying to run a service on a swarm composed of three Raspberry PIs.
I have one manager and two worker nodes.

The problem is that sometimes the status of the worker nodes is "Down" even if the nodes are correctly switched on and connected to the network.

I just started using Docker so I might be doing something wrong, but everything seems to be correctly set.
How would you avoid that "Down" status?

VonC
Giada Confortola

4 Answers

3

I've had the same issue before. You can fix it by cleaning up /var/lib/docker/swarm/ on the problematic node and then reattaching the node to the swarm.

1) On the problem node:

sudo systemctl stop docker
sudo rm -rf /var/lib/docker/swarm

2) On the swarm manager:

docker node rm <problem-node-name>
docker swarm join-token worker
# prints the join command to run on the problem node:
#   docker swarm join --token <token> <manager_ip>:2377

3) On the problem node:

sudo systemctl start docker
docker swarm join --token <token> <manager_ip>:2377
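
To confirm which nodes the manager still considers "Down" before and after rejoining, something like the following may help. It is only a sketch: `down_nodes` is a hypothetical helper name, and it assumes the Docker CLI on the manager supports `docker node ls --format` with the `.Hostname` and `.Status` fields.

```shell
#!/bin/sh
# Hypothetical helper: print the hostnames of nodes whose status is "Down".
# Input: lines of "<hostname> <status>" on stdin.
down_nodes() {
    awk '$2 == "Down" { print $1 }'
}

# On the manager (assumes the Docker CLI is available there):
#   docker node ls --format '{{.Hostname}} {{.Status}}' | down_nodes
```

If the rejoin worked, the last command should no longer list the problem node.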

 
Xiddoc
Ryabchenko Alexander
2

It can depend on your exact version of docker, but your issue was seen in this thread

A possible workaround was to run docker ps, which seems to have helped nodes join the swarm.

VonC
  • Thanks for your answer. I realized that I had different versions of docker installed on the nodes, so I first installed the same version on every node. The version I am currently using is 17.03.1-ce. However, the problem is not resolved. I also tried your solution, running a docker ps command when the node is switched on but it does not seem to work. – Giada Confortola Jun 06 '17 at 07:29
  • @GiadaConfortola would another stable release like the 17.06 work better? – VonC Jun 06 '17 at 10:03
  • I followed the installation guide on Docker documentation to install it, so it should be the most recent stable one, shouldn't it? Consider that I need the version for the Raspberry Architecture, so 17.06 might not be available yet. – Giada Confortola Jun 07 '17 at 01:59
  • @GiadaConfortola Do you mean the hypriot one? (https://blog.hypriot.com/downloads/) You might try and build a new one (https://github.com/hypriot/rpi-golang) – VonC Jun 07 '17 at 06:06
  • no, I am using the official repository, downloaded from download.docker.com I used this guide to install it: https://docs.docker.com/engine/installation/linux/ubuntu/#install-using-the-repository – Giada Confortola Jun 07 '17 at 07:27
  • @GiadaConfortola OK. I would suspect a 17.06 should be coming soon to a ppa. – VonC Jun 07 '17 at 07:28
1

In my case, the Docker node had an invalid default route, so DNS did not work. I was still able to SSH into the machine by IP address. I first tested:

ping google.com

Which did not work. Then I changed the default route:

route -n
route add default gw 10.1.2.3
route del default gw 10.1.2.1   # offending gateway

And finally I changed the DNS server in:

/etc/resolv.conf

Then the node came up automatically.
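
The two things checked above (default gateway and DNS server) can also be read out in one go. This is a minimal sketch under assumptions: `default_gateways` and `nameservers` are hypothetical helper names, and it uses `ip route` from iproute2 rather than the `route` command shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the gateway of each default route.
# Input: `ip route` output on stdin.
default_gateways() {
    awk '$1 == "default" && $2 == "via" { print $3 }'
}

# Hypothetical helper: print the DNS servers listed in a resolv.conf-style
# file (defaults to /etc/resolv.conf when no argument is given).
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# On the node:
#   ip route | default_gateways
#   nameservers
```

If the printed gateway or name server is not the one you expect for your network, that points to the same kind of misconfiguration.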

PHZ.fi-Pharazon
0

In my case, the (virtual) network devices had changed. I adjusted the settings, ran docker swarm leave and docker swarm join on each of the affected nodes, and then removed them from the manager (docker node rm ...). It worked without issues after that.

Another possible cause seems to be related to ufw on Ubuntu (triggered by some system failure). If you are using Ubuntu, run ufw disable and then ufw enable, and the nodes will join again automatically. If you are not using Ubuntu, disable your firewall momentarily to check whether the problem is related to it.
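
If the firewall does turn out to be the cause, it may be worth checking that the ports Docker documents for swarm mode are open between the nodes. The sketch below only prints the ufw commands so they can be reviewed first; `swarm_ufw_rules` is a hypothetical helper name, not part of Docker or ufw.

```shell
#!/bin/sh
# Ports documented for swarm mode:
#   2377/tcp      cluster management
#   7946/tcp+udp  communication among nodes
#   4789/udp      overlay network traffic
# Hypothetical helper: print one "ufw allow" command per port/protocol.
swarm_ufw_rules() {
    for rule in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
        echo "ufw allow $rule"
    done
}

# Review the printed commands, then apply them as root, e.g.:
#   swarm_ufw_rules | sudo sh
```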

lepe