
I have a 3-node swarm, each node with a static IP address: a leader node-0 on 192.168.2.100, a backup manager node-1 on 192.168.2.101, and a worker node-2 on 192.168.2.102. node-0 is the leader that initialized the swarm, so its `--advertise-addr` is 192.168.2.100. I can deploy services that land on any node, and node-0 handles the load balancing. So if I have a database on node-2 (192.168.2.102:3306), it is still reachable from node-0 at 192.168.2.100:3306, even though the service is not running directly on node-0.
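
For reference, here is a minimal sketch of how a swarm like this might be set up. The node roles and IPs come from the question; the published port, the `mysql` image, and the password value are illustrative assumptions on my part:

```
# On node-0 (192.168.2.100): initialize the swarm and advertise its own IP
docker swarm init --advertise-addr 192.168.2.100

# Still on node-0: print the join commands for the other nodes
docker swarm join-token manager   # for node-1 (backup manager)
docker swarm join-token worker    # for node-2 (worker)

# On node-1 / node-2: run the printed command, which looks roughly like
#   docker swarm join --token SWMTKN-1-<...> 192.168.2.100:2377

# Deploy the database as a service with a published port; the ingress
# routing mesh makes port 3306 reachable on every node's IP, not just
# on the node the container lands on
docker service create --name db \
  --publish 3306:3306 \
  --env MYSQL_ROOT_PASSWORD=changeme \
  mysql
```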


However, when I reboot node-0 (say it loses power), the next manager in line, node-1, assumes the leader role, as expected.

But now, if I want to access a service, say an API or database, from a client (a computer that's not in the swarm), I have to use 192.168.2.101:3306 as my entry-point IP, because node-1 is now handling the load balancing. So, from the outside world (other computers on the network), the IP address of the swarm has effectively changed, which is unacceptable and impractical.

Is there a way to resolve this such that a given manager has priority over another manager? Otherwise, how is this sort of issue resolved such that the entry-point IP of the swarm does not depend on the acting leader?

  • Not sure I'm following the question. The advertise addr is for nodes joining the swarm, not for a LB to your published services. – BMitch Jan 19 '18 at 12:30
  • @BMitch Yeah, I didn't know the right vocabulary. Essentially, the issue is that the nodes are not all properly being added to the ingress network (with mesh routing) to allow requests to be made to any node. I haven't resolved that issue yet, but in docker terms, the default load balancing is not working because they are not seeing the ingress network. –  Jan 19 '18 at 12:45
  • Sounds like an issue with overlay networking. Make sure the network isn't blocking traffic between the nodes, and open up the following in iptables on each of the nodes: 7946/tcp (control), 7946/udp (control), 4789/udp (data), protocol 50 for ipsec. – BMitch Jan 19 '18 at 12:53
  • @BMitch That's what I'm working on doing now. I'm using ufw to open those ports because it seems a little simpler than iptables. However, when I make the cluster and deploy a service, I run `netstat` and see that they are not talking on port 4789. They will talk on 7946. But yeah, it's definitely an overlay networking issue. I'll try deploying my own overlay network, deploying the service into that new network, and see if that helps. But having the default ingress network work would be great –  Jan 19 '18 at 12:56
  • The default ingress network is an overlay network, so if you have trouble with overlay networking, the ingress network won't work. For debugging, have a look at: https://github.com/nicolaka/netshoot – BMitch Jan 19 '18 at 13:20
  • @BMitch I'll take a look. Also, when you say protocol 50 for ipsec, does that mean I need port 50 to be open? But yeah, for whatever reason, by default only my leader listens on port 4789, not my backup manager or my worker. –  Jan 19 '18 at 13:34
  • Protocol 50, not tcp or udp: `iptables -A INPUT -p 50 -j ACCEPT`. That's only needed if you turn on the secure option of overlay networking, which is disabled by default. Apps that need encryption inside the cluster often use MTLS. – BMitch Jan 19 '18 at 13:37
  • @BMitch The issue was that I'm using `ufw`. Port 4789 was closed, and I needed to open it on each node so they could participate in mesh routing. Thanks. (See the firewall sketch after these comments.) –  Jan 24 '18 at 07:18
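
To summarize the firewall discussion in the comments above, here is a sketch of the rules that typically need to be open on every node for swarm overlay/ingress networking. The `ufw` commands are my own phrasing of the ports BMitch lists (plus 2377/tcp for cluster management, which is part of the standard swarm port set but not mentioned above); the protocol-50 rule is only needed if overlay encryption is enabled:

```
# Run on every node in the swarm
sudo ufw allow 2377/tcp    # cluster management (manager nodes)
sudo ufw allow 7946/tcp    # node-to-node control traffic
sudo ufw allow 7946/udp    # node-to-node control traffic
sudo ufw allow 4789/udp    # VXLAN overlay data traffic (the missing rule here)

# Only if an overlay network is created with --opt encrypted:
# protocol 50 (ESP), expressed via iptables as in the comment above
sudo iptables -A INPUT -p 50 -j ACCEPT
```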

1 Answer


Make all three of your nodes managers and use some sort of load-balanced DNS to point at all three manager nodes. If one of the managers goes down, your DNS will route to one of the other two (seamlessly, or slightly less seamlessly, depending on how sophisticated your DNS routing/health-check/failover setup is). When you come to scale out with more nodes, nodes 4, 5, 6, etc. can all be worker nodes, but you will benefit from having three managers rather than one.
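
A sketch of what that could look like with the nodes from the question; the DNS name `swarm.example.com` is hypothetical, and the record layout depends on whatever DNS or load balancer you actually use:

```
# Run on any existing manager: promote the worker so all three nodes
# are managers (node-0 and node-1 are already managers)
docker node promote node-2

# Then point a single DNS name at all three nodes, e.g. with
# round-robin A records (hypothetical zone snippet):
#   swarm.example.com. 60 IN A 192.168.2.100
#   swarm.example.com. 60 IN A 192.168.2.101
#   swarm.example.com. 60 IN A 192.168.2.102

# Clients connect to swarm.example.com:3306; thanks to the routing
# mesh, whichever node answers can forward the request to wherever
# the service task is actually running.
```

With three managers, the swarm also keeps raft quorum through the loss of any single manager, which is the main benefit of three managers over one.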

AgileZebra
  • What happens if the manager whose IP is the advertise-addr goes down? How do I join a new manager if that is no longer available? – Arun Jose Apr 11 '18 at 10:04
  • Thanks for the answer. I don't have a setup complex enough to include an external load balancer, but that is a good idea for solving this issue. The problem, though, was that port `4789` was not open on some of the nodes. That's the port that connects them all to the ingress mesh-routing network (which allows for load balancing). But even if this were to fail again for whatever reason, your suggestion would work around that little bug and give me a more fail-safe solution. Thanks. –  Apr 19 '18 at 18:12
  • @ArunJose You can run `docker swarm join-token manager` or `docker swarm join-token worker` on any manager node. The token will be the same. The only difference is that the IP-address:port at the end of the string will be the IP of the node you ran the command on (see the example after these comments). –  Apr 19 '18 at 19:22
  • So you mean the original advertise-addr doesn't matter? I can join by using the IP of any manager? – Arun Jose Apr 20 '18 at 03:05
  • @ArunJose I think this is wrong and the address is actually inside the token... It works at first, but in the end the manager address the worker records is whatever you used as your advertise-address – Ben Aug 22 '22 at 14:02
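
For what it's worth, this is easy to check directly on a running swarm; a quick sketch (the token value shown is illustrative, not a real one):

```
# On any manager node, print the join command and look at the
# <IP>:2377 address it ends with
docker swarm join-token manager
docker swarm join-token worker

# The output is a ready-to-run command along the lines of
#   docker swarm join --token SWMTKN-1-<...> <manager-ip>:2377
# Compare the address printed on node-0 vs node-1 to see whether it
# follows the node you ran the command on or the original
# --advertise-addr from `docker swarm init`.
```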