High availability docker swarm

Question

I have a Docker swarm cluster running on several servers (nodes). Docker swarm is using round robin to serve requests to my services and this is working well no matter which server receives a request.

Now I wonder how to point a domain at this setup so that it is highly available.

I use Euro DNS (eurodns.com) and set the A record of the domain to multiple IP addresses (the swarm cluster servers).

In general, this seems to work: the DNS hands out the addresses round robin, and even when a response is cached, whichever node is hit will distribute the request round robin thanks to Docker swarm.

But what if a node/server fails completely? Will I still have high availability?

Even if I were to put HAProxy or a similar load balancer in front, the DNS record would still have to point at some IP address. So if the HAProxy server fails completely, wouldn't I be in the same situation?

score 1 · Accepted Answer · answered Aug 07 '19 at 17:23


Round robin DNS should work for most scenarios, though there will be a delay when requests to one IP time out and the client eventually retries the next IP in the list. What it will not help with is a partial failure, where the host still responds to network requests but the application (e.g. docker) is not responding or is giving bad responses.
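To make the delay concrete, here is a minimal sketch of what a client effectively does when a name resolves to several A records. The function name `connect_with_failover`, the `connector` parameter and the 3 second timeout are assumptions made for illustration; real browsers and HTTP libraries have their own retry logic and timeouts.

```python
import socket


def connect_with_failover(addresses, port, timeout=3.0,
                          connector=socket.create_connection):
    """Try each address in order and return the first that accepts a connection.

    Returns (ip, connection, errors), where errors lists the addresses that
    failed before the successful one. `connector` is injectable (hypothetical
    parameter) so the behaviour can be simulated without a network.
    """
    errors = []
    for ip in addresses:
        try:
            # A dead node costs the client up to `timeout` seconds here.
            conn = connector((ip, port), timeout)
            return ip, conn, errors
        except OSError as exc:  # covers timeouts and refused connections
            errors.append((ip, exc))
    raise ConnectionError(f"all addresses failed: {errors}")
```

With two swarm nodes in DNS and the first one down, the call still succeeds, but only after the first attempt has timed out. Note that this only detects a node that does not accept connections at all; a node that accepts the connection and then returns errors is not skipped.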

A load balancer improves this in two ways. First, it can poll the application for its health with a configurable probe and only send requests to healthy instances, which avoids the partial failure scenario. Second, multiple load balancers can be configured with a virtual IP, allowing a backup load balancer to take over requests without clients having to wait on timeouts.
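The health probe idea can be sketched roughly as follows. This is not how HAProxy implements it, only an illustration of the principle; the `/health` URLs, the function names and the 2 second timeout are assumptions.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True only if the application answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Unreachable host, timeout, or an HTTP error status such as 500.
        return False


def healthy_backends(backends, probe=http_probe):
    """Keep only the backends whose probe currently succeeds."""
    return [backend for backend in backends if probe(backend)]
```

A load balancer would run something like `healthy_backends` on an interval and route only to the returned list, so a host that is up but serving errors is taken out of rotation.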

BMitch


Great answer, thank you. I already implemented the healthcheck on Docker Swarm, so the case where the application fails is handled. The only remaining problem is a server crashing completely. As you described, clients would then have to wait for a timeout, but the retry should work. Do you have any good source on how to configure load balancers with virtual IPs? Would that be able to solve a total machine crash? – mpaepper Aug 08 '19 at 06:43
I found this, which helped me understand how to set up the virtual IP load balancer approach, and indeed it would help with total node failure: https://www.howtoforge.com/setting-up-a-high-availability-load-balancer-with-haproxy-keepalived-on-debian-lenny However, it requires two nodes that share a network and can be configured with a virtual IP, as described in the article. – mpaepper Aug 08 '19 at 07:11

@mpaepper the container healthcheck is absolutely needed for HA (you don't want docker sending requests to a container while it's still starting). However, it only covers container issues, not issues in any of the higher layers (docker engine, iptables entries, networking, hardware). Looks like you've found the requirements for an HA LB. – BMitch Aug 08 '19 at 10:06
