
TL;DR: Question in bold below

Here http://www.howto-expert.com/how-to-create-a-server-failover-solution/ is an (I assume) old blog post explaining how to set up 2 server machines (one master and one slave), each located in a different geographical location.
The context is self-hosting a website (plus its database) and making it so that, when the master fails a sanity check (e.g. because the machine is turned off, or the internet connection is out, or the admin is doing updates, etc.), the slave takes over serving the website(s) to visitors.
The software solution used there is "DNS Made Easy", which seems to

  • run on a 3rd machine,
  • have the website's domain name point to this 3rd machine (when one registers the domain through a registrar),
  • and re-route each visitor to one of the 2 website hosts.
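
In essence, such a failover monitor is just a loop that health-checks the master and rewrites a DNS record when the check fails. Here is a minimal sketch of that decision logic (the host names and health-check URL are placeholders I made up; the actual DNS update would go through your DNS provider's API):

```python
import urllib.request

def is_healthy(url, timeout=5):
    """Return True if the host answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def choose_active(master_ok, slave_ok, current="master"):
    """Decide which host the DNS record should point to."""
    if master_ok:
        return "master"   # prefer the master whenever it is up
    if slave_ok:
        return "slave"    # master down, slave up: fail over
    return current        # both down: leave the record alone

# Each polling cycle the monitor would run something like:
#   active = choose_active(is_healthy("http://master.example.com/"),
#                          is_healthy("http://slave.example.com/"))
#   ...then update the DNS A record if `active` changed.
```

The important property is the last branch: when neither host answers, flapping the record accomplishes nothing, so the monitor leaves it as-is.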

(I would eventually prefer a solution like HAProxy + Keepalived, simply because it's free.)

The important illustration from the URL above is its diagram: a separate failover-monitor machine performing sanity checks on both the master and slave hosts.

But now, assuming I've paid for the hardware of a second (slave) machine precisely in case the master fails, that investment becomes useless if the 3rd machine (the failover monitor in the picture) crashes instead.

MAIN QUESTION: How can I embed the failover monitoring into the 2 machines themselves?
OR alternatively, how is failover possible with only 2 physical machines?
(in order to require 2 failures instead of 1 before the service goes down)

Questions this implies:
Why do people always end up with a single point of failure (the isolated failover monitor)?
Should I use KVM to run "2 servers" inside each machine (monitor1 + master on the master box, monitor2 + slave on the slave box), or can I install all the different services directly on each machine?
Is it possible for 2 machines located far from each other to share the same IP address?

  • Not a solution, but google "split brain" and STONITH to grasp the problem. You should also clarify how the websites behave in terms of updates. 2 remotely located machines technically can share the same address, but this is hard, very expensive (it would probably require 256 addresses and dynamic routers to do properly), and won't solve your problems anyway. If your data is reasonably static / split brain is not an issue, there are easier ways of solving this problem. – davidgo May 05 '20 at 23:10
  • Look into load balancing; primarily it's for keeping load levels down on servers, but you will often find that you can configure it to check whether a server/service is up: a two-for-one deal. – CodingInTheUK May 06 '20 at 01:08

1 Answer


That DNS Made Easy tutorial describes a monitoring node capable of making DNS changes: a DNS-based load balancer. Yes, one monitoring node is a single point of failure. However, because it sits outside the data path, connections keep working as long as the active node is up, even while the monitoring node is down. Further, it can be located at a 3rd hosting site, to better detect problems reaching you from the outside. One disadvantage is that DNS updates may take a while to propagate, as records are cached by resolvers.
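
One way to limit that caching delay is to serve the failover-controlled record with a short TTL. A hypothetical BIND-style zone fragment (name and address are placeholders):

```
; the failover monitor rewrites this record; a 60-second TTL
; limits how long resolvers may cache the stale address
www   60   IN   A   203.0.113.10
```

The trade-off is more queries hitting your authoritative DNS servers, since resolvers come back every minute instead of every few hours.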

HAProxy is a different beast. It proxies connections through to backends, and as such it is in the data path. Usually proxy and backends are in the same data center, to reduce latency. Being a proxy means it can do clever things to requests: rapidly reroute to another backend, terminate TLS, fiddle with HTTP headers, and a lot more. Its unplanned downtime would take the service down, so consider making the proxies themselves highly available.
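
The backend rerouting relies on active health checks. A minimal haproxy.cfg sketch (addresses, ports and the /health path are placeholders):

```
frontend www
    bind *:80
    default_backend webservers

backend webservers
    # poll each server's /health endpoint; failed servers are taken out
    option httpchk GET /health
    server master 203.0.113.10:80 check
    # "backup" means the slave only receives traffic when the master is down
    server slave  203.0.113.20:80 check backup
```

This gives the master/slave behavior from the question, but only for traffic that already reaches the HAProxy box, which is why that box in turn needs its own high availability.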

Clusters are yet another thing. There, application resources like IP addresses or shared storage are moved between hosts. Shared storage makes it possible to fail over databases with exactly the same data. However, clusters are difficult to implement safely: a cluster partition (split brain) can be dangerous to system integrity. Further, their storage and network tricks are designed for traditional data centers, and probably won't work at your typical VM hosting provider.
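
Moving an IP address between two hosts is, for example, what keepalived's VRRP does. A sketch of one node's configuration (all values are placeholders, and note this only works when both nodes sit on the same layer 2 LAN):

```
# keepalived.conf on the master node; the peer runs the same block
# with state BACKUP and a lower priority (e.g. 90)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.1.10/24      # the floating VIP clients connect to
    }
}
```

When the master stops sending VRRP advertisements, the backup claims 192.168.1.10 within a few seconds; this is exactly the trick that cannot cross the Internet between two remote sites.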

Is it possible to have 2 machines located away from each other that still share the same IP address ?

Not easily; that's another level to this project. Over the Internet, you would need some kind of BGP trick like anycast, which means your own IP space, an ASN, and some BGP routers. Or outsource it to some provider's load-balancer-as-a-service.


Ultimately, decide on your requirements for how fast to recover, and what failure modes you must avoid. Then implement something to meet those.

If you have servers with different IPs and can deal with a couple of hours of downtime, manually changing the DNS record to the surviving server's IP could work. Nice and simple.

If you need immediate automatic failover, no single point of failure, and global CDN style routing, that's a considerably more complex design.

John Mahowald
  • yes, I need automatic failover and 2 points of failure. Actually I'm beginning to understand this tutorial: https://docs.iredmail.org/haproxy.keepalived.glusterfs.html For which I have 3 related questions: 1) Should I provide the virtual IP (192.168.1.10 in the tutorial) when I register my domain name? 2) I set this VIP personally, so how is it ensured that it will be unique across the internet? 3) Can I merge the master front-end with the master back-end? Will they be able to exchange heartbeats internally through some ports instead of IP addresses? – dockerwonderer May 06 '20 at 23:37
  • That is a different HA design that needs its own question. Suffice it to say you can't use keepalived to move IP addresses across the Internet; it relies on neighbor-discovery tricks that only work on a layer 2 LAN. – John Mahowald May 07 '20 at 18:45
  • thank you so much! So this means that, in addition to my ISP router, I need a switch connected to: the router + a front-end raspberrypi + the back-end machine, right? And this in both of my storage rooms, located away from each other, right? But will the 2 front-ends be able to exchange heartbeats if they are not within the same sub-network??? – dockerwonderer May 08 '20 at 15:32