Auto rebooting down servers

Question

Currently have a web server VPS in Germany with Hetzner. They are offering free load balancer service. Thinking about cloning my VPS in their Finland site (meaning adding another VPS with cloned setup/contents) for daily backups that can also increase my site's availability through the load balancer.

Is this even possible with two VPS, or would I need an extra one for storage/database? I'm basically trying to accomplish a RAID-1 structure but with two VPS instead of two hard drives.

What's the best way to set this up on Debian 10 so that whenever one of the servers goes down, the other one can automatically trigger a reboot of the failed server?

When you load balance for high availability you need to ensure your database is in sync between the two sites, which can cause issues. You can do this with database replication or master / slave, but it takes some work to design this and test it works as expected. Static sites are easy. — Tim, Aug 05 '20 at 23:46
Yes, there could be issues replicating the database between one server and the other one if there's simultaneous writes and network issues between datacenters. What about with three VPS, two acting as redundant web servers (plus static DB backup) and one master database? What is this structure called, so I can Google it? — cvlo, Aug 05 '20 at 23:49
For the DB, the best option may be to run something like Percona XtraDB Cluster. You'd need a third VPS (since a cluster needs an odd number of nodes to avoid "split brain"), but it doesn't necessarily need to be another DB copy; it could just be an arbitrator. — tater, Aug 05 '20 at 23:58
Thanks! So in this case they aren't using two servers? My understanding from the article is that they are coordinating the DB between two servers, but intuitively it doesn't make sense to me because of the split brain thing you mention. Not sure what I'm missing here: https://www.digitalocean.com/community/questions/multiple-high-availability-between-datacenters — cvlo, Aug 06 '20 at 00:22
If your DB is small/simple enough you could just do 3 DB nodes. If using 2 DB nodes, then when a fault occurs the arbitrator (on a third potentially smaller/simpler VPS) decides which of the two DB is still "up", which essentially means which one it is still communicating with. That DB node which is still up becomes the "primary" and the node which is down will sync to it automatically when it comes back up. — tater, Aug 06 '20 at 00:54
Two web servers using a single database server is a common configuration, but that database server can become a single point of failure. Latency between the web server and the database server can cause performance issues, and you have to secure the connection. Using something like AWS RDS solves a lot of these issues for you, but you can do it yourself, it's just difficult. — Tim, Aug 06 '20 at 01:43

score 0 · Answer 1 · answered Aug 05 '20 at 23:45

There are two parts. The first is detecting whether the server is down; the second is sending the reboot command.

For detection, I'd suggest using a service that monitors from multiple sites, such as HetrixTools (there are many others too). If you simply monitor from the other VPS, there's a risk of a "false positive": a network issue between Hetzner datacenters makes the other VPS look "down" even though it is still reachable from most locations, leading to an unnecessary reboot.

In order to send the reboot command, use Hetzner's API (look under "Soft reboot a server"). Most monitoring services have a "webhook" notification which you can use to call the API. If not (or if you don't want to disclose your API key to the monitoring service) then you can point the webhook at your own web server and call the API from there.
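As a rough sketch of the "call the API from your own web server" option, the snippet below builds and sends the reboot request using only the Python standard library. It assumes the Hetzner Cloud API endpoint `POST /v1/servers/{id}/actions/reboot` with a bearer token; check the current API documentation before relying on it. The function names and the server ID and token in the usage note are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed base URL of the Hetzner Cloud API; verify against the official docs.
API_BASE = "https://api.hetzner.cloud/v1"


def build_reboot_request(server_id: int, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a soft-reboot request for one server."""
    url = f"{API_BASE}/servers/{server_id}/actions/reboot"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_token}"},
    )


def soft_reboot(server_id: int, api_token: str, timeout: float = 10.0) -> dict:
    """Send the reboot request and return the decoded JSON response.

    Raises urllib.error.HTTPError on a non-2xx answer (bad token, unknown ID, ...).
    """
    request = build_reboot_request(server_id, api_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Your webhook handler would then call something like `soft_reboot(1234567, token)`, reading the token from an environment variable or a file so the API key never has to be given to the monitoring service.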

More generally, rebooting the server as soon as it is detected down is not necessarily the best idea. You don't know the reason (is Hetzner about to perform scheduled maintenance?) and an uncontrolled reboot can lead to more rather than less downtime if the problem is transient. A down notification also typically fires once, at the moment the server goes down, and never again, which is potentially the worst time to reboot. You probably need a somewhat more sophisticated algorithm, such as waiting X minutes before the first reboot and at least Y minutes before retrying, rather than blindly rebooting on a down detection. But this is beyond the scope of your original question.
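One possible way to express the "wait X minutes, retry after at least Y minutes" idea is a small piece of state that is fed the result of each periodic health check. This is only a sketch: it assumes you poll the server's status regularly instead of relying on a one-shot down notification, and the class and field names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RebootPolicy:
    grace_seconds: float     # "X": how long the server must stay down before the first reboot
    cooldown_seconds: float  # "Y": minimum time between two reboot attempts
    down_since: Optional[float] = None
    last_reboot: Optional[float] = None

    def should_reboot(self, is_up: bool, now: float) -> bool:
        """Record one health-check result and decide whether to reboot now."""
        if is_up:
            # Server recovered on its own; forget the outage.
            self.down_since = None
            return False
        if self.down_since is None:
            # First failed check of this outage.
            self.down_since = now
        if now - self.down_since < self.grace_seconds:
            # Possibly transient; keep waiting.
            return False
        if self.last_reboot is not None and now - self.last_reboot < self.cooldown_seconds:
            # A reboot was issued recently; give it time to take effect.
            return False
        self.last_reboot = now
        return True
```

With `RebootPolicy(grace_seconds=300, cooldown_seconds=900)`, a server that stays down is rebooted only after five minutes of continuous failure, and again no sooner than fifteen minutes after the previous attempt. Timestamps are passed in explicitly (for example from `time.monotonic()`) so the logic is easy to test.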

Excellent points, thanks for the suggestion. Will research further and tune back — cvlo, Aug 05 '20 at 23:49
