Does it still make sense to set up your own failover system (HAProxy plus two or more servers, for example) when a self-healing cloud seems like a practical solution? They seem to do the same job, or am I missing something?
1 Answer
You're missing something. Specifically, the difference between High Availability (and the degrees thereof) and High Capacity/Scalability.
A "self-healing cloud server" (also known as "a machine that reboots itself" -- hardly high technology worthy of the grandiose marketing term) provides you some degree of high availability against a small subset of possible problems -- namely, hardware/kernel failure. Now, whilst hardware does fail, it is the second-least common cause of downtime (if you're wondering, core network outages are the least-common). All the common causes of outages -- system maintenance, human error -- are still lurking there, ready to take you down.
Also, a machine that reboots when it crashes still has some downtime; it's just not as much as it would be if you had to manually log in and start the machine again yourself. This may be an acceptable level of downtime, or it may not. I don't know what your tolerances are.
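To put rough numbers on "some downtime", here is a minimal Python sketch of the arithmetic. The incident counts and recovery times are hypothetical figures chosen for illustration, not measurements from any provider; substitute your own.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def annual_downtime_minutes(incidents_per_year: int, minutes_per_incident: float) -> float:
    """Total yearly downtime if every incident takes the same time to recover."""
    return incidents_per_year * minutes_per_incident


def availability_percent(downtime_minutes: float) -> float:
    """Share of the year the service was up, as a percentage."""
    return 100.0 * (1 - downtime_minutes / MINUTES_PER_YEAR)


# Hypothetical scenario: 6 crashes a year.
auto_reboot = annual_downtime_minutes(6, 5.0)    # machine restarts itself
manual_fix = annual_downtime_minutes(6, 45.0)    # someone has to log in and restart it

print(f"auto reboot: {auto_reboot:.0f} min/year, {availability_percent(auto_reboot):.4f}% up")
print(f"manual fix:  {manual_fix:.0f} min/year, {availability_percent(manual_fix):.4f}% up")
```

Whether the resulting figure is acceptable depends entirely on your own tolerances. Note that this only counts crashes; maintenance windows and human error would add to the total.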
Finally, this doesn't provide you with any capacity for seamless scalability. Sure, you can throw more hardware at the problem (up to a point), but that requires a reboot, too, and it only gets you as far as you can go with a single machine -- if you need to service more traffic, you're up for a re-engineering.
Now, it's entirely possible that you don't need particularly high uptime, or scalability, and so a single machine that reboots itself might be fine. If that's the case, more power to you. But don't think that it's a replacement for a cluster of machines behind a redundant load balancer, properly engineered for availability and capacity, because it's a whole different ball game.
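As a rough illustration of why a load-balanced cluster behaves differently from a single rebooting machine, the sketch below models round-robin selection that skips backends marked as down. It is a toy model, not how HAProxy is implemented; the class name `RoundRobinBalancer` and the backend names `web1` and `web2` are made up for this example.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer that skips backends marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend):
        # In a real setup a health check would call this, not a human.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once, starting from the rotation point.
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["web1", "web2"])
print(lb.pick(), lb.pick())   # traffic alternates between both machines
lb.mark_down("web1")          # web1 crashes or is taken out for maintenance
print(lb.pick(), lb.pick())   # requests keep flowing to web2 meanwhile
```

The point of the model: while one machine is down, rebooting, or being upgraded, requests still have somewhere to go, and adding capacity means adding another entry to the list rather than resizing and restarting the only box.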

-
From what I understand of what you're saying, in terms of high availability, traditional HAProxy failover is faster than the cloud rebooting onto another machine? And in terms of scaling to new hardware, cloud still requires some downtime, whereas with traditional failover you can set up additional load-balanced boxes without any downtime? – IMB Apr 01 '12 at 10:45
-
Yeah, more or less. – womble Apr 02 '12 at 00:13
-
You are right. I confirmed with a cloud provider that there is around 5 minutes of downtime while it boots on another available machine, so it isn't as instantaneous as it was perceived to be. I guess the only credit it deserves is that at least it is done by default, while true instantaneous failover requires a custom setup. – IMB Apr 02 '12 at 06:27
-
What constitutes "by default"? I've worked in plenty of environments where load-balancing and instantaneous failover were standard. – womble Apr 02 '12 at 09:35
-
It should be standard practice, I agree. But by "by default" I mean that when I purchase cloud hosting, it is designed to reboot onto the next machine if the current one becomes problematic. If I purchase a VPS box, it's just that: one box. It doesn't come with load balancing or failover, since those are optional extras. In other words, a VPS with load balancing and failover isn't sold as standard in the market. – IMB Apr 02 '12 at 10:45
-
Depends on which corner of the market you're referring to. It certainly has been standard in some of the markets I've worked in. – womble Apr 02 '12 at 11:15