How does High Availability work?

Question

I don't understand how to set up failover for my fairly simple scenario. I am building an API service gateway. What I want is two servers ~~hosted in different datacenters~~, and I simply want users to be able to access the service even if one of the servers is down. DB sync is not an issue; I only care about the availability of the service.

How do I do that without requiring users to implement any kind of failover logic on their end? I want to give users a single domain or a single IP address and have them be able to access the service at all times through this single endpoint.

What I do not understand is how this can be achieved. I know I can set up a network node that forwards requests to the first or second server, depending on which of the two is currently online. However, I fail to see how this setup solves the HA problem, since it just introduces a new single point of failure into the system: the forwarding node. If this node goes down, the service is unavailable.

Could you please explain how to implement this in the real world? Is it possible to achieve this at a reasonable cost ~~(i.e. not more than the cost of hosting the servers themselves)~~?

Edit: It has been suggested that the different-datacenter requirement is costly. So, feel free to provide suggestions for 2 servers within 1 datacenter.

Edit 2: Feel free to mention what a reasonable cost for that setup would be.

I think so, because if they are hosted in 1 datacenter, it is easy to imagine both servers going down at the same time. I'm not saying two different companies, but yes, two different datacenters sound reasonable to me - or is it an unusual request? — Wapac, Feb 23 '16 at 10:47
It is not, but it overcomplicates the solution, because it requires sophisticated and expensive measures. At the same time, you don't have any kind of high availability at the moment. Start with a simple setup that covers failures of your own service. You can move on to covering datacenter failures in the next step, if you still need it. — drookie, Feb 23 '16 at 10:52
OK, then I just modify the question to allow same datacenter. — Wapac, Feb 23 '16 at 10:54
"*Is it possible to achieve this with reasonable cost?*" As I constantly tell clients, you get 99% uptime for free (ie, for the cost of decent hardware and hosting). Every extra 9 increases the cost by anything up to a factor of ten. **Do not try to do HA on a shoestring budget. It doesn't work.** — MadHatter, Feb 23 '16 at 10:54
I have no idea what is the cost for that, so feel free to suggest what is a reasonable cost. — Wapac, Feb 23 '16 at 10:56
"*Every extra 9 increases the cost by anything up to a factor of ten.*" — MadHatter, Feb 23 '16 at 11:06
@MadHatter Not exactly. IIRC you can get 99.9% QUITE cheap - that is the cost of replacement hardware. We are still talking about 8 hours of downtime a year. Every hosting provider offers that - and if you keep backups and don't have terabytes of data, there is plenty of time for that. AFTER THAT - I fully agree. — TomTom, Feb 23 '16 at 11:15
@yagmoth555 DNS round robin doesn't work, because of TTL-ignoring caching. But even if it did work, RR to what? Still need more hardware, different DC, the services must remain in-sync; **it all costs**. — MadHatter, Feb 23 '16 at 11:37
@MadHatter I agree, I hope the OP knows it implies more servers, but I think he wants an answer with a feasible setup without the failover clustering feature, if his service can run that way in RR (as he said, DB sync is not a problem). — yagmoth555, Feb 23 '16 at 11:52
I would like to receive any reasonable suggestions. I am not familiar with any working solutions. So let's say those 2 servers themselves cost me 500 USD per month. I would not find it reasonable to pay 5000 USD to achieve my goal. I would find it reasonable to pay e.g. an extra 500 USD for that. — Wapac, Feb 23 '16 at 11:56
And I'm warning you that, with intangible costs, it will cost more than that. You can't know how much more until you try it, but in my experience you do need to be prepared for up to an order of magnitude. In any case, I've VTCed, because you could write a book on this; therefore, it's too broad a question. — MadHatter, Feb 23 '16 at 11:59
@Wapac The idea I gave, RR, is not a true failover setup. It works only through DNS: it sends each customer to one of the entries in your DNS, in turn. If a server fails, the customer has to retry, and when their local cache expires they will be able to reconnect. The idea implies that all the DB and data behind it are synced. As others said, true HA has its cost. — yagmoth555, Feb 23 '16 at 12:03
@yagmoth555: I am aware of that particular solution, but I think it requires support on the client side, or else there would be quite a gap because of the caching. I am not sure whether, in my case, I can force the client to implement "retry with the second IP if the first fails" - i.e. to give them 2 endpoints instead of just one. But thanks for the suggestion anyway. — Wapac, Feb 23 '16 at 12:08
@MadHatter: I am just not familiar enough with what is on offer here to tell what is reasonable and what is not. I thought my scenario was quite common and really simple, but it seems that it is not. A budget of 500 USD or 1000 USD for what I asked for seemed like enough money to me. But then - I really have no idea. — Wapac, Feb 23 '16 at 12:10
@MadHatter: I am not even saying that I have that server already - I guess each hosting provider has different features it can offer. So I am happy with an answer like "if you go with Amazon ABC service, they offer this for $ XXX". — Wapac, Feb 23 '16 at 12:12
@Wapac Your sentence makes no sense. With DNS RR, you only have one DNS name to give out... and we can't give costs; stop asking for costs and hire a consultant if you want an estimate. — yagmoth555, Feb 23 '16 at 12:17
@MadHatter Your numbers don't work out. You say two 9s for free. And each 9 after that cost up to a factor of 10 more. If I do the math on those numbers, I get as many 9s as I want for free. I'm sure that is not what you meant. — kasperd, Feb 24 '16 at 09:54
@kasperd good point, except that you should probably read what I originally wrote, above: "*you get 99% uptime for free (**ie, for the cost of decent hardware and hosting**)*". Ten times that is most certainly *not* zero. I admit that using *free* to mean *no additional cost over the minimum* is reprehensible, but I'm careful to spell the reality out in the same sentence. — MadHatter, Feb 24 '16 at 09:56
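TomTom's figure of about 8 hours a year for 99.9%, and MadHatter's point about the cost of each extra nine, are easier to weigh with the arithmetic written out. This is a minimal sketch, assuming a 365-day year and counting all downtime equally (planned or not):

```python
HOURS_PER_YEAR = 365 * 24  # 8760; leap years ignored


def downtime_per_year(availability: float) -> float:
    """Hours of downtime per year allowed by an availability fraction (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


for nines in range(2, 6):
    availability = 1 - 10 ** -nines  # 0.99, 0.999, 0.9999, 0.99999
    hours = downtime_per_year(availability)
    print(f"{availability:.{nines - 2}%}: {hours:7.2f} h/year = {hours * 60:8.1f} min/year")
```

Each extra nine cuts the allowed downtime by a factor of ten: 99.9% leaves about 8.76 hours a year, while 99.99% leaves under an hour and 99.999% only about 5 minutes, which is far less than the time it takes to notice a failure and swap hardware by hand.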

score 7 · Accepted Answer · answered Feb 23 '16 at 12:23

It works quite simply. The first rule is that you have to have everything more than once. For simplicity, I will set it up in one datacenter, with IP addresses owned by that DC (you can do it with your own IP addresses and multiple datacenters, but then we're talking about multihoming, your own AS, BGP and other things which are not so cheap or easy to implement).

You will need at least 4 servers (you can do it with only two, but it is not a good way): 2 for the app and 2 for load balancing, each server with multiple network cards.
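Why the balancers have to be doubled as well (the single-point-of-failure worry from the question) shows up in a rough availability calculation. This is a sketch under simplifying assumptions: failures are independent, failover is instant, the network and switches are ignored, and the 0.99 availability per machine is a made-up figure.

```python
from math import prod


def series(*parts: float) -> float:
    """A chain of stages is up only if every stage is up."""
    return prod(parts)


def parallel(*replicas: float) -> float:
    """A redundant group is up if at least one replica is up
    (assumes independent failures and instant failover)."""
    return 1.0 - prod(1.0 - a for a in replicas)


MACHINE = 0.99  # hypothetical availability of any single machine

one_server = MACHINE
one_lb_two_apps = series(MACHINE, parallel(MACHINE, MACHINE))
two_lbs_two_apps = series(parallel(MACHINE, MACHINE), parallel(MACHINE, MACHINE))

print(f"single server:            {one_server:.6f}")
print(f"1 LB in front of 2 apps:  {one_lb_two_apps:.6f}")
print(f"2 LBs in front of 2 apps: {two_lbs_two_apps:.6f}")
```

With these numbers, one forwarding node in front of two servers (about 0.9899) is slightly worse than a single server on its own, because the balancer is now a chain link that can fail. Doubling the balancers too raises the figure to about 0.9998.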

Basic setup is like this:

       /---\     /------\     /----------\
       | S |-----| LB 1 |-----| SERVER 1 |
--NET--| W |     \------/\   /\----------/
       | I |              \_/
       | T |              / \
--NET--| C |     /------\/   \/----------\
       | H |-----| LB 2 |-----| SERVER 2 |
       \---/     \------/     \----------/

You have two separate uplinks to the internet provided by your DC. Both lines are in the same VLAN and both are connected to a switch (2 switches is best). The 2 load balancers are connected to those switches and share one virtual IP: an IP address that can float between the two machines. You can use VRRP and keepalived to achieve this pretty well.
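The floating-IP rule can be modelled in a few lines. This is not VRRP or keepalived, only a simplified simulation of the decision they make: among the nodes that are still alive (still sending advertisements), the one with the highest configured priority holds the virtual IP. The class and function names and the priority values are hypothetical, and timers, tie-breaking and split-brain are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Balancer:
    name: str
    priority: int       # higher priority wins the election, as in VRRP
    alive: bool = True  # in VRRP terms: still sending advertisements


def vip_holder(balancers: List[Balancer]) -> Optional[str]:
    """Return which node should currently own the floating IP:
    the highest-priority node that is still alive (simplified model)."""
    alive = [b for b in balancers if b.alive]
    if not alive:
        return None  # nobody can answer on the virtual IP
    return max(alive, key=lambda b: b.priority).name


lb1 = Balancer("LB 1", priority=150)
lb2 = Balancer("LB 2", priority=100)
pair = [lb1, lb2]

print(vip_holder(pair))  # LB 1 holds the VIP
lb1.alive = False        # LB 1 crashes; LB 2 stops hearing its advertisements
print(vip_holder(pair))  # LB 2 takes over the VIP
lb1.alive = True         # LB 1 comes back
print(vip_holder(pair))  # LB 1 again: this model always preempts
```

Clients keep talking to the same IP address throughout; only the machine answering on it changes. In keepalived, taking the VIP back when the higher-priority node returns (preemption) is the default, and the `nopreempt` option turns it off.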

Behind those two load balancers sit two mirrored servers. And here comes the magic:

You point your DNS record at that virtual IP.
When someone comes to your app, the request goes through one LB and ends up at one server.
When one server dies, the load balancer notices it via a health check and disables that server. Every new request is sent to the healthy server.
When one load balancer dies, keepalived notices it (the backup stops receiving the master's VRRP advertisements, which act as a health check) and moves the floating IP to the healthy load balancer, and nobody notices.
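The load balancer's part of the steps above can be sketched as a round-robin pool that skips backends failing their health check. HAProxy does this for real; the code below is only a toy model. It assumes a plain TCP connect counts as a health check (real setups often probe an HTTP endpoint instead), and `RoundRobinPool`, `server1` and `server2` are hypothetical names.

```python
import itertools
import socket
from typing import Callable, List, Optional


def tcp_health_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Treat a backend as healthy if a TCP connection succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False


class RoundRobinPool:
    """Hand out backends in turn, skipping those that failed the last health check."""

    def __init__(self, backends: List[str], check: Callable[[str], bool]):
        self.backends = backends
        self.check = check
        self.healthy = set(backends)
        self._ring = itertools.cycle(backends)

    def run_checks(self) -> None:
        # A real balancer runs this periodically in the background.
        self.healthy = {b for b in self.backends if self.check(b)}

    def pick(self) -> Optional[str]:
        # Walk at most one full lap of the ring looking for a healthy backend.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend in self.healthy:
                return backend
        return None  # every backend is down


# Usage with a fake check, so the example runs without real servers:
down = set()
pool = RoundRobinPool(["server1", "server2"], check=lambda b: b not in down)
pool.run_checks()
print([pool.pick() for _ in range(4)])  # ['server1', 'server2', 'server1', 'server2']
down.add("server1")  # server1 dies
pool.run_checks()
print([pool.pick() for _ in range(3)])  # ['server2', 'server2', 'server2']
```

Against real backends, something like `check=lambda b: tcp_health_check(b, 80)` would replace the fake check.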

You should know that HA is expensive and you cannot do it on a low budget. You need to calculate whether an outage of your service is cheaper than the cost of HA; sometimes it is.
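The "is an outage cheaper than HA?" question can be framed as a break-even calculation. The inputs below are illustrative assumptions only: the extra 500 USD per month is the budget the OP mentions in the comments, the jump from 99% to 99.9% is made up, and, as MadHatter warns, real HA carries intangible costs that this ignores.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; leap years ignored


def break_even_cost_per_hour(avail_before: float, avail_after: float,
                             extra_cost_per_year: float) -> float:
    """Outage cost per hour above which the extra HA spend pays for itself."""
    hours_saved = (avail_after - avail_before) * HOURS_PER_YEAR
    if hours_saved <= 0:
        raise ValueError("avail_after must be higher than avail_before")
    return extra_cost_per_year / hours_saved


# Hypothetical: go from 99% to 99.9% for an extra 500 USD per month.
threshold = break_even_cost_per_hour(0.99, 0.999, extra_cost_per_year=500 * 12)
print(f"HA pays off if an hour of downtime costs more than {threshold:.2f} USD")
```

With these numbers, about 79 hours of downtime a year are avoided and the break-even point is roughly 76 USD per hour of outage; if an hour of downtime costs you less than that, the extra spend does not pay for itself.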

Look up the keywords VRRP, keepalived and HAProxy for ideas and ways to think about it.

Thank you, this is informative - I was not aware of the "floating virtual IP". That is something I will have to read up on. — Wapac, Feb 23 '16 at 12:27

score 1 · Answer 2 · answered Feb 23 '16 at 11:00

The usual approach is, of course, to use two forwarding (balancing) nodes in some form of HA cluster. A single, consistent endpoint for the outside world is achieved through various forms of shared IP address: VRRP, CARP (similar to VRRP, but a free, open alternative), etc. Thus you have redundancy on both layers: the balancing layer and the data/service layer.

Consistency of the data/service layer is beyond the scope of this answer; however, it's usually rather simple: you use a centralized session store (probably replicated too, such as Redis or memcached) and a replicated set of DBs.
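The point of a centralized session store is that session state does not live on whichever app server handled the first request. A minimal sketch with hypothetical class names, where a plain dict stands in for Redis or memcached; in a real deployment the store is a separate service, and it needs its own replication or it becomes the new single point of failure.

```python
from typing import Dict, List, Optional


class SharedSessionStore:
    """In-memory stand-in for a shared store such as Redis or memcached.
    Here it is just a dict that both app servers can see."""

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def get(self, session_id: str) -> Optional[dict]:
        return self._sessions.get(session_id)

    def set(self, session_id: str, data: dict) -> None:
        self._sessions[session_id] = data


class AppServer:
    def __init__(self, name: str, store: SharedSessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> List[str]:
        # Load the session from the shared store, not from local memory,
        # so any server can continue a session another server started.
        session = self.store.get(session_id) or {"cart": []}
        session["cart"].append(item)
        self.store.set(session_id, session)
        return session["cart"]


store = SharedSessionStore()
server1 = AppServer("server1", store)
server2 = AppServer("server2", store)

server1.add_to_cart("abc123", "book")
# server1 dies; the balancer sends the same user to server2:
print(server2.add_to_cart("abc123", "pen"))  # ['book', 'pen']
```

If each server kept sessions in its own memory instead, the user in this example would lose the cart the moment the balancer moved them to the surviving server.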

In general this is achievable with only two physical servers, each of them playing different roles at once: a balancer, a DB server, and so on.
