How do sites such as Google achieve high availability?

Question

As I understand it, when I open a website such as Google, the hostname is looked up and my browser uses the resulting IP address to connect to the server and retrieve the page.

However, how do high-availability websites make sure that this single IP address can always be reached? Isn't that a single point of failure?

Your question is ambiguous and difficult to reasonably answer in its current form. High availability is an expansive topic with numerous approaches. — Warner, Sep 13 '10 at 21:14
Warner is exactly right. As it stands now your question is far too vague for any sort of meaningful answer. Any answer is likely to spawn multiple additional questions. I'd suggest you google around, check out some wikipedia pages, even search existing serverfault questions and then come back with more specific questions covering topics you don't understand. — ThatGraemeGuy, Sep 13 '10 at 21:20
*[sigh]*. A perfectly reasonable and specific question (how do you create a website without a SPOF?) sneered at and closed without a single clear answer being provided. I leave no wiser than I arrived. — Mark Amery, Apr 05 '18 at 15:51

score 7 · Accepted Answer · answered Sep 13 '10 at 20:55

There are two common solutions to high availability for web sites: DNS round robin and IP load balancing.

DNS round robin means you get different IP addresses each time you query a DNS server for the site's name; this helps distribute requests across multiple servers, and it also avoids the single point of failure you pointed out. This is the DNS answer for www.google.com (as returned by one of the authoritative name servers for the "google.com" domain):

> www.google.com
Server:  ns1.google.com
Address:  216.239.32.10

www.google.com  canonical name = www.l.google.com
www.l.google.com        internet address = 74.125.77.99
www.l.google.com        internet address = 74.125.77.104
www.l.google.com        internet address = 74.125.77.147
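
To make the rotation concrete, here is a minimal sketch of a record set whose order changes on every query. It is a toy model and not how any particular name server is implemented; the class name `RoundRobinRecordSet` is made up for illustration, and the addresses are the ones from the answer above.

```python
from collections import deque


class RoundRobinRecordSet:
    """Toy model of an A-record set that is rotated on every query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        # Hand out the current ordering, then rotate so that the next
        # query sees a different address in first position.
        current = list(self._addresses)
        self._addresses.rotate(-1)
        return current


records = RoundRobinRecordSet(["74.125.77.99", "74.125.77.104", "74.125.77.147"])

# Many clients simply connect to the first address they receive.
first_choices = [records.answer()[0] for _ in range(4)]
print(first_choices)
```

Four consecutive lookups are spread over the three addresses and then wrap around to the first one again.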

Another common solution, which could also be used at the same time (and very likely is in this case), is IP load balancing; i.e. those IP addresses aren't actually assigned to servers, but instead to load balancing devices (or reverse proxies, or any other similar solution), which then forward the requests to one of several back-end servers; should one of those servers fail, another one would be used.
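
A rough sketch of that forwarding logic, assuming a simple round-robin policy and an external health check that tells the balancer which back ends are down. The `LoadBalancer` class and the 10.0.0.x addresses are hypothetical; real devices and reverse proxies offer many other policies.

```python
import itertools


class LoadBalancer:
    """Toy round-robin balancer that skips back ends marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # In practice a periodic health check would call this.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each back end at most once per request.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy back ends available")


lb = LoadBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
lb.mark_down("10.0.0.2")  # simulate a failed server
picks = [lb.pick() for _ in range(4)]
print(picks)
```

The failed server is never chosen, so clients connecting to the balancer's public address do not notice the outage.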

More info here:

http://en.wikipedia.org/wiki/Round_robin_DNS
http://en.wikipedia.org/wiki/Load_balancing_(computing)

Massimo


Actually given the size of google it is more than that. – TomTom Sep 13 '10 at 22:25
I'm sure you can add in DNS Anycasting as well – Mark Henderson Sep 14 '10 at 01:05
-1 for one reason - DNS round robin is NOT A high availability solution. It distributes load to all IP addresses, whether they are available or not. – TomTom Sep 14 '10 at 07:10
BGP is key as well. Really, too general a question. We could nitpick details all day long. – Warner Sep 14 '10 at 14:11
This is the answer I was looking for, I did not know that there is such a thing as DNS round robin! Thanks! – Chris Sep 15 '10 at 12:46
@TomTom: I can see your point; still, with DNS round robin a certain percentage of users will not notice a problem, since they are looking up a different IP. Of course, it's not really high availability then. – Chris Sep 16 '10 at 16:30
Of course this was only meant as an introductory answer; there's much more than this to large-scale highly available systems. I simply described the two most common solutions for front-end web servers... and didn't even touch the back-end HA, which is a whole other issue altogether. – Massimo Jan 31 '14 at 18:05
-1; many sources (and TomTom's comment) claim that DNS round robin does not achieve HA, and IP load balancing seems on its face not to do so either (surely you've just moved the single point of failure forward one layer to the load balancer, not truly eliminated it?). I don't see any strategy here that actually solves the IP-points-to-a-SPOF problem asked about in the OP. – Mark Amery Apr 05 '18 at 15:47
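
The disagreement in these comments about DNS round robin can be put into numbers with a small simulation. It is only a sketch under stated assumptions: one of the three addresses from the answer is taken to be down, the name server rotates the list on every query, and a client either tries only the first address or walks the whole list. Whether real clients retry depends on the client, so `retry` is a modelling choice here, not a statement about any browser.

```python
from collections import deque

ADDRESSES = ["74.125.77.99", "74.125.77.104", "74.125.77.147"]
DOWN = {"74.125.77.104"}  # assumption: one of the three hosts is unreachable


def failed_connections(queries, retry):
    """Count how many of `queries` lookups end without a working connection."""
    rotation = deque(ADDRESSES)
    failures = 0
    for _ in range(queries):
        answer = list(rotation)
        rotation.rotate(-1)  # the name server rotates the list per query
        # Without retry the client only tries the first address it was given.
        candidates = answer if retry else answer[:1]
        if all(address in DOWN for address in candidates):
            failures += 1
    return failures


no_retry = failed_connections(300, retry=False)
with_retry = failed_connections(300, retry=True)
print(no_retry, with_retry)
```

Under these assumptions a third of the non-retrying clients fail, which matches both points made above: most users are unaffected, but round robin alone does not remove the dead address from rotation.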

score 1 · Answer 2 · answered Sep 13 '10 at 21:02

An IP address isn't necessarily a SPOF, as it can be reassigned dynamically (a.k.a. fail-over) to a healthy server should the server previously holding it fail.
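
As an illustration of that fail-over idea, the sketch below models a "floating" address that always belongs to the highest-priority node still alive. The `FloatingIP` class, the node names and the documentation address 192.0.2.10 are assumptions for the example; in real deployments the hand-over is typically driven by heartbeats between the nodes (protocols such as VRRP exist for this), which the sketch replaces with an explicit `report_failure` call.

```python
class FloatingIP:
    """Toy model of a service address that moves to a surviving node."""

    def __init__(self, address, nodes):
        self.address = address
        self.nodes = list(nodes)  # listed in priority order
        self.alive = {node: True for node in self.nodes}

    def report_failure(self, node):
        # Stand-in for a missed heartbeat detected by the other nodes.
        self.alive[node] = False

    def holder(self):
        # The first live node in priority order answers for the address.
        for node in self.nodes:
            if self.alive[node]:
                return node
        return None  # every node is down: the address is unreachable


vip = FloatingIP("192.0.2.10", ["web-a", "web-b"])
before = vip.holder()
vip.report_failure("web-a")
after = vip.holder()
print(before, "->", after)
```

Clients keep using the same address throughout; only the machine answering for it changes.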

jlliagre


score 1 · Answer 3 · answered Sep 13 '10 at 22:29

Google most likely uses THREE Approaches at the same time:

At the backend you have a number of servers to serve requests. They all have their own IP addresses.
In front of them are hardware load balancers that distribute requests to the servers behind them. They have one public IP each, but may cover 30, 60 or even more physical servers. They are themselves likely redundant units from a large manufacturer.
In front of those, DNS round robin is LIKELY used. It allows load distribution across even more load balancers.
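
How the three layers fit together can be sketched as a two-step choice: DNS picks a load balancer, and that load balancer picks one of its own servers. The topology below (two balancers with two servers each, and all of the names) is invented for illustration and says nothing about Google's actual setup.

```python
from itertools import cycle

# Hypothetical topology: each public load balancer fronts its own servers.
topology = {
    "lb-1": ["srv-1a", "srv-1b"],
    "lb-2": ["srv-2a", "srv-2b"],
}

# Layer 1: DNS round robin alternates between the load balancers.
dns_rotation = cycle(topology)
# Layer 2: each load balancer rotates over the servers behind it.
backend_rotation = {lb: cycle(servers) for lb, servers in topology.items()}


def route():
    """Return the (load balancer, server) pair that handles one request."""
    lb = next(dns_rotation)
    server = next(backend_rotation[lb])
    return lb, server


routes = [route() for _ in range(4)]
print(routes)
```

Four requests end up on four different servers even though only two public addresses are published.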

Actually, all of that is nicely described here:

http://en.wikipedia.org/wiki/Google_platform

Note that we talk of HUNDREDS OF THOUSANDS OF SERVERS. MANY data centers full of stuff.

Google is very special in that the servers are pretty much read-only. They get a copy of the index, and serve that until they are reimaged with a new updated copy. No updates are ever done to an answering cluster. This is unusual for an application, but not because Google is smart or so, just because their requirements are unusual.

TomTom


Well, it's probably *also* because Google is smart. – mfinni Sep 14 '10 at 03:17
In this case not only - their sheer scale pretty much makes this the only viable approach. The number of servers is ridiculously high, but then... they are all similar and they simply need them ;) – TomTom Sep 14 '10 at 07:09
This doesn't make sense to me. Surely making an IP point at a load balancer instead of a server just turns the load balancer into the Single Point of Failure instead of the server, rather than *eliminating* the single point of failure? – Mark Amery Apr 05 '18 at 15:55
Well, LEARN. No, not necessarily. Windows for example has an IP load balancer. Put 4 machines in. Then the HA resource is on a 5th IP (!) and gets dynamically handled. HA on that level is a solved problem - finding the documentation for the products on Google is a problem you obviously have not solved yet. – TomTom Apr 05 '18 at 16:04

score 0 · Answer 4 · answered Sep 14 '10 at 00:51

High-availability sites use many technologies, as the DNS root servers do, in order to be reachable at any time.

In fact, to be safe from attacks and failures, we can deploy many solutions, such as:

Anycast solutions
DNS load balancing
Load balancing and reverse proxies
Fail-over solutions
