
I'm trying to figure out how exactly load balancing works for sites like Facebook/YouTube, but I have a few questions. After a lot of reading, I figured out that load balancing looks like this: there is a load balancer, a server that splits the traffic between the other servers.

So this is how I understand load balancing:

My question is: if the load balancer is a single server that splits the traffic between the other servers, how can sites like Facebook/YouTube handle 50,000+ requests per second? If the load balancer is a single server, won't it die? How is it able to route 10 Gbps of traffic or more? Also, how does the load balancer know on which server video XXXX (for example) is located?

voretaq7

2 Answers


That picture is a good first approximation of load balancing, and for most sites it'll be more than enough. Sites like Google, YouTube and Facebook can and do use a few more tricks; here are a few I've used so far or am planning to use for another large e-commerce site:

  • Use DNS to spread requests across multiple load balancers, even multiple datacenters
  • Use a combination of DNS and anycast IP ranges/CDNs to attract local traffic geographically
  • Have the outermost load balancers do only layer 4 balancing to more load balancers, and have these do all necessary layer 7 processing
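A rough sketch of the first and third points, assuming made-up balancer addresses and function names (`dns_rotation`, `pick_l7_balancer` are hypothetical, not from any real product). Round-robin DNS hands each client a rotated list of balancer IPs, and a layer 4 balancer picks a layer 7 balancer by hashing the connection's 5-tuple, so every packet of one TCP flow lands on the same box without the outer tier ever parsing HTTP:

```python
import hashlib
import itertools

# Hypothetical addresses of the second-tier (layer 7) load balancers.
L7_BALANCERS = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]


def dns_rotation(records):
    """Yield the A-record list rotated one step per query (round-robin DNS)."""
    for offset in itertools.cycle(range(len(records))):
        yield records[offset:] + records[:offset]


def pick_l7_balancer(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Layer 4 decision: hash the 5-tuple so one flow always maps to one
    layer 7 balancer. Uses SHA-256 instead of Python's hash(), which is
    randomized per process for strings."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(L7_BALANCERS)
    return L7_BALANCERS[index]


if __name__ == "__main__":
    answers = dns_rotation(["192.0.2.1", "192.0.2.2"])
    print(next(answers), next(answers))
    print(pick_l7_balancer("203.0.113.5", 51515, "198.51.100.1", 443))
```

The plain `% len(...)` hash reshuffles most flows whenever a balancer is added or removed; consistent hashing is one common way to limit that.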

These layer 7 tricks can include:

  • Tying a user to a server via a cookie or url
  • Locating content and redirecting appropriately
  • Analytics for further performance improvement
  • Abuse detection & prevention at layer 7
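To make the first of these concrete, here is a minimal sketch of cookie-based stickiness at layer 7. The backend names and the cookie name `lb_backend` are assumptions for illustration only; a real balancer would also handle backends going down and would usually obscure or sign the cookie value:

```python
import random

# Hypothetical pool of application servers behind the layer 7 balancer.
BACKENDS = ["app1", "app2", "app3"]
COOKIE_NAME = "lb_backend"  # hypothetical affinity cookie name


def route_request(cookies):
    """Return (backend, cookies_to_set) for one HTTP request.

    If the request carries an affinity cookie naming a known backend,
    reuse that backend; otherwise pick one at random and tell the client
    to remember it via Set-Cookie."""
    pinned = cookies.get(COOKIE_NAME)
    if pinned in BACKENDS:
        return pinned, {}
    backend = random.choice(BACKENDS)
    return backend, {COOKIE_NAME: backend}


if __name__ == "__main__":
    first, set_cookie = route_request({})           # new visitor gets pinned
    again, _ = route_request(set_cookie)            # follow-up request
    print(first, again)
```

Random choice for new visitors is just the simplest placeholder; least-connections or weighted selection slot into the same spot.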
Dennis Kaarsemaker
  • Actually, it's not difficult to implement several L3 devices (routers) for a single site using BGP multipath. And these routers may use multipath routing to balance traffic between several L4-7 load balancers. – DukeLion Feb 10 '13 at 10:57
  • I think what you call BGP multipath is the same as what I call anycast: announcing routes from multiple locations. – Dennis Kaarsemaker Feb 10 '13 at 10:59
  • Not really; anycast is used for multiple distributed locations. It may or may not involve multiple paths to the network. – DukeLion Feb 10 '13 at 11:00
  • Oh, you mean a more traditional multihoming (and maybe peering) setup. Yeah, completely missed that in my answer, thanks! – Dennis Kaarsemaker Feb 10 '13 at 11:02

For the second part of the question: a load balancer either has an up-to-date database that records which server can process which request, or it may rely on internal redirect messages from the backend servers.
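Both options can be sketched roughly like this. The table contents, server names and load numbers are invented for illustration; a real site would keep the location data in a replicated store that is updated as content is copied or moved. `locate` is the database lookup done by the balancer, and `handle_on_backend` is the redirect fallback a backend uses when it doesn't hold the requested item:

```python
# Hypothetical location table: content ID -> servers holding a copy.
CONTENT_LOCATION = {
    "video-XXXX": ["storage7", "storage12"],
    "video-YYYY": ["storage3"],
}
# Hypothetical current load (e.g. active streams) per storage server.
SERVER_LOAD = {"storage3": 40, "storage7": 85, "storage12": 10}


def locate(content_id, default="storage-frontend"):
    """Balancer-side lookup: send the request to the least-loaded replica,
    or to a (hypothetical) default pool if the content is not in the table."""
    replicas = CONTENT_LOCATION.get(content_id)
    if not replicas:
        return default
    return min(replicas, key=lambda server: SERVER_LOAD.get(server, 0))


def handle_on_backend(server, content_id):
    """Backend-side fallback: serve locally, or answer with a redirect
    pointing at a server that does hold the content."""
    replicas = CONTENT_LOCATION.get(content_id, [])
    if server in replicas:
        return (200, server)
    if replicas:
        return (302, locate(content_id))
    return (404, None)


if __name__ == "__main__":
    print(locate("video-XXXX"))                        # least-loaded replica
    print(handle_on_backend("storage3", "video-XXXX"))  # redirect
```

Choosing the least-loaded replica is only one possible policy; proximity to the client or cache state are equally plausible inputs.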

DukeLion