DNS and failure tolerance strategies for load balancers

Question

I'm trying to educate myself about how to build a highly available load balancing service for application servers (e.g. for HTTP traffic), and about how such a service works with DNS servers.

Consider the following diagram. My understanding is that load balancers (e.g. HAProxy) can be configured to designate a primary server, with a strategy to fall back to a secondary server (which becomes the new primary) if the primary fails.
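For reference, the primary/fallback behaviour described above happens among the backend servers behind a single load balancer. A minimal Python sketch of that selection logic follows. The `Backend` class and `pick_backend` function are hypothetical, and the `backup` flag is only loosely modeled on HAProxy's `backup` server option, which keeps a server out of rotation until all non-backup servers are unavailable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    name: str
    healthy: bool = True   # result of a (hypothetical) health check
    backup: bool = False   # only used when no primary is healthy


def pick_backend(backends: List[Backend]) -> Optional[Backend]:
    """Return the first healthy primary, else the first healthy backup, else None."""
    for want_backup in (False, True):
        for b in backends:
            if b.healthy and b.backup == want_backup:
                return b
    return None


# Usage: app2 takes over only once app1 fails its health check.
pool = [Backend("app1"), Backend("app2", backup=True)]
print(pick_backend(pool).name)   # app1
pool[0].healthy = False
print(pick_backend(pool).name)   # app2
```

Nothing in this logic makes the load balancer itself redundant, which is the crux of the question.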

Wouldn't that require the DNS server to know of or elect a primary LB server?
Aren't DNS servers sometimes outside the data center (or outside the direct control) of the company or organization that manages the load balancers themselves? If so, how do they specify in the DNS servers which LB server to hit?

Josh, load balancing comes from your choice of server application. For example, NGINX does everything you need: proxy, load balancing, web server, AMAZING SSL. Take a look: http://nginx.org/en/docs/http/load_balancing.html — suchislife, Jun 26 '20 at 21:30
@suchislife the specific brand of reverse proxy/load balancer does not answer how to achieve redundancy within the reverse proxy layer itself. Nginx faces the same problem as haproxy in this regard. — ErikE, Jun 27 '20 at 11:59

score 1 · Accepted Answer · edited Oct 07 '21 at 07:59

You have identified the problem of making backend servers redundant behind a load balancer (or reverse proxy), only to find that the load balancer itself becomes a single point of failure.

This is usually solved by having two or more load balancer units share a common IP address, also known as a floating IP address, thereby creating a load balancing cluster.

The DNS entry specifies only this floating IP address and lets the load balancing cluster figure out which load balancing unit receives which incoming request. Therefore, DNS servers commonly do not require knowledge of the primary and fallback members within the load balancer group.
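To make the division of labour concrete, here is a minimal Python sketch, assuming an active/passive cluster in which the first healthy unit in a preference list holds the floating IP (real cluster protocols elect the holder in their own ways). The class, hostname and addresses are hypothetical; the point is that failover changes only the cluster's internal state, never the DNS record.

```python
from typing import Dict, List, Optional

# Hypothetical zone data: DNS publishes only the floating IP (documentation range).
DNS: Dict[str, str] = {"www.example.com": "192.0.2.10"}


class LoadBalancerCluster:
    """Toy active/passive cluster: the first healthy unit in the list holds the floating IP."""

    def __init__(self, floating_ip: str, units: List[str]) -> None:
        self.floating_ip = floating_ip
        self.units = list(units)      # ordered by preference
        self.healthy = set(units)

    def fail(self, unit: str) -> None:
        self.healthy.discard(unit)

    def owner(self) -> Optional[str]:
        return next((u for u in self.units if u in self.healthy), None)


def deliver(hostname: str, cluster: LoadBalancerCluster) -> Optional[str]:
    """Resolve the name via DNS, then let the cluster decide which unit answers."""
    ip = DNS[hostname]
    if ip != cluster.floating_ip:
        return None                   # the name does not point at this cluster
    return cluster.owner()


cluster = LoadBalancerCluster("192.0.2.10", ["lb-a", "lb-b"])
print(deliver("www.example.com", cluster))  # lb-a
cluster.fail("lb-a")
print(deliver("www.example.com", cluster))  # lb-b, with no DNS change
```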

Different implementations exist, making possible both active/passive designs, where only one cluster node is reachable through the floating IP address at any one time, and active/active designs, where all cluster units are reachable through the floating IP address at the same time.
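The difference between the two designs can be sketched in Python as two request-distribution policies. Spreading clients by hashing their address is an assumption made for illustration only; real active/active implementations distribute traffic in various ways (GLBP, for instance, answers ARP requests with different virtual MAC addresses for different clients). Function names and addresses are hypothetical.

```python
import zlib
from typing import List, Optional, Set


def active_passive(units: List[str], healthy: Set[str], client_ip: str) -> Optional[str]:
    """Every client goes to the single unit currently holding the floating IP."""
    return next((u for u in units if u in healthy), None)


def active_active(units: List[str], healthy: Set[str], client_ip: str) -> Optional[str]:
    """Clients are spread over all healthy units; hashing keeps each client on one unit."""
    live = [u for u in units if u in healthy]
    if not live:
        return None
    return live[zlib.crc32(client_ip.encode()) % len(live)]


units = ["lb-a", "lb-b"]
healthy = {"lb-a", "lb-b"}
clients = [f"198.51.100.{i}" for i in range(1, 21)]
print({active_passive(units, healthy, c) for c in clients})          # {'lb-a'}
print(sorted({active_active(units, healthy, c) for c in clients}))   # ['lb-a', 'lb-b']
```

With plain modulo hashing, losing a unit can move many clients to a different unit; consistent hashing is a common refinement when that matters.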

A multitude of cluster protocols and applications exist; see for example VRRP, HSRP and GLBP. Once you know the terminology, finding more alternatives is a trivial task.
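As one concrete instance, here is a minimal Python sketch of how VRRP (based on RFC 5798) decides which unit should hold the floating IP: the highest priority wins, and a tie goes to the highest primary IP address. It models only the election rule and the Master_Down_Interval timer, not advertisements, preemption settings or the wire protocol; router names and addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Router:
    name: str
    primary_ip: str
    priority: int  # 1-254 for backups; 255 is reserved for the address owner


def elect_master(routers: List[Router]) -> Optional[Router]:
    """Highest priority wins; a tie goes to the numerically highest primary IP."""
    if not routers:
        return None
    return max(routers, key=lambda r: (r.priority, ipaddress.ip_address(r.primary_ip)))


def master_down_interval(adver_interval: float, priority: int) -> float:
    """Seconds a backup waits without advertisements before taking over."""
    skew_time = ((256 - priority) * adver_interval) / 256
    return 3 * adver_interval + skew_time


lb_a = Router("lb-a", "10.0.0.2", 200)
lb_b = Router("lb-b", "10.0.0.3", 100)
lb_c = Router("lb-c", "10.0.0.4", 100)
print(elect_master([lb_a, lb_b, lb_c]).name)  # lb-a
print(elect_master([lb_b, lb_c]).name)        # lb-c (tie broken by IP)
print(master_down_interval(1.0, 100))         # 3.609375
```

The skew term makes a higher-priority backup time out sooner, so it usually wins the race to take over the floating IP when the master goes silent.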

HAProxy can be deployed in a number of ways to achieve clustered functionality, and solutions are easy to search for; see for instance here and here.

There are other approaches. See for instance DNS Load Balancing with Round Robin and DNS Geolocation routing.
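DNS round robin moves the distribution into DNS itself: the name carries several A records, one per load balancer, and the server varies their order between answers. A minimal Python sketch, with a hypothetical class name and addresses (actual resolver and client behaviour varies; many clients simply try the first address returned):

```python
from collections import deque
from typing import List


class RoundRobinRecordSet:
    """Toy authoritative answer for one name with several A records."""

    def __init__(self, addresses: List[str]) -> None:
        self._ring = deque(addresses)

    def answer(self) -> List[str]:
        """Return all records, rotating the order so successive queries start elsewhere."""
        records = list(self._ring)
        self._ring.rotate(-1)
        return records


rrset = RoundRobinRecordSet(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
for _ in range(3):
    print(rrset.answer()[0])  # 192.0.2.10, then 192.0.2.11, then 192.0.2.12
```

Unlike a floating IP, plain round robin has no notion of health: a failed load balancer's address keeps being handed out until the record is removed and cached answers expire.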

Yes, it is true that the DNS service may be hosted externally to the organisation hosting the load balancer. This usually affects only the lead times for changes, not the functionality of the load balancer cluster itself (caveat: specialized solutions).
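One reason DNS changes have lead times, regardless of who hosts the zone, is that resolvers cache each answer for the record's TTL. A minimal Python sketch of that caching, with hypothetical names, addresses and a 300-second TTL. A floating-IP failover never touches the record, so it is not subject to this delay.

```python
from typing import Dict, Tuple


class CachingResolver:
    """Toy recursive resolver that caches each answer for the record's TTL."""

    def __init__(self, zone: Dict[str, Tuple[str, int]]) -> None:
        self.zone = zone  # name -> (address, ttl_seconds), held by the DNS host
        self.cache: Dict[str, Tuple[str, float]] = {}  # name -> (address, expires_at)

    def resolve(self, name: str, now: float) -> str:
        hit = self.cache.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]  # possibly stale, but still within its TTL
        address, ttl = self.zone[name]
        self.cache[name] = (address, now + ttl)
        return address


zone = {"www.example.com": ("192.0.2.10", 300)}
resolver = CachingResolver(zone)
print(resolver.resolve("www.example.com", now=0))    # 192.0.2.10
zone["www.example.com"] = ("192.0.2.20", 300)        # change made at the external DNS host
print(resolver.resolve("www.example.com", now=120))  # 192.0.2.10 (still cached)
print(resolver.resolve("www.example.com", now=300))  # 192.0.2.20
```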

Thanks! So what you are saying is that these cluster protocols (VRRP, HSRP, etc.) are fully responsible for making sure the corresponding floating IP address is actually assigned (i.e. traffic routed) to the right server(s) (including multiple machines) at a given point in time? And I assume that requires clear coordination between the load balancers themselves, as well as with the network routers, to make sure the IP packets go to those designated servers. — Josh, Jun 27 '20 at 16:22
Yes. The LB units usually have their own individual IP addresses (which are tied to their underlying MAC addresses). The floating IP is an additional IP address that gets passed around in a manner decided by the cluster network protocol. E.g. if unit A is to "have" the floating IP, it informs the network that its underlying MAC address is now tied to the floating IP. If unit B "takes over" the floating IP, it informs the network that its MAC address is now tied to the floating IP. The cluster protocol specifies how this is agreed upon. I suggest reading up on the ARP protocol; it makes the process easier to grasp. — ErikE, Jun 27 '20 at 18:22
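The ARP handover described in this comment can be sketched as a toy Python model in which each unit ties the floating IP to its own MAC address via a gratuitous ARP announcement. It is a simplification under stated assumptions: one shared ARP view stands in for every neighbour's cache, ARP requests, timeouts and switch MAC learning are ignored, and all names, IPs and MACs are hypothetical. Some protocols, including standard VRRP, use a shared virtual MAC address instead of each unit's own.

```python
from typing import Dict, Optional


class EthernetSegment:
    """Toy LAN: one shared ARP view (IP -> MAC) and the NICs attached to it."""

    def __init__(self) -> None:
        self.arp_cache: Dict[str, str] = {}  # what neighbours (e.g. the router) believe
        self.nics: Dict[str, str] = {}       # MAC -> host name

    def attach(self, host: str, mac: str, own_ip: str) -> None:
        self.nics[mac] = host
        self.arp_cache[own_ip] = mac         # each unit's individual address

    def announce(self, ip: str, mac: str) -> None:
        """Gratuitous ARP: 'ip is now at mac'; neighbours overwrite their entry."""
        self.arp_cache[ip] = mac

    def deliver(self, dst_ip: str) -> Optional[str]:
        """Frames for dst_ip go to whichever host owns the MAC the cache points at."""
        mac = self.arp_cache.get(dst_ip)
        return self.nics.get(mac) if mac else None


FLOATING_IP = "10.0.0.10"
lan = EthernetSegment()
lan.attach("lb-a", "02:00:00:00:00:0a", "10.0.0.2")
lan.attach("lb-b", "02:00:00:00:00:0b", "10.0.0.3")

lan.announce(FLOATING_IP, "02:00:00:00:00:0a")  # unit A "has" the floating IP
print(lan.deliver(FLOATING_IP))                 # lb-a
lan.announce(FLOATING_IP, "02:00:00:00:00:0b")  # unit B "takes over"
print(lan.deliver(FLOATING_IP))                 # lb-b
print(lan.deliver("10.0.0.2"))                  # lb-a (individual address unchanged)
```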
Also note that the terminology is not set in stone. Various implementations can use different words to describe the same thing, sometimes to the point of utter confusion. Knowing beforehand how the Ethernet, ARP, IP and TCP protocols interoperate (the distinct responsibilities of each) makes deciphering cluster network protocols easier. Case in point: one of the links in my answer calls the floating IP a "virtual IP" in its implementation. Virtual IP is a term used in a lot of different contexts, which may itself be a source of bewilderment. — ErikE, Jun 27 '20 at 18:30
