13

I am studying scalability design and I'm having a hard time thinking of ways to ensure that a load balancer does not become a single point of failure. If a load balancer goes down, who makes the decision to route to a backup load balancer? What if that "decision maker" goes down too?

aw626
  • Well - what if every single site behind the load balancer goes down too? If everything breaks, it doesn't matter how much scalability or redundancy you have. – Allan S. Hansen Feb 24 '14 at 07:36
  • Maybe I didn't make myself clear. The point is the single point of failure. Can you explain how a load balancer is not a single point of failure? – aw626 Feb 24 '14 at 07:41
  • You do that as you suggested yourself; you add redundancy. But if _everything_ breaks, no amount of redundancy will save you. – Allan S. Hansen Feb 24 '14 at 07:51
  • So in the real world, there will come a point where it is a single point of failure? – aw626 Apr 13 '14 at 17:02
  • @aw626 -- I understand it has been a while since you asked this (very pertinent) question. What is not addressed by answers here is that the (front) load-balancer is the piece of hardware that receives the request from the outside world. I found the answers in Quora (asked after you asked in SO) more satisfying than the answers (or comments) you received here: https://www.quora.com/Can-the-load-balancer-becomes-the-single-point-of-failure-of-a-large-distributed-system – Happy Green Kid Naps Feb 03 '21 at 21:01

2 Answers

0

The way to keep a load balancer from being a single point of failure is to run the load balancer(s) in a high-availability cluster with hardware backup.

0

I believe the answer to this question is redundancy.

The load balancer, instead of being a single computer/service/module/whatever, should be several instances of that computer/service/whatever.

The clients should be aware of the alternatives they have in case their favorite load balancer goes down.
If a client times out on its favorite load balancer, it already has the logic to reach the next one.
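
As a rough illustration of that client-side logic, here is a minimal Python sketch. The function name `request_with_failover`, the `send` callable, and the addresses in the usage note are all hypothetical and not taken from any particular library; `send` stands in for whatever actually performs the request and is assumed to raise `TimeoutError` or `ConnectionError` when a balancer is unreachable.

```python
from typing import Callable, Optional, Sequence


def request_with_failover(balancers: Sequence[str],
                          send: Callable[[str], str]) -> str:
    """Try each load balancer in order and return the first successful response.

    `balancers` is the client's ordered list of known load balancer addresses
    (favorite first). `send` is a hypothetical callable that performs the
    request against one address and raises on failure.
    """
    last_error: Optional[Exception] = None
    for address in balancers:
        try:
            return send(address)
        except (TimeoutError, ConnectionError) as exc:
            # This balancer looks down; remember why and try the next one.
            last_error = exc
    # Every known balancer failed, so redundancy could not help this time.
    raise ConnectionError(
        f"all {len(balancers)} load balancers failed") from last_error
```

A client would call it with something like `request_with_failover(["lb1.example", "lb2.example"], send)`. The trade-off of this design is that every client has to know the list of balancers in advance.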

This is the most straightforward way I can think of to get rid of single points of failure, but I'm sure there are many others that have been researched.


Note that any system component can still fail, no matter how much redundancy you put in; redundancy only lowers the probability. The question is: "How sure do you want to be that it will not go down?"
If the probability that a single instance goes down is p, then the probability that all n instances go down together (assuming they fail independently) is p^n. Pick how sure you want to be, or how many resources you can afford, and solve for the other side of the equation.
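
To make the p^n trade-off concrete, here is a small Python sketch. The function names and the example numbers are illustrative assumptions, and the calculation relies on the same independence assumption as above, which real deployments (shared power, network, or software bugs) often violate.

```python
def all_down_probability(p: float, n: int) -> float:
    """Probability that n independent instances are all down at once."""
    return p ** n


def instances_needed(p: float, target: float) -> int:
    """Smallest n such that p**n <= target.

    p      -- probability that a single instance is down (0 < p < 1)
    target -- acceptable probability that every instance is down (0 < target < 1)
    """
    if not (0 < p < 1) or not (0 < target < 1):
        raise ValueError("p and target must be strictly between 0 and 1")
    n = 1
    # Keep adding instances until the combined failure probability is low enough.
    while p ** n > target:
        n += 1
    return n
```

For example, if each instance is down half the time (p = 0.5), three instances are all down with probability 0.125, and getting below a target of 0.1 takes four instances.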

Gulzar