
Recently I learned how to set up a load balancer using AWS and point it at a root / sub domain.

But while playing around with it, I noticed there are some delays. For example, my setup is:

3 instances, each running nginx and Node, managed with pm2.

I changed the HTML content on each instance, e.g. `<h1>load 1</h1>`, `<h1>load 2</h1>`, `<h1>load 3</h1>`.
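Roughly, each instance is running something like this (the app name and port here are just placeholders, not my exact setup):

```
# Node app managed by pm2 (app name and port are illustrative)
pm2 start app.js --name web
# nginx sits in front and proxies / to the Node app on 127.0.0.1:3000
# Each instance serves a slightly different page: <h1>load 1</h1>, <h1>load 2</h1>, <h1>load 3</h1>
```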

Then when I keep reloading the load balancer DNS name, it does show the instances switching randomly, so I know that it's working.

Then I went into one of the instances and stopped pm2.

When I refreshed my page again, sometimes it showed a gateway error, which I believe is because the request went to the instance I stopped. Sometimes it showed the page BUT with badly broken CSS, and when I opened the browser console there were tons of errors saying the files could not be located.

It takes about 1-2 minutes until everything is totally fine again.

I am wondering: is this normal and is this how it should happen? Or is there a way to optimise it so that users have a better experience?

Thanks for any advice.

Dora
  • *"then when I keep reloading the load balancer dns, it does show the instances are switching randomly so I know that it's working."* Note that this isn't what that means. There isn't a correlation between the IP addresses you see in DNS and which specific instance will receive requests. The alternating addresses are the (invisible to you) *balancer instances*. They don't map 1:1 to your EC2 instances. With just one EC2 instance on the balancer, the DNS behavior doesn't change. – Michael - sqlbot Jan 19 '18 at 21:36
  • @Michael-sqlbot Maybe I should say reloading the URL provided by the load balancer? Something like `balancer-123033xxxx.us-west-1.elb.amazonaws.com` – Dora Jan 19 '18 at 21:49
  • What I'm saying here is that the alternating DNS records do not tell you anything about the instances behind the balancer. Even with a balancer that has only one instance behind it, you will tend to see 2 or more addresses alternating in the DNS response (see the `dig` sketch below). – Michael - sqlbot Jan 19 '18 at 21:54
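To see what the comments describe, you can resolve the balancer's hostname directly; the addresses that come back rotate between the balancer's own nodes, not between the EC2 instances behind it (the hostname below is the placeholder from the comment above):

```
# Resolve the balancer hostname a couple of times; the A records rotate
# between the balancer's own nodes, not between your EC2 instances.
dig +short balancer-123033xxxx.us-west-1.elb.amazonaws.com
sleep 60
dig +short balancer-123033xxxx.us-west-1.elb.amazonaws.com   # often a different set or order of addresses
```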

2 Answers


When you stopped your instance, two problems appeared:

  1. You received gateway errors because the page request was routed to the failed instance.
  2. You received the page, but with broken CSS. This happened because the page request went to a healthy instance, but the subsequent request for the CSS file went to the failed instance (see the sketch below).
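You can see this by requesting the page and one of its assets separately through the balancer; each request is balanced independently, so they can land on different instances (the CSS filename below is just an example, substitute one of your own assets):

```
# Each request is load balanced independently, so the HTML and the CSS
# can come from different instances:
curl -s  http://balancer-123033xxxx.us-west-1.elb.amazonaws.com/            # may hit a healthy instance
curl -sI http://balancer-123033xxxx.us-west-1.elb.amazonaws.com/style.css   # may hit the stopped one and fail
```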

This is simply how ELB works. The ELB uses health checks to detect when an instance is unhealthy, but that takes time. The 1 to 2 minutes you are seeing is the window in which the ELB is still deciding whether the instance has failed.

You can configure the frequency of health checks in your ELB configuration.
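For example, with a Classic ELB you can tighten the checks from the AWS CLI (the load balancer name is a placeholder; point the target at a path your instances actually serve):

```
# Check every 10 seconds; mark an instance unhealthy after 2 consecutive
# failures and healthy again after 2 consecutive successes.
aws elb configure-health-check \
  --load-balancer-name my-load-balancer \
  --health-check Target=HTTP:80/,Interval=10,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2
```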

Matt Houser

When you configure a load balancer, health checks are involved. Detection of a failed node is not instant: the health check has to fail (a configurable number of times) first, and only then is the node removed from the load balancer.

You can set the health check interval to be very short, but this can make the situation worse. If your site is very busy and one instance fails a health check, that instance is removed and the remaining instances have even more work to do; they then start failing health checks as well, and like dominoes your site comes down.
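As a rough rule of thumb, worst-case detection time is about interval times unhealthy threshold, so tune both deliberately rather than simply shrinking the interval. If you are using an Application Load Balancer, the equivalent settings live on the target group (the ARN below is a placeholder):

```
# Worst-case detection time ~ interval x unhealthy threshold:
#   30s x 2 = ~60s  (close to the 1-2 minutes observed in the question)
#   10s x 2 = ~20s  (faster, but more sensitive to transient blips under load)
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-west-1:123456789012:targetgroup/my-targets/0123456789abcdef \
  --health-check-interval-seconds 10 \
  --unhealthy-threshold-count 2 \
  --healthy-threshold-count 2
```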

John Hanley