
We wanted to see if using Route 53 would allow us to route traffic to our machines instead of the AWS ELB, just for experimentation, and we noticed the following.

We have some Nginx machines running behind an AWS ELB. We also updated the public IPs of those machines in Route 53, and removed unhealthy ones immediately. We then added weighted routing so that half the traffic goes through the ELB, and the other half goes directly to those machines via Route 53. Grafana monitoring is enabled on all machines, so Grafana should be showing almost double the traffic that the AWS ELB shows. That is not the case: almost 80% of the traffic still goes through the ELB, 24 hours after we enabled the split. Before the split, when all traffic went to the ELB alone, the Grafana dashboard and the AWS ELB showed exactly the same traffic, which leads us to believe that our monitoring pipeline is fine.

We did this exercise because we eventually want to replace the AWS ELB with our own load balancer, and we wanted to find out whether our machines can handle things without the ELB, since clients will then be connecting to them directly. My questions right now are: are we losing traffic, and how do we find out? Why does the ELB still show more than 50% of the traffic after a day?

Right now the ELB is showing 6.8M, but Grafana is showing 9M. Shouldn't Grafana be at least twice as high as the ELB, at least theoretically? I understand that the IPs must be getting cached by the ISP or the client, but in that scenario if the ELB machines go down, the traffic to those IPs must be getting lost? I'm a bit confused, and troubled.
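To make the numbers concrete, here is a small sketch of the arithmetic. It assumes both figures count requests over the same time window, and that Grafana sees every request reaching the machines (through the ELB or directly) while the ELB sees only its own. The variable names are purely illustrative.

```python
# Figures quoted above; the unit (requests) and matching time window are assumptions.
elb_requests = 6_800_000
grafana_requests = 9_000_000

# Assumption: Grafana = traffic via the ELB + traffic sent directly to the machines.
direct_requests = grafana_requests - elb_requests
elb_share = elb_requests / grafana_requests

# With a true 50/50 traffic split, Grafana would show twice the ELB figure.
expected_grafana_if_even = 2 * elb_requests

print(f"direct traffic: {direct_requests:,}")
print(f"ELB share of all traffic: {elb_share:.1%}")
print(f"Grafana total expected for an even split: {expected_grafana_if_even:,}")
```

Under those assumptions the ELB is carrying roughly three quarters of the traffic, which matches the "almost 80%" we observe.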

1 Answer


Route 53, being a DNS service, doesn't control how much traffic flows or where it flows to.

Think of it this way: you have two clients, Client1 and Client2. Client1 queries for yourdomain.com and receives IP address 10.100.10.20. Client1 then downloads 100 MB over 30 consecutive connections.

Now Client2 queries for yourdomain.com and receives IP address 10.100.10.30. Client2 then downloads 25 MB over 50 consecutive connections.

In this case, Route 53 "load balanced" the DNS requests across two servers evenly. But each client pulled a different amount of data and made a different number of connections to do so. Say Client1's address belongs to the ELB and Client2's address belongs to one of your machines directly. Now your load balancer shows 100 MB of traffic and 30 connections, while Grafana, which sees both clients, shows 125 MB of traffic and 80 connections.
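The same effect can be seen at a larger scale with a rough simulation. This is only a sketch under stated assumptions, not a model of how Route 53 or real resolvers behave: each simulated resolver gets exactly one weighted DNS answer and then sends all of its requests to that target, and per-resolver volume is drawn from a heavy-tailed distribution to mimic a few large ISP resolvers fronting many clients. The function name `simulate` and all parameters are hypothetical.

```python
import random


def simulate(num_resolvers=1000, elb_weight=0.5, seed=42):
    """Split DNS answers by weight, then compare DNS share with traffic share."""
    rng = random.Random(seed)
    elb_resolutions = 0
    elb_requests = 0
    total_requests = 0

    for _ in range(num_resolvers):
        # Assumption: heavy-tailed request volume per resolver (always >= 10).
        requests = int(rng.paretovariate(1.2) * 10)
        # Weighted DNS answer: the resolver is pointed at the ELB or at a machine.
        to_elb = rng.random() < elb_weight

        total_requests += requests
        if to_elb:
            elb_resolutions += 1
            elb_requests += requests

    return {
        "dns_share_elb": elb_resolutions / num_resolvers,
        "traffic_share_elb": elb_requests / total_requests,
        "elb_requests": elb_requests,
        "total_requests": total_requests,
    }


if __name__ == "__main__":
    for seed in range(5):
        result = simulate(seed=seed)
        print(
            f"seed={seed} "
            f"DNS answers to ELB: {result['dns_share_elb']:.1%}, "
            f"traffic to ELB: {result['traffic_share_elb']:.1%}"
        )
```

Across seeds, the share of DNS answers stays close to the configured weight, while the share of actual traffic can drift well away from it, because a handful of heavy resolvers dominate the total.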

If you want to truly load balance traffic, then you need a load balancer. That is what the ELB does. Route 53 doesn't do that; it only load balances DNS requests.

Appleoddity
  • Thanks for the explanation, @Appleoddity. I had one more follow-up question. Could this behaviour also be because the ELB instances' public IPs are being cached by the client? I've heard that clients and ISPs sometimes override domain TTLs. Could this explain why traffic still goes to the ELB even after the Route 53 DNS changes? – Ayush Sharma Sep 06 '17 at 06:53
  • I suppose it is possible that an ISP or client would override the TTL, but this would violate the standards. I imagine this might happen in extreme cases where someone sets an unusually small TTL and the ISP or client enforces a minimum TTL. I don't think anything 5 minutes or larger would be an issue, though. Yes, the IP will be cached until the TTL expires. But also, don't forget that just because the DNS records changed, it doesn't mean traffic will stop. The IP is still reachable from the internet, and the traffic could be something as trivial as a hacker or a bot scanning the IP. – Appleoddity Sep 07 '17 at 03:52