As an experiment, we wanted to see whether Route 53 could route traffic directly to our machines instead of through the AWS ELB, and we noticed the following.
We have some Nginx machines running behind an AWS ELB. We also added the public IPs of those machines to Route 53, and removed unhealthy ones immediately. We then set up weighted routing so that half the traffic goes through the ELB and the other half goes via Route 53 directly to the machines. We have Grafana monitoring enabled on all machines, and since every request ends up on one of those machines either way, Grafana should be showing almost double the traffic that the AWS ELB shows. That is not the case: the ELB is still showing almost 80% of the traffic that Grafana shows. We enabled the split 24 hours ago. Before the split, when all traffic went to the ELB alone, the Grafana dashboard and the AWS ELB showed exactly the same traffic, which leads us to believe that our monitoring pipeline is fine.
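One way we could check what the weighted records actually hand out is to resolve the name repeatedly and count how often the answer is one of our machine IPs. This is only a rough sketch: the hostname and IPs below are placeholders, and it assumes that any answer that is not one of our machine IPs belongs to the ELB. Repeated lookups through a caching resolver will return the same answer until the TTL expires, so the samples need to be spaced out or taken against different resolvers.

```python
import socket
from collections import Counter


def sample_split(hostname, direct_ips, samples=20, resolve=None):
    """Resolve `hostname` `samples` times and count direct vs. ELB answers.

    `direct_ips` is the set of our machines' public IPs. Any other answer
    is assumed (not verified) to be an ELB address.
    `resolve` can be swapped out for testing; it defaults to the system resolver.
    """
    resolve = resolve or socket.gethostbyname
    counts = Counter()
    for _ in range(samples):
        ip = resolve(hostname)
        counts["direct" if ip in direct_ips else "elb"] += 1
    return counts


# Example call (placeholder hostname and documentation-range IPs):
# sample_split("www.example.com", {"203.0.113.10", "203.0.113.11"})
```

If the counts come out far from 50/50 even when sampled across several resolvers, the weights themselves would be the first thing to recheck.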
We did this exercise because we eventually want to replace the AWS ELB with our own, and we wanted to find out whether our machines can handle things without the ELB, since they will then be accepting connections from clients directly. My questions right now are: are we losing traffic, and how do we find out? Why does the ELB still show more than 50% of the traffic after a day?
Right now the ELB is showing 6.8M, but Grafana is showing 9M. Shouldn't Grafana be about twice as high as the ELB, at least theoretically? I understand that the IPs must be getting cached by the ISP or the client, but in that scenario, if the ELB machines go down, wouldn't the traffic to those IPs be lost? I'm a bit confused, and troubled.
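To make the gap concrete, here is the arithmetic as a small sketch. It assumes Grafana counts every request that reaches the machines (through the ELB or directly) and that the ELB figure counts only requests that passed through the ELB; the function names are just for illustration.

```python
def implied_split(elb_count, grafana_count):
    """Return (elb_share, direct_share) of all requests seen by the machines."""
    direct_count = grafana_count - elb_count  # requests that bypassed the ELB
    return elb_count / grafana_count, direct_count / grafana_count


def expected_grafana(elb_count, elb_weight):
    """Total the machines should see if the ELB really got `elb_weight` of traffic."""
    return elb_count / elb_weight


elb, grafana = 6.8e6, 9.0e6
elb_share, direct_share = implied_split(elb, grafana)

print(f"ELB share: {elb_share:.1%}, direct share: {direct_share:.1%}")
# ELB share: 75.6%, direct share: 24.4%
print(f"Grafana expected at 50/50: {expected_grafana(elb, 0.5) / 1e6:.1f}M")
# Grafana expected at 50/50: 13.6M
```

So under these assumptions only about 2.2M requests arrived directly, against the roughly 6.8M we would expect from an even split.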