We have two AWS regions in active-active mode. Services in Region-1 and Region-2 have health checks registered with Route53, which is set up for latency-based routing.

Here is the scenario:

  1. I access the public URL of a service from a browser, and it is served from Region-1.
  2. I then take Region-1 down (to simulate latency).
  3. Route53 marks it unhealthy within a couple of minutes, as expected. Region-2 is now the active region.
  4. I continue accessing the service's public URL from the same browser, but requests keep failing for a good 15-18 minutes, after which they start hitting the correct active region, Region-2.
  5. However, after step 3, if I open a new browser and hit the same URL, it goes to the correct Region-2 as expected.

Upon investigation, we found that browsers cache DNS entries and keep sockets alive, and these do not expire for something like 15 minutes. The problem is that this results in a poor user experience, despite the fact that we have two active AWS regions.
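One knob worth checking on the Route53 side is the record TTL: a long TTL on the latency records gives resolvers and browsers license to hold the stale answer. Below is a sketch of a `change-resource-record-sets` change batch that UPSERTs the Region-1 latency record with a 60-second TTL; the domain name, region, IP, and health-check ID are placeholders, not values from the original post (and note this does not help with connections the browser already has open):

```json
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "SetIdentifier": "region-1",
        "Region": "us-east-1",
        "TTL": 60,
        "ResourceRecords": [{ "Value": "203.0.113.10" }],
        "HealthCheckId": "11111111-2222-3333-4444-555555555555"
      }
    }
  ]
}
```

This would be passed to `aws route53 change-resource-record-sets --hosted-zone-id <zone-id> --change-batch file://change.json`, with a matching record for Region-2.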

The expected behavior is that as soon as Region-1 goes down or starts exhibiting latency, subsequent requests from the same browser go to the active region (or the region with lower latency).

Any comments on our expectations, and any fixes or workarounds, other than deleting the browser's DNS and socket cache, which we can't expect our webapp users to perform?
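One client-side mitigation (my own sketch, not something from the post) is to add timeout-and-retry logic in the web app itself, so a request that hangs against the dead Region-1 endpoint is aborted quickly and retried on a fresh connection, rather than leaving the user staring at a failed page. The helper name `fetchWithRetry` and the timeout/retry values are assumptions for illustration:

```javascript
// Sketch: abort a hung request after `timeoutMs` and retry up to `retries`
// more times. Each retry is a fresh fetch, so once the browser's cached DNS
// entry expires, a retry can land on the healthy region.
async function fetchWithRetry(url, { retries = 2, timeoutMs = 3000, fetchFn = fetch } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      // Succeeds on the first attempt that completes within the timeout.
      return await fetchFn(url, { signal: controller.signal });
    } catch (err) {
      lastError = err; // aborted or network error; fall through and retry
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}
```

This does not force the browser to drop its DNS cache, but combined with a short record TTL it shrinks the user-visible outage from minutes to a few retry intervals.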
