Our domain is registered with Enom. DNS records are managed through Pugmarks, Enom's reseller for India. We want to move DNS record management from Enom/the reseller to AWS Route53 while retaining Enom as the domain registrar.
The TTL for the domain's DNS records is 300 (5 min). I checked the TTL of the NS records and found it to be 3600 (1 hr).
When we replaced the Enom name servers with the Route53 ones, Enom stopped resolving the domain immediately. ISP DNS servers followed suit once the TTL expired. Our website traffic dropped (as observed in Google Analytics). This impact is understood.
A while later, querying the NS record for the domain through public/open name servers such as 4.2.2.2 through 4.2.2.6, 8.8.8.8, and 8.8.4.4 returned the updated records pointing to Route53, i.e.:
dig NS <domain.com> @8.8.4.4
The above command shows the Route53 name server records. Similarly, all other records (A, CNAME, etc.) resolve successfully, indicating that these DNS servers have picked up the name server change. At this point we observed US traffic recovering in Google Analytics.
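A rough sketch of how this check can be scripted across several resolvers. The function names `all_route53` and `check_resolvers` are made up for illustration, and the match on `awsdns` assumes the usual Route53 name server naming (ns-NNN.awsdns-NN.tld):

```shell
#!/bin/sh
# Hypothetical helper: reads NS hostnames on stdin (one per line, as printed
# by `dig +short NS`) and succeeds only if at least one name was given and
# every name looks like a Route53 name server.
all_route53() {
    awk 'NF { seen = 1; if ($1 !~ /\.awsdns-[0-9]+\./) bad = 1 }
         END { exit (seen && !bad) ? 0 : 1 }'
}

# Hypothetical helper: asks each resolver for the domain's NS set and reports
# whether it currently returns Route53 name servers.
# Arguments: the domain, then one or more resolver IPs.
check_resolvers() {
    domain=$1
    shift
    for resolver in "$@"; do
        if dig +short NS "$domain" @"$resolver" | all_route53; then
            echo "$resolver: Route53"
        else
            echo "$resolver: old or no NS records"
        fi
    done
}
```

For example, `check_resolvers <domain.com> 8.8.8.8 8.8.4.4 4.2.2.2` prints one line per resolver. An ISP resolver that is restricted to its own customers has to be queried from inside that ISP's network.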
However, Indian traffic still remained at zero. I queried a couple of DNS servers from two different Indian ISPs (not open to the public; restricted to ISP users). These did not return any records. We waited 4 hours for the ISPs to catch up with the change, but in vain.
It is strange that the US region was able to get the new records while none of the Indian ISPs we tried (at least 5 of them) picked up the change. Every other DNS test tool on the web picked up the change; only the ISPs here did not. This resulted in a big dip in traffic, which is a major concern since India is the audience the site targets.
After 4 hours of wait-and-watch, we switched the entries back to the Enom name servers. Within seconds, the Indian ISPs were able to resolve the records, as if they had been querying the Enom servers all along, even though the NS TTL is 1 hr. (Route53 continued to resolve as well, so US traffic remained unchanged.)
I have two suspicions:
- The Indian ISPs are caching the NS records for the domain for more than 1 hr, probably for 48 hrs.
- There is some issue specific to the Indian region that I have no clue about.
Point 1 is the prime suspect as far as I am concerned. Here is a link that gives details about the domain. It shows the NS records at the parent name servers with a 48 hr TTL, while the domain's own name servers return them with a 1 hr TTL. Could this be causing the issue?
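The two TTLs can also be compared directly with dig instead of relying on a report. This is a minimal sketch, assuming a .com domain (so that a.gtld-servers.net is one of the parent servers); the function names are hypothetical:

```shell
#!/bin/sh
# Reads dig answer/authority lines on stdin and prints the largest TTL found
# on an NS record (0 if there are none).
# dig prints records as: name  ttl  class  type  rdata
max_ns_ttl() {
    awk '$4 == "NS" && $2 + 0 > max { max = $2 + 0 } END { print max + 0 }'
}

# TTL of the delegation as served by the parent (TLD) zone.
# Assumption: the domain is under .com. The parent answers with a referral,
# so the NS records appear in the authority section.
parent_ns_ttl() {
    dig +norecurse +noall +authority NS "$1" @a.gtld-servers.net | max_ns_ttl
}

# TTL of the NS set as served by one of the domain's own name servers ($2).
child_ns_ttl() {
    dig +norecurse +noall +answer NS "$1" @"$2" | max_ns_ttl
}
```

Here `parent_ns_ttl <domain.com>` would be expected to print 172800 (48 hrs) if the report is right, and `child_ns_ttl <domain.com> <one of the domain's name servers>` would print 3600.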
I want to move DNS management over to Route53, and I cannot afford downtime of more than 6 hrs. We have already tried for up to 4 hrs in vain.
Why is this happening, and what is the way out?
One alternative, perhaps, is to raise the TTL of all the domain's DNS records to 49 hrs (greater than the NS TTL at the parent), wait for that TTL change to propagate, and then switch name servers. It is not foolproof, but it could be tried.
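To put numbers on the waiting period: if suspicion 1 is right and some resolvers honor the parent's NS TTL rather than the 1 hr one, the old name servers may keep being used for up to the larger of the two TTLs. A small sketch of that arithmetic (hypothetical helper names; the "larger of the two" bound is my assumption about resolver behavior, not something I have confirmed):

```shell
#!/bin/sh
# Prints the larger of the two NS TTLs (in seconds): a rough upper bound on
# how long a resolver that honors TTLs may keep using the old name servers.
# Arguments: parent NS TTL, child NS TTL.
worst_case_wait() {
    parent_ttl=$1
    child_ttl=$2
    if [ "$parent_ttl" -gt "$child_ttl" ]; then
        echo "$parent_ttl"
    else
        echo "$child_ttl"
    fi
}

# Converts seconds to whole hours, rounding up.
to_hours() {
    echo $(( ($1 + 3599) / 3600 ))
}
```

With the TTLs above, `to_hours "$(worst_case_wait 172800 3600)"` gives 48, which is well beyond the 4 hrs we waited and the 6 hrs of downtime we can tolerate.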