We have a two-node multi-subnet cluster. When we fail over one of the cluster resource groups, the DNS update behaviour is not what we expected.
- NodeA and NodeB are each located in different datacenters: SiteA and SiteB.
- SiteA and SiteB are also different Active Directory sites.
- There is one Active Directory forest.
- DNS01 is a DC and DNS server located in SiteA; DNS02 is a DC and DNS server located in SiteB.
- RegisterAllProvidersIP is set to 0 for all Network Name resources, since the applications can't take MultiSubnetFailover=true as a parameter in their connection strings.
- NodeA and NodeB both have DNS01 set as their primary DNS server and DNS02 set as their secondary DNS server in their NIC IPv4 configuration.
- IPv6 is disabled.
- SiteA is the primary site; all applications and users reside in SiteA. SiteB is used only for disaster recovery.
During a failover of one of the network cluster resources, clients experienced connection timeouts of up to 15 minutes. Further investigation revealed the following behaviour:
- During the failover, the cluster updates the host (A) record on DNS02 with the new IP address.
- Since DNS02 and DNS01 reside in different AD sites, replication to DNS01 follows the inter-site replication schedule (which is 15 minutes).
- Most clients we tested with had DNS01 as their primary DNS server and hence, during the 15 minutes it took replication to update DNS01, they experienced timeouts.
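To make the timeline above concrete, here is a minimal Python sketch of what each DNS server would answer at a given moment after the failover. It only models the behaviour we observed; it does not query any real server. The IP addresses and the function name are hypothetical, and the 15-minute figure is the worst case from our replication schedule.

```python
OLD_IP = "10.1.0.50"  # hypothetical address of the resource while in SiteA
NEW_IP = "10.2.0.50"  # hypothetical address of the resource after failover to SiteB

WORST_CASE_REPLICATION_S = 15 * 60  # inter-site replication schedule from our setup


def answer_from(server: str, seconds_since_failover: float,
                replication_delay_s: float = WORST_CASE_REPLICATION_S) -> str:
    """Return the A record value a server would hand out after the failover.

    Assumption: DNS02 is updated by the cluster immediately, and DNS01 only
    learns the new address once inter-site replication has completed.
    """
    if server == "DNS02":
        return NEW_IP
    if server == "DNS01":
        if seconds_since_failover >= replication_delay_s:
            return NEW_IP
        return OLD_IP
    raise ValueError(f"unknown server: {server}")


if __name__ == "__main__":
    for t in (0, 60, 899, 900):
        print(t, answer_from("DNS01", t), answer_from("DNS02", t))
```

A client that resolves against DNS01 keeps receiving the old address until the replication delay has elapsed, which matches the timeouts we saw. Any client-side DNS caching would add to this window and is not modelled here.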
The 15-minute delay can be fixed by enabling change notification on the link object, which we have done, and this works. However, since SiteA is the primary site and SiteB is only used as a disaster backup site, it would be best for us if the cluster simply updated DNS01 instead of DNS02.
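For reference, a small Python sketch of the change notification setting, assuming the link object in question is the inter-site siteLink object in Active Directory, whose `options` attribute is a bit field in which bit 0x1 (USE_NOTIFY) turns on change notification. The helper name is hypothetical and the code only computes the value to store; it does not talk to a directory.

```python
from typing import Optional

# Bit flags of the siteLink "options" attribute.
USE_NOTIFY = 0x1           # enable inter-site change notification
TWOWAY_SYNC = 0x2          # two-way sync on the link
DISABLE_COMPRESSION = 0x4  # disable compression of replication traffic


def with_change_notification(current_options: Optional[int]) -> int:
    """Return the options value with USE_NOTIFY set, preserving other bits.

    An unset attribute (None) is treated as 0.
    """
    return (current_options or 0) | USE_NOTIFY


if __name__ == "__main__":
    print(with_change_notification(None))
    print(with_change_notification(DISABLE_COMPRESSION))
```

OR-ing the flag in, rather than overwriting the attribute with 1, keeps any other options that are already set on the link.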
Our expectation was either that DNS01 would always be used, since it is the default DNS server according to the SOA records, or that DNS02 would be used when we fail over from SiteA to SiteB and DNS01 when we fail back from SiteB to SiteA (since it makes sense that the node becoming the active node for the network resource would update the DNS server that resides in the same AD site).
Questions:
We are looking for some in-depth knowledge of the DNS update mechanism.
- Why does the cluster always update DNS02 by default? If we shut down DNS02, the cluster updates DNS01, but if we start DNS02 back up, the cluster goes back to using DNS02 as the default DNS server for updating the host (A) records.
- Is this something we can configure?
- Is the DNS update mechanism different for multi-subnet clusters where all nodes reside in the same AD site?
- Are there any logs that provide detailed, verbose information about these DNS update steps?