This morning, we found that (due to a changeover) one of our DNS records for an important service was wrong. It has been changed on our primary DNS server, but clients at secondary sites do not see the change. (Our network runs almost entirely on OS X 10.5 servers and OS X 10.5 clients.)
Let me name some machines for example purposes:
- primary = the primary DNS server
- secondary = a secondary DNS server
- client = a client at a secondary site
- service.ourdomain.com = the service whose DNS records have changed
On the client (which does DNS lookups through secondary), when probing how things are configured, I get:
nslookup service.ourdomain.com
** server can't find service.ourdomain.com: NXDOMAIN
nslookup service.ourdomain.com secondary
** server can't find service.ourdomain.com: NXDOMAIN
nslookup service.ourdomain.com primary
(returns appropriate information about how to contact the service)
When I ssh into
- secondary, which does its DNS lookups through primary
- or primary itself, which does DNS lookups from itself
I get:
nslookup service.ourdomain.com
(returns appropriate information about how to contact the service)
nslookup service.ourdomain.com secondary
** server can't find service.ourdomain.com: NXDOMAIN
nslookup service.ourdomain.com primary
(returns appropriate information about how to contact the service)
I'm perplexed. Secondary seems to know where the service is, but does not return the values when queried. (Granted, the DNS entries it looks up for itself can be entirely independent of what it returns when queried for a domain name, but still -- it looks like it should know!)
I have tried flushing the DNS cache on secondary and on client (dscacheutil -flushcache). I have also stopped and restarted DNS on secondary (sudo serveradmin stop dns and sudo serveradmin start dns).
At our primary site, my coworker rebooted primary and a client there to get the name to resolve correctly. Unfortunately, we have 14 secondary sites, and I'd rather not reboot their servers during the day, since they are sharing files, though I will if that solves the problem.
Per request:
host -C ourdomain.com # [with names substituted]:
ourdomain.com SOA record primary.ourdomain.com. admin.ourdomain.com. 2009121410 21600 3600 604800 345600
[I have no idea what admin.ourdomain.com is -- I don't believe we have a box by that name; I sure can't ping it. The primary DNS server shows up right, though.]
Also per request, here is the output of dig service.ourdomain.com @secondary (with name substitutions):
; <<>> DiG 9.4.3-P1 <<>> service.ourdomain.com @secondary
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 19207
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;service.ourdomain.com. IN A
;; AUTHORITY SECTION:
ourdomain.com. 10800 IN SOA primary.ourdomain.com. admin.ourdomain.com. 2009121409 21600 3600 604800 345600
;; Query time: 3 msec
;; SERVER: [IP of secondary]#53([IP of secondary])
;; WHEN: Mon Dec 14 10:34:11 2009
;; MSG SIZE rcvd: 88
And the output of dig service.ourdomain.com @primary:
; <<>> DiG 9.4.3-P1 <<>> service.ourdomain.com @primary
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47885
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;service.ourdomain.com. IN A
;; ANSWER SECTION:
service.ourdomain.com. 10800 IN A [IP of service]
;; AUTHORITY SECTION:
ourdomain.com. 10800 IN NS primary.ourdomain.com.
;; ADDITIONAL SECTION:
primary.ourdomain.com. 10800 IN A [IP of primary]
;; Query time: 8 msec
;; SERVER: [IP of primary]#53([IP of primary])
;; WHEN: Mon Dec 14 10:34:18 2009
;; MSG SIZE rcvd: 92
The most striking differences are that secondary did not reply with an answer, and that the primary said, ";; WARNING: recursion requested but not available".
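One more difference can be read from the output already pasted above: the SOA serial in the host -C line is 2009121410, while the SOA that secondary returns in its AUTHORITY section carries 2009121409. The following is only a rough sketch for comparing the two. It assumes the host -C line was answered by primary (the only NS listed), and the soa_serial helper is a name I made up for illustration; it just picks the serial out of an SOA line by position.

```shell
#!/bin/sh
# Hypothetical helper: print the serial from an SOA line as shown by dig or host -C.
# An SOA record ends with: serial refresh retry expire minimum,
# so the serial is the fifth field counting from the end.
soa_serial() {
    printf '%s\n' "$1" | awk '{ print $(NF-4) }'
}

# SOA lines copied from the output above (names already substituted).
# Assumption: the host -C line reflects what primary serves.
primary_soa='ourdomain.com SOA record primary.ourdomain.com. admin.ourdomain.com. 2009121410 21600 3600 604800 345600'
secondary_soa='ourdomain.com. 10800 IN SOA primary.ourdomain.com. admin.ourdomain.com. 2009121409 21600 3600 604800 345600'

p=$(soa_serial "$primary_soa")
s=$(soa_serial "$secondary_soa")

# A lower serial on secondary suggests it is still serving an older copy of the zone.
if [ "$p" -gt "$s" ]; then
    echo "secondary is behind: $s < $p"
else
    echo "serials match or secondary is ahead: $s vs $p"
fi
```

With the two lines above, this reports that secondary is one serial behind. That would be consistent with secondary still serving an older copy of the zone, though I can't say from this alone why it has not picked up the newer one.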