
This morning, we found that (due to a changeover) one of our DNS records for an important service was wrong. It has been changed on our primary DNS server, but clients at secondary sites do not see the change. (Our network runs almost entirely on OS X 10.5 servers and OS X 10.5 clients.)

Let me name some machines for example purposes:

  • primary = the primary DNS server
  • secondary = a secondary DNS server
  • client = a client at a secondary site
  • service.ourdomain.com = the service whose DNS records have changed

On the client (which does DNS lookups through secondary), when probing how things are configured, I get:

nslookup service.ourdomain.com
** server can't find service.ourdomain.com: NXDOMAIN

nslookup service.ourdomain.com secondary
** server can't find service.ourdomain.com: NXDOMAIN

nslookup service.ourdomain.com primary
(returns appropriate information about how to contact the service)

When I ssh into

  • secondary, which does its DNS lookups through primary
  • or primary itself, which does DNS lookups from itself

I get:

nslookup service.ourdomain.com
(returns appropriate information about how to contact the service)

nslookup service.ourdomain.com secondary
** server can't find service.ourdomain.com: NXDOMAIN

nslookup service.ourdomain.com primary
(returns appropriate information about how to contact the service)

I'm perplexed. Secondary seems to know where the service is, but does not return the values when queried. (Granted, the DNS entries it uses for its own lookups can be entirely independent of what it returns when queried for a domain name, but still -- it looks like it should know!)

I have tried flushing the DNS cache on secondary and on client (dscacheutil -flushcache). I have also stopped and restarted DNS on secondary (sudo serveradmin stop dns, then sudo serveradmin start dns).

At our primary site, my coworker rebooted primary and a client there to get the name to resolve right. Unfortunately, we have 14 secondary sites, and I'd rather not reboot the servers, which are sharing files, during the day if possible, but will do it if it solves the problem.


Per request:

host -C ourdomain.com   # [with names substituted]:
ourdomain.com SOA record primary.ourdomain.com. admin.ourdomain.com. 2009121410 21600 3600 604800 345600

[I have no idea what admin.ourdomain.com is -- I don't believe we have a box by that name; I sure can't ping it. The primary DNS server shows up right, though.]


Also per request, here is the output of dig service.ourdomain.com @secondary (with name substitutions):

; <<>> DiG 9.4.3-P1 <<>> service.ourdomain.com @secondary
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 19207
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;service.ourdomain.com. IN  A

;; AUTHORITY SECTION:
ourdomain.com.      10800   IN  SOA primary.ourdomain.com. admin.ourdomain.com. 2009121409 21600 3600 604800 345600

;; Query time: 3 msec
;; SERVER: [IP of secondary]#53([IP of secondary])
;; WHEN: Mon Dec 14 10:34:11 2009
;; MSG SIZE  rcvd: 88

And the output of dig service.ourdomain.com @primary:

; <<>> DiG 9.4.3-P1 <<>> service.ourdomain.com @primary
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47885
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;service.ourdomain.com. IN  A

;; ANSWER SECTION:
service.ourdomain.com. 10800    IN  A   [IP of service]

;; AUTHORITY SECTION:
ourdomain.com.      10800   IN  NS  primary.ourdomain.com.

;; ADDITIONAL SECTION:
primary.ourdomain.com.  10800   IN  A   [IP of primary]

;; Query time: 8 msec
;; SERVER: [IP of primary]#53([IP of primary])
;; WHEN: Mon Dec 14 10:34:18 2009
;; MSG SIZE  rcvd: 92

The most striking differences are that secondary did not reply with an answer, and that the primary said, ";; WARNING: recursion requested but not available".
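
For reference, the SOA line in secondary's reply also carries the zone's timers. Here is a minimal sketch in shell and awk that converts them to hours, using the line pasted from the dig output above as sample data. The function name soa_timers is made up for illustration.

```shell
# SOA line copied from the AUTHORITY section of the reply from secondary
soa='ourdomain.com.      10800   IN  SOA primary.ourdomain.com. admin.ourdomain.com. 2009121409 21600 3600 604800 345600'

# Print the record TTL and the four SOA timers in whole hours.
# The last four fields of an SOA line are refresh, retry, expire, minimum.
soa_timers() {
    printf '%s\n' "$1" | awk '{
        printf "ttl=%dh refresh=%dh retry=%dh expire=%dh minimum=%dh\n",
            $2/3600, $(NF-3)/3600, $(NF-2)/3600, $(NF-1)/3600, $NF/3600
    }'
}

soa_timers "$soa"
```

By the usual reading of these fields, refresh is how often a secondary polls the primary for a new serial when it has not been notified of a change, and minimum is an upper bound on how long a resolver may cache a negative answer such as NXDOMAIN.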

Clinton Blackmore
  • Not familiar with Mac OS X Servers, but I would use dig service.ourdomain.com @NS_Server_IP to confirm it is returning the right result. If your secondary is still reporting incorrectly, it has the wrong result cached. Just for the future, make sure to take the TTL down to 5 min or so before switching DNS entries; it makes changing IPs easier :) – Dave Drager Dec 14 '09 at 16:27
  • Good news: now that it has been a couple of hours, the change has propagated by itself. I still want to know how to propagate things more quickly in the future, though. – Clinton Blackmore Dec 14 '09 at 17:21

3 Answers


Without knowing your configuration, I would guess this is a caching issue, or a DNS propagation issue.

Without knowing your domain you're using, I can't really test it from here. I personally don't understand why people omit this sort of relevant information, but they often do.

  • Try "host -C yourdomain.com" and tell me what you see. If you see different SOA records with different serial numbers, then you need to fix your DNS propagation. If the secondary is not listed in the NS records for this zone, add an "also-notify" line if running BIND.

  • Try changing the serial number on the master to ensure it was properly changed, as well as to test propagation.

  • Try setting a better negative cache time that is much smaller, say 600 (10 minutes) or so. This is one of the values in the SOA record.

  • Try a "dig hostname.yourdomain.com @secondaryserver" and see what it returns. Do the same on primary. If they differ, that is the brokenness.

  • If each of these sites that are returning bad data have a huge cache time, you should be able to ssh to them and simply restart the name server, not reboot each site fully. BIND will quickly restart if that is what is in use.
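
The serial checks in the list above can be sketched in shell. This is only an illustration: the two SOA lines are the ones quoted in the question (one from host -C, one from dig @secondary), the function names are made up, and the YYYYMMDDNN serial layout is an assumption based on the serials shown there.

```shell
# The serial is the fifth field from the end of an SOA line,
# in both the host -C and the dig output formats.
soa_serial() {
    printf '%s\n' "$1" | awk '{ print $(NF-4) }'
}

# Next serial in YYYYMMDDNN style. $1 = current serial, $2 = date as YYYYMMDD.
# Uses the date with counter 00, or current+1 if that would not be an increase.
next_serial() {
    base=$(( $2 * 100 ))
    if [ "$1" -ge "$base" ]; then
        echo $(( $1 + 1 ))
    else
        echo "$base"
    fi
}

from_host='ourdomain.com SOA record primary.ourdomain.com. admin.ourdomain.com. 2009121410 21600 3600 604800 345600'
from_secondary='ourdomain.com.      10800   IN  SOA primary.ourdomain.com. admin.ourdomain.com. 2009121409 21600 3600 604800 345600'

a=$(soa_serial "$from_host")
b=$(soa_serial "$from_secondary")

if [ "$a" -ne "$b" ]; then
    echo "serials differ: $a vs $b"
else
    echo "serials match: $a"
fi

# What the master's serial could be bumped to on the same day
next_serial "$a" 20091214
```

On live servers the input lines would come from something like dig +short SOA ourdomain.com @primary; note that +short prints only the record data, but the serial is still the fifth field from the end.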

Michael Graff
  • I have added the results of host -C mydomain.com to my post. The numbers look the same when I do it on client, secondary, and primary. (Is the first number the serial number?) The secondary service is not listed in the chain. Mac OS X provides a graphical tool for configuring services, such as DNS, which does not make it at all clear how things map onto config files. [I have not configured DNS at a low level, and my co-workers might be upset if I were to muck with it, as then changes made graphically may or may not stomp on changes made in config files.] – Clinton Blackmore Dec 14 '09 at 17:33
  • The dig results differ. – Clinton Blackmore Dec 14 '09 at 17:35
  • [It looks like I'm going to be away from this problem for a bit -- I intend to post something of the dig results in my question.] – Clinton Blackmore Dec 14 '09 at 17:40
  • If the dig results differ, then you might simply need to bump your serial number on the master and tell it to reload the zone. It sounds like DNS data was changed without bumping the serial number, which means changes would never propagate. By the way, "admin.ourdomain.com" really maps to admin@ourdomain.com: it is the contact field, and this is the conversion performed. – Michael Graff Dec 14 '09 at 17:43
  • I think I've found how to bump it and flush it -- http://discussions.apple.com/thread.jspa?messageID=10738953 -- I'll add a note to my todo list to try next week when school is out. – Clinton Blackmore Dec 14 '09 at 22:50
  • Oh, I forgot to mention that I posted the results from the dig commands. – Clinton Blackmore Dec 14 '09 at 23:20
  • When you change data, you MUST ALSO change the serial number in the SOA record. If your tool does not do this for you, you have to do it manually. Master/slave sync is done almost entirely based upon the serial number. Additionally, a 10800 TTL is 3 hours. Try making it 3600 or 1800. – Michael Graff Dec 14 '09 at 23:54
  • BTW, your discussion link from the apple site is close. The serial number must always increment for each change you make. If you do not, you will never get reliable propagation. So, just remember to change the serial. It need not be the date format, although it is common. I just add one to my number. Once you hit 2147483648, you need to start over again at 1. :) – Michael Graff Dec 16 '09 at 09:37
  • I can only assume that the Apple tool bumps the serial every time you update a record, because we've successfully (with a 3 hour delay) made DNS changes and have never had to manually bump up the number. I will have to see if I can find out how to change the TTL. – Clinton Blackmore Dec 16 '09 at 17:21

Your secondary server is trying to recursively answer (RD - recursion desired, RA - recursion available) but failing (NXDOMAIN) whilst at the same time also serving the SOA record authoritatively (AA - authoritative answer).

You do seem to have a slightly odd mix here... we need to establish how it is that your secondary server knows about the zone (the SOA record) but doesn't know about the record within the zone.

I'd go with Michael's recommendation - bump the serial number on the master, and then restart BIND on the secondary to ensure that its cache is cleared.
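
The flag reading above can be checked mechanically. Here is a small sketch in plain POSIX shell, using the two flags lines from the dig output in the question as sample data; has_flag is a made-up helper name.

```shell
# Succeeds if the dig "flags:" header line in $1 contains the flag named in $2
has_flag() {
    flags=${1#*flags: }      # drop everything up to and including "flags: "
    flags=${flags%%;*}       # drop everything from the next ";" onwards
    case " $flags " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

secondary_hdr=';; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0'
primary_hdr=';; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1'

for f in aa rd ra; do
    has_flag "$secondary_hdr" "$f" && s=yes || s=no
    has_flag "$primary_hdr" "$f" && p=yes || p=no
    echo "$f: secondary=$s primary=$p"
done
```

Both replies carry aa, which suggests each server is answering from its own copy of the zone. Only secondary reports ra, which matches the "recursion requested but not available" warning from primary.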

Alnitak

You can manually force a zone transfer using the rndc utility. Run this command on all of your secondary DNS servers:

rndc -p 54 retransfer mydomain.example.com IN com.apple.ServerAdmin.DNS.public

You can also use this utility to reload your configuration without restarting named.

rndc -p 54 reload
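
With many secondary sites, the retransfer command above could be generated per host. The sketch below only prints the commands rather than running them. The host names secondary1 and secondary2 are hypothetical, and it assumes each secondary is reachable over ssh with an account that is allowed to run rndc.

```shell
ZONE='mydomain.example.com'
VIEW='com.apple.ServerAdmin.DNS.public'
SECONDARIES='secondary1 secondary2'   # hypothetical host names

# Build the ssh command line for one secondary
retransfer_cmd() {
    printf 'ssh %s rndc -p 54 retransfer %s IN %s\n' "$1" "$ZONE" "$VIEW"
}

# Dry run: print one command per secondary for review
for host in $SECONDARIES; do
    retransfer_cmd "$host"
done
```

Once the printed lines look right, they can be pasted into a terminal or piped to sh.
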
lukecyca