
I'm trying to set up a recursive DNS server on a cloud-based VPS. With dnssec-validation no it works fine, but with dnssec-validation auto, dig returns status: SERVFAIL. However, when I set up another DNS server on a different cloud provider with exactly the same config files, it works with dnssec-validation auto.

Both are running BIND 9.11.3 on Ubuntu 18.04.3. I've stripped named.conf.options down to just:

options {
    directory "/var/cache/bind";
    recursion yes;
    allow-transfer { none; };
    dnssec-validation auto;
    auth-nxdomain no;    # conform to RFC1035
    listen-on { any; };
    listen-on-v6 { none; };
};

named.conf.local is empty, and everything else is default. On the working server when I issue dig apple.com @localhost I get:

; <<>> DiG 9.11.3-1ubuntu1.9-Ubuntu <<>> apple.com @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7616
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 4, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: f987cd320c5efed7576fef585da668961c4f29e79a6cfa7b (good)
;; QUESTION SECTION:
;apple.com.                     IN      A

;; ANSWER SECTION:
apple.com.              3600    IN      A       17.142.160.59
apple.com.              3600    IN      A       17.178.96.59
apple.com.              3600    IN      A       17.172.224.47

;; AUTHORITY SECTION:
apple.com.              172800  IN      NS      c.ns.apple.com.
apple.com.              172800  IN      NS      b.ns.apple.com.
apple.com.              172800  IN      NS      d.ns.apple.com.
apple.com.              172800  IN      NS      a.ns.apple.com.

;; Query time: 401 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Oct 15 19:47:18 CDT 2019
;; MSG SIZE  rcvd: 181

But with the same command on the failing server I get:

; <<>> DiG 9.11.3-1ubuntu1.9-Ubuntu <<>> apple.com @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 50881
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 774ab4aac2a012bade665c985da668b77508d1e7bdf048f2 (good)
;; QUESTION SECTION:
;apple.com.                     IN      A

;; Query time: 4000 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Oct 15 19:47:51 CDT 2019
;; MSG SIZE  rcvd: 66

In /var/log/syslog on the failing server I see:

Oct 15 20:01:17 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:17 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 199.7.91.13#53
Oct 15 20:01:17 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:17 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 199.7.83.42#53
Oct 15 20:01:17 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:17 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 202.12.27.33#53
Oct 15 20:01:17 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:17 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.5.5.241#53
Oct 15 20:01:17 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:17 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.58.128.30#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.36.148.17#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 198.41.0.4#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 198.97.190.53#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 193.0.14.129#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.33.4.12#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.112.36.4#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.203.230.10#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 199.9.14.201#53

If I modify the config to say dnssec-validation no; then the failing server starts working.

Any ideas how to start debugging this?

EDIT: I've verified (with md5sum) that /etc/bind/bind.keys is exactly the same on both servers.

EDIT2: With query logging turned on, I get this additional log line:

 [...] query failed (SERVFAIL) for apple.com/IN/A at ../../../bin/named/query.c:8402
ras
  • Do you filter TCP? Getting data with signatures makes packets bigger, and sometimes they exceed the typical UDP buffer size, hence a retry over TCP (or with an EDNS option). If this retry does not get through, the signatures cannot be validated. You should monitor the external DNS traffic of the failing nameserver to see exactly which requests it sends and which replies it gets, in order to understand why it does not get RRSIG records for the DS records of `.com`. – Patrick Mevzek Oct 17 '19 at 04:52
  • I was sure this was going to turn out to be the problem! But alas, TCP isn't being filtered. I've got querylogging turned on, but it doesn't log responses beyond saying that it failed. I'm not sure how to capture more detail. – ras Oct 19 '19 at 16:00
  • Dump the traffic to upstream DNS on both servers and compare. – gxx Oct 19 '19 at 16:56
  • Do you use forwarders? Did you try with another bind version? – Patrick Mevzek Oct 26 '19 at 21:53
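
The comparison suggested in the comments could be sketched roughly as follows. This is only an illustration, not a confirmed fix: the function names query_ds and count_rrsig are hypothetical helpers, and 198.41.0.4 is simply one of the root server addresses from the syslog excerpt above. The idea is to ask a root server directly for the com DS record set with signatures, over UDP and then TCP, and count how many RRSIG records come back on each host.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of BIND or dig.

# Ask one server directly for the com DS RRset, with DNSSEC records
# and without recursion.
# $1 = server address; any further arguments are extra dig options (e.g. +tcp).
query_ds() {
    server=$1
    shift
    dig +dnssec +norecurse "$@" com. DS "@$server"
}

# Count RRSIG records in dig output read from stdin.
# In dig's record lines the fourth field is the record type.
count_rrsig() {
    awk '$4 == "RRSIG" { n++ } END { print n + 0 }'
}
```

Running `query_ds 198.41.0.4 | count_rrsig` and `query_ds 198.41.0.4 +tcp | count_rrsig` on both hosts should show whether signatures arrive at all on the failing one. It may also be worth comparing `date -u` on both machines, since RRSIG validity windows depend on the system clock, and capturing the resolver's traffic with `tcpdump -n -i any -w dns.pcap port 53` while repeating the failing query.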

0 Answers