I'm trying to set up a recursive DNS server on a cloud-based VPS. With dnssec-validation no it works fine, but with dnssec-validation auto, dig returns status: SERVFAIL. However, when I set up another DNS server on a different cloud provider with exactly the same config files, it works with dnssec-validation auto.
Both are running BIND 9.11.3 on Ubuntu 18.04.3. I've stripped named.conf.options down to just:
options {
    directory "/var/cache/bind";
    recursion yes;
    allow-transfer { none; };
    dnssec-validation auto;
    auth-nxdomain no;    # conform to RFC1035
    listen-on { any; };
    listen-on-v6 { none; };
};
named.conf.local is empty, and everything else is default. On the working server when I issue dig apple.com @localhost I get:
; <<>> DiG 9.11.3-1ubuntu1.9-Ubuntu <<>> apple.com @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7616
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 4, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: f987cd320c5efed7576fef585da668961c4f29e79a6cfa7b (good)
;; QUESTION SECTION:
;apple.com. IN A
;; ANSWER SECTION:
apple.com. 3600 IN A 17.142.160.59
apple.com. 3600 IN A 17.178.96.59
apple.com. 3600 IN A 17.172.224.47
;; AUTHORITY SECTION:
apple.com. 172800 IN NS c.ns.apple.com.
apple.com. 172800 IN NS b.ns.apple.com.
apple.com. 172800 IN NS d.ns.apple.com.
apple.com. 172800 IN NS a.ns.apple.com.
;; Query time: 401 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Oct 15 19:47:18 CDT 2019
;; MSG SIZE rcvd: 181
But with the same command on the failing server I get:
; <<>> DiG 9.11.3-1ubuntu1.9-Ubuntu <<>> apple.com @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 50881
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 774ab4aac2a012bade665c985da668b77508d1e7bdf048f2 (good)
;; QUESTION SECTION:
;apple.com. IN A
;; Query time: 4000 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Oct 15 19:47:51 CDT 2019
;; MSG SIZE rcvd: 66
In /var/log/syslog on the failing server I see:
Oct 15 20:01:17 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:17 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 199.7.91.13#53
Oct 15 20:01:17 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:17 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 199.7.83.42#53
Oct 15 20:01:17 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:17 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 202.12.27.33#53
Oct 15 20:01:17 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:17 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.5.5.241#53
Oct 15 20:01:17 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:17 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.58.128.30#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.36.148.17#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 198.41.0.4#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 198.97.190.53#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 193.0.14.129#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.33.4.12#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.112.36.4#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 192.203.230.10#53
Oct 15 20:01:18 ns0 named[31690]: validating com/DS: no valid signature found
Oct 15 20:01:18 ns0 named[31690]: no valid RRSIG resolving 'com/DS/IN': 199.9.14.201#53
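The log lines above can be condensed to the set of upstream addresses that named could not validate against. This is a minimal shell sketch, assuming the syslog format shown above; failed_upstreams is a hypothetical helper name, not something shipped with BIND.

```shell
# Hypothetical helper: read named log lines on stdin and print each distinct
# upstream address from "no valid RRSIG resolving ..." messages.
failed_upstreams() {
    # Capture the address between ": " and the trailing "#port".
    sed -n 's/.*no valid RRSIG resolving [^ ]*: \([0-9a-fA-F.:]*\)#[0-9]*$/\1/p' |
        sort -u
}
```

For example: grep 'named\[' /var/log/syslog | failed_upstreams. On the excerpt above this lists 13 distinct addresses, which appear to be the 13 root server addresses, so the failure does not look tied to a single upstream.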
If I modify the config to say dnssec-validation no; then the failing server starts working.
Any ideas how to start debugging this?
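One possible first step is to compare a normal query with one that sets the CD (checking disabled) bit via dig's +cd flag, which asks the resolver to skip validation for that single query without touching named.conf. The sketch below is only an illustration: dig_status and compare_cd are hypothetical helper names, and the parsing assumes the dig header format shown in the output above.

```shell
# Hypothetical helper: extract the response code ("status:") from dig output on stdin.
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*/\1/p'
}

# Hypothetical helper: query the same name with and without the CD bit.
# Usage: compare_cd NAME [SERVER]   (SERVER defaults to localhost)
compare_cd() {
    cc_name=$1
    cc_server=${2:-localhost}
    printf 'validating: %s\n' "$(dig "$cc_name" "@$cc_server" | dig_status)"
    printf 'with +cd:   %s\n' "$(dig +cd "$cc_name" "@$cc_server" | dig_status)"
}
```

Running compare_cd apple.com on the failing server would show whether the lookup itself succeeds once validation is skipped. If the +cd query returns NOERROR while the plain one returns SERVFAIL, that points at the validation step rather than at basic reachability of the upstream servers.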
EDIT: I've verified (with md5sum) that /etc/bind/bind.keys is exactly the same on both servers.
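With identical trust anchors on both hosts, one thing that can still differ is the system clock: an RRSIG is only valid between its inception and expiration times. Whether that is the cause here is an assumption, since nothing in the logs confirms it. The sketch below checks a timestamp against a signature window; rrsig_window_ok is a hypothetical helper, and the timestamps in the usage note are made up for illustration.

```shell
# Hypothetical helper: succeed if NOW lies inside an RRSIG validity window.
# Arguments follow the order dig prints them in an RRSIG record:
#   rrsig_window_ok EXPIRATION INCEPTION [NOW]
# All values are UTC in YYYYMMDDHHMMSS form; NOW defaults to the system clock.
rrsig_window_ok() {
    rw_expiration=$1
    rw_inception=$2
    rw_now=${3:-$(date -u +%Y%m%d%H%M%S)}
    [ "$rw_now" -ge "$rw_inception" ] && [ "$rw_now" -le "$rw_expiration" ]
}
```

The two timestamps can be read from the RRSIG line printed by dig +dnssec +norec com DS @198.41.0.4 (an address taken from the log above), where expiration comes first and inception second. A call such as rrsig_window_ok 20191028170000 20191015160000 then reports via its exit status whether the local clock falls inside that (made-up) window.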
EDIT2: With query logging turned on, I get this additional log line:
[...] query failed (SERVFAIL) for apple.com/IN/A at ../../../bin/named/query.c:8402