It looks as though the `resolv.conf` option `use-vc` is being ignored on an Amazon Linux AMI (latest 2016.09 version). Consider the following:

[hadoop@ip-172-20-40-202 ~]$ cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options use-vc ndots:5 timeout:2 attempts:5
nameserver 172.20.53.184
nameserver 172.20.0.2

If I use `nslookup` interactively, forcing use of TCP via `set vc`, queries work exactly as expected:

[hadoop@ip-172-20-40-202 ~]$ nslookup
> set vc
> kafka.default.svc.cluster.local
;; Got recursion not available from 172.20.53.184, trying next server
;; Got recursion not available from 172.20.53.184, trying next server
;; Got recursion not available from 172.20.53.184, trying next server
Server:     172.20.53.184
Address:    172.20.53.184#53

Name:   kafka.default.svc.cluster.local
Address: 100.96.14.2
Name:   kafka.default.svc.cluster.local
Address: 100.96.7.2
Name:   kafka.default.svc.cluster.local
Address: 100.96.13.2
> kafka
Server:     172.20.53.184
Address:    172.20.53.184#53

Name:   kafka.default.svc.cluster.local
Address: 100.96.14.2
Name:   kafka.default.svc.cluster.local
Address: 100.96.7.2
Name:   kafka.default.svc.cluster.local
Address: 100.96.13.2
> exit

However, if left to its own devices, `nslookup` fails:

[hadoop@ip-172-20-40-202 ~]$ nslookup kafka.default.svc.cluster.local
Server:     172.20.0.2
Address:    172.20.0.2#53

** server can't find kafka.default.svc.cluster.local: NXDOMAIN

Same with `dig`. Forcing TCP works as expected:

[hadoop@ip-172-20-40-202 ~]$ dig +vc kafka.default.svc.cluster.local

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.47.rc1.52.amzn1 <<>> +vc kafka.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55634
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;kafka.default.svc.cluster.local. IN    A

;; ANSWER SECTION:
kafka.default.svc.cluster.local. 30 IN  A   100.96.13.2
kafka.default.svc.cluster.local. 30 IN  A   100.96.14.2
kafka.default.svc.cluster.local. 30 IN  A   100.96.7.2

;; Query time: 2 msec
;; SERVER: 172.20.53.184#53(172.20.53.184)
;; WHEN: Thu Mar 16 20:45:06 2017
;; MSG SIZE  rcvd: 97

And not forcing TCP fails:

[hadoop@ip-172-20-40-202 ~]$ dig kafka.default.svc.cluster.local

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.47.rc1.52.amzn1 <<>> kafka.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 9580
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;kafka.default.svc.cluster.local. IN    A

;; AUTHORITY SECTION:
.           52  IN  SOA a.root-servers.net. nstld.verisign-grs.com. 2017031602 1800 900 604800 86400

;; Query time: 0 msec
;; SERVER: 172.20.0.2#53(172.20.0.2)
;; WHEN: Thu Mar 16 20:44:58 2017
;; MSG SIZE  rcvd: 124

It appears as though `use-vc` in the line `options use-vc ndots:5 timeout:2 attempts:5` is being ignored.

How do I fix my configuration so that all DNS queries are forced over TCP? `man resolv.conf` says it should work!

  • Unrelated to your question, is there a particular reason why you are forcing DNS lookups to use TCP? – Andrew B Mar 16 '17 at 21:05
  • I knew that question was coming. It's because AWS elastic load balancers don't support UDP, and that's what's in front of the DNS service that I'm querying. – Matthew Adams Mar 16 '17 at 21:06
  • What possible motivation is there for someone to load balance a DNS service with an ELB? ...and why are you not using the built-in VPC resolver? – Michael - sqlbot Mar 17 '17 at 01:31
  • @Michael-sqlbot The DNS service is `kube-dns`, provided by Kubernetes, whose purpose is to resolve Kubernetes service DNS names into pod IPs from machines that are *not* part of Kubernetes but live in the same subnet. If the non-Kubernetes machine has a route to the pod (by using `sudo ip route add via `), then the non-Kubernetes machine can communicate with Kubernetes services. Having said that, tell me more about how you'd use the built-in VPC resolver. – Matthew Adams Mar 17 '17 at 13:22

1 Answer

It looks like the diagnostic tools, `nslookup` and `dig`, were misleading me.

When I used `getent`, I saw that names were indeed resolving correctly and honoring the `use-vc` option in `/etc/resolv.conf`:

[hadoop@ip-172-20-40-202 ~]$ getent ahosts kafka.default.svc.cluster.local
100.96.13.2     STREAM kafka.default.svc.cluster.local
100.96.13.2     DGRAM
100.96.13.2     RAW
100.96.14.2     STREAM
100.96.14.2     DGRAM
100.96.14.2     RAW
100.96.7.2      STREAM
100.96.7.2      DGRAM
100.96.7.2      RAW
[hadoop@ip-172-20-40-202 ~]$ getent hosts kafka.default.svc.cluster.local
100.96.13.2     kafka.default.svc.cluster.local
100.96.14.2     kafka.default.svc.cluster.local
100.96.7.2      kafka.default.svc.cluster.local

If I remove the `use-vc` option from `/etc/resolv.conf`, `getent` fails as expected.

Who knew, right?
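For completeness: any application that resolves names through `getaddrinfo()` sees the same behavior as `getent`, because both go through glibc's stub resolver. A minimal Python sketch (the hostname is the one from the question, so it will only resolve on a host whose resolver knows it; substitute any reachable name):

```python
import socket

# getaddrinfo() resolves through glibc's stub resolver (via NSS), so it
# honors /etc/resolv.conf options such as use-vc -- unlike dig and nslookup,
# which bundle their own resolver code and ignore most resolv.conf options.
def resolve(name):
    """Return the unique addresses glibc resolves for `name`."""
    return sorted({sockaddr[0] for *_, sockaddr in
                   socket.getaddrinfo(name, None, type=socket.SOCK_STREAM)})

print(resolve("kafka.default.svc.cluster.local"))
```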

  • Good catch. `dig` and `nslookup` use their own built-in resolver implementations. They scrape server IP addresses from /etc/resolv.conf, but that's it. This is the same reason why those tools won't resolve entries from `/etc/hosts`. In both scenarios, `getent hosts` is better when you're trying to emulate what an application will see when performing calls to `getaddrinfo()` and such. – Andrew B Mar 16 '17 at 22:23