
We have a round-robin DNS setup that maps the same name ("gsx") to six different IP addresses. We have verified from the client that this works, using 'nslookup' and 'ping'. However, when we attempt an NFS mount using the name ("gsx"), it only ever mounts from a single system.

If we specifically mount via IP address, we can mount from any of the 6 servers, so all 6 servers are correctly exporting and servicing NFS requests. The problem seems to lie somewhere in the mount.nfs addressing, or whatever it relies upon.
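One way to narrow this down: 'nslookup' queries the DNS server directly, whereas most programs resolve names through libc. The sketch below uses `getent ahostsv4`, which goes through getaddrinfo(), to print the first address the resolver library hands back. That mount.nfs takes the same path on this client is an assumption, and `first_addr` is just a helper name made up for this example.

```shell
#!/bin/sh
# Print the first IPv4 address from `getent ahostsv4` output on stdin.
# getent prints lines such as "192.168.10.16   STREAM gsx.backbone.lan".
first_addr() {
    awk '$2 == "STREAM" { print $1; exit }'
}

# Hypothetical usage on the client (not run here; "gsx" is the name from
# the question):
#   for i in 1 2 3 4 5 6; do getent ahostsv4 gsx | first_addr; done
```

If this prints the same address every time while 'nslookup' rotates, the ordering is being changed in the resolver library rather than by the DNS server.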

The client in question is a Linux system running CentOS 6.8.

The commands below illustrate the issue: [1] nslookup correctly rotates among the addresses on the client; [2] ping correctly rotates among the addresses on the client; [3] mount -t nfs does NOT rotate among the addresses on the client, but always uses the same address:

[1] nslookup test

for i in 1 2 3 4 5 6 ; do nslookup gsx; done

Server:         192.168.1.19
Address:        192.168.1.19#53

Name:   gsx.backbone.lan
Address: 192.168.10.16
Name:   gsx.backbone.lan
Address: 192.168.10.11
Name:   gsx.backbone.lan
Address: 192.168.10.12
Name:   gsx.backbone.lan
Address: 192.168.10.13
Name:   gsx.backbone.lan
Address: 192.168.10.14
Name:   gsx.backbone.lan
Address: 192.168.10.15

Server:         192.168.1.19
Address:        192.168.1.19#53
Name:   gsx.backbone.lan
Address: 192.168.10.11
Name:   gsx.backbone.lan
Address: 192.168.10.12
Name:   gsx.backbone.lan
Address: 192.168.10.13
Name:   gsx.backbone.lan
Address: 192.168.10.14
Name:   gsx.backbone.lan
Address: 192.168.10.15
Name:   gsx.backbone.lan
Address: 192.168.10.16

Server:         192.168.1.19
Address:        192.168.1.19#53

Name:   gsx.backbone.lan
Address: 192.168.10.12
Name:   gsx.backbone.lan
Address: 192.168.10.13
Name:   gsx.backbone.lan
Address: 192.168.10.14
Name:   gsx.backbone.lan
Address: 192.168.10.15
Name:   gsx.backbone.lan
Address: 192.168.10.16
Name:   gsx.backbone.lan
Address: 192.168.10.11

Server:         192.168.1.19
Address:        192.168.1.19#53
Name:   gsx.backbone.lan
Address: 192.168.10.13
Name:   gsx.backbone.lan
Address: 192.168.10.14
Name:   gsx.backbone.lan
Address: 192.168.10.15
Name:   gsx.backbone.lan
Address: 192.168.10.16
Name:   gsx.backbone.lan
Address: 192.168.10.11
Name:   gsx.backbone.lan
Address: 192.168.10.12

Server:         192.168.1.19
Address:        192.168.1.19#53

Name:   gsx.backbone.lan
Address: 192.168.10.14
Name:   gsx.backbone.lan
Address: 192.168.10.15
Name:   gsx.backbone.lan
Address: 192.168.10.16
Name:   gsx.backbone.lan
Address: 192.168.10.11
Name:   gsx.backbone.lan
Address: 192.168.10.12
Name:   gsx.backbone.lan
Address: 192.168.10.13

Server:         192.168.1.19
Address:        192.168.1.19#53
Name:   gsx.backbone.lan
Address: 192.168.10.15
Name:   gsx.backbone.lan
Address: 192.168.10.16
Name:   gsx.backbone.lan
Address: 192.168.10.11
Name:   gsx.backbone.lan
Address: 192.168.10.12
Name:   gsx.backbone.lan
Address: 192.168.10.13
Name:   gsx.backbone.lan
Address: 192.168.10.14

[2] ping example

for i in 1 2 3 4 5 6 ; do ping -c3 gsx; done

PING gsx.backbone.lan (192.168.10.16) 56(84) bytes of data.
64 bytes from gsx.backbone.lan (192.168.10.16): icmp_seq=1 ttl=64 time=0.065 ms
64 bytes from gsx.backbone.lan (192.168.10.16): icmp_seq=2 ttl=64 time=0.093 ms
64 bytes from gsx.backbone.lan (192.168.10.16): icmp_seq=3 ttl=64 time=0.063 ms

--- gsx.backbone.lan ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.063/0.073/0.093/0.016 ms
PING gsx.backbone.lan (192.168.10.11) 56(84) bytes of data.
64 bytes from gsx.backbone.lan (192.168.10.11): icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from gsx.backbone.lan (192.168.10.11): icmp_seq=2 ttl=64 time=0.089 ms
64 bytes from gsx.backbone.lan (192.168.10.11): icmp_seq=3 ttl=64 time=0.061 ms

--- gsx.backbone.lan ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.061/0.071/0.089/0.014 ms
PING gsx.backbone.lan (192.168.10.12) 56(84) bytes of data.
64 bytes from gsx.backbone.lan (192.168.10.12): icmp_seq=1 ttl=64 time=0.133 ms
64 bytes from gsx.backbone.lan (192.168.10.12): icmp_seq=2 ttl=64 time=0.124 ms
64 bytes from gsx.backbone.lan (192.168.10.12): icmp_seq=3 ttl=64 time=0.061 ms

--- gsx.backbone.lan ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.061/0.106/0.133/0.032 ms
PING gsx.backbone.lan (192.168.10.13) 56(84) bytes of data.
64 bytes from gsx.backbone.lan (192.168.10.13): icmp_seq=1 ttl=64 time=0.080 ms
64 bytes from gsx.backbone.lan (192.168.10.13): icmp_seq=2 ttl=64 time=0.090 ms
64 bytes from gsx.backbone.lan (192.168.10.13): icmp_seq=3 ttl=64 time=0.060 ms
--- gsx.backbone.lan ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.060/0.076/0.090/0.016 ms
PING gsx.backbone.lan (192.168.10.14) 56(84) bytes of data.
64 bytes from gsx.backbone.lan (192.168.10.14): icmp_seq=1 ttl=64 time=0.106 ms
64 bytes from gsx.backbone.lan (192.168.10.14): icmp_seq=2 ttl=64 time=0.154 ms
64 bytes from gsx.backbone.lan (192.168.10.14): icmp_seq=3 ttl=64 time=0.114 ms

--- gsx.backbone.lan ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.106/0.124/0.154/0.024 ms
PING gsx.backbone.lan (192.168.10.15) 56(84) bytes of data.
64 bytes from gsx.backbone.lan (192.168.10.15): icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from gsx.backbone.lan (192.168.10.15): icmp_seq=2 ttl=64 time=0.097 ms
64 bytes from gsx.backbone.lan (192.168.10.15): icmp_seq=3 ttl=64 time=0.081 ms

--- gsx.backbone.lan ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.072/0.083/0.097/0.012 ms

[3] mount -t nfs examples of incorrect behavior

for i in 3 4 5 6 ; do mount --verbose -t nfs -o ro gsx:/gpm604/80${i} /tmp/tony/80${i} ; done

mount.nfs: timeout set for Tue Mar  7 17:46:51 2017
mount.nfs: trying text-based options 'vers=4,addr=192.168.10.16,clientaddr=192.168.10.21'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'addr=192.168.10.16'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.10.16 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 192.168.10.16 prog 100005 vers 3 prot UDP port 597
gsx:/gpm604/803 on /tmp/tony/803 type nfs (ro)
mount.nfs: timeout set for Tue Mar  7 17:46:51 2017
mount.nfs: trying text-based options 'vers=4,addr=192.168.10.16,clientaddr=192.168.10.21'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'addr=192.168.10.16'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.10.16 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 192.168.10.16 prog 100005 vers 3 prot UDP port 597
gsx:/gpm604/804 on /tmp/tony/804 type nfs (ro)
mount.nfs: timeout set for Tue Mar  7 17:46:51 2017
mount.nfs: trying text-based options 'vers=4,addr=192.168.10.16,clientaddr=192.168.10.21'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'addr=192.168.10.16'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.10.16 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 192.168.10.16 prog 100005 vers 3 prot UDP port 597
gsx:/gpm604/805 on /tmp/tony/805 type nfs (ro)
mount.nfs: timeout set for Tue Mar  7 17:46:52 2017
mount.nfs: trying text-based options 'vers=4,addr=192.168.10.16,clientaddr=192.168.10.21'
mount.nfs: mount(2): Protocol not supported
mount.nfs: trying text-based options 'addr=192.168.10.16'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.10.16 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 192.168.10.16 prog 100005 vers 3 prot UDP port 597
gsx:/gpm604/806 on /tmp/tony/806 type nfs (ro)
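
Since mounting by IP address works for all six servers, one possible workaround is to do the rotation on the client and hand mount an address instead of the name. This is only a sketch: `pick_addr` is a made-up helper, and using `dig +short` to fetch the address list is an assumption about what is installed on the client.

```shell
#!/bin/sh
# Pick the Nth line (1-based, wrapping around) from a list on stdin.
pick_addr() {
    awk -v n="$1" '{ a[NR] = $0 } END { if (NR > 0) print a[((n - 1) % NR) + 1] }'
}

# Hypothetical usage, mirroring the mount loop above (not run here):
#   addrs=$(dig +short gsx)
#   for i in 3 4 5 6; do
#       ip=$(printf '%s\n' "$addrs" | pick_addr "$i")
#       mount -t nfs -o ro "$ip":/gpm604/80${i} /tmp/tony/80${i}
#   done
```

The index wraps around, so any loop counter works regardless of how many addresses DNS returns.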

[Addition] We discovered that we do NOT have this problem on our 1 Gb/s Ethernet LANs. It only seems to occur when we mount over our 10 Gb/s Ethernet LANs. Over 1 Gb/s we mounted the same volumes from the same NFS servers, on the same clients, with the same client OS and system binaries, using the same DNS server, and the mounts rotated as expected (and as many online how-tos say they should). For some reason, the problem now seems to be confined to 10 Gb/s Ethernet.

AkosPrime

1 Answer


As far as I know, no DNS protocol specification requires that multiple addresses returned for a lookup be tried in round-robin order.

It's the job of the application to get the list and decide what to do with it.
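
As an illustration of an application-side decision: glibc's getaddrinfo() sorts the addresses it returns along the lines of RFC 3484, and one of those rules prefers the destination that shares the longest prefix with the source address. Whether mount.nfs on CentOS 6.8 ends up on that code path is an assumption, not something the question confirms. The sketch below just does the arithmetic for the client address shown as clientaddr in the mount output (192.168.10.21) against the six server addresses; the function names are made up for this example.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Count the leading bits that two IPv4 addresses have in common.
common_prefix_len() {
    x=$(ip_to_int "$1")
    y=$(ip_to_int "$2")
    diff=$(( x ^ y ))
    len=32
    while [ "$diff" -ne 0 ]; do
        diff=$(( diff >> 1 ))
        len=$(( len - 1 ))
    done
    echo "$len"
}

client=192.168.10.21
for last in 11 12 13 14 15 16; do
    server=192.168.10.$last
    echo "$server $(common_prefix_len "$client" "$server")"
done
```

With these addresses, 192.168.10.16 shares 29 leading bits with the client and the other five share 27, so a longest-prefix rule would always put .16 first no matter how the DNS server rotates its answers. That matches the address every mount in the question used, though it does not prove this is the cause.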

Alnitak
  • The issue is with NFS, not the "DNS protocol specification". The DNS is correctly reporting the addresses in rotating order, as demonstrated by the 'ping' and 'nslookup' commands. However, something is not working with the DNS system, as it seems to fixate on a single address and only use it. I'm trying to figure out why, what causes it, and how to fix or change it so that it works correctly. – AkosPrime Mar 22 '17 at 16:28
  • @Akos Prime The DNS protocol doesn't even require that the responses are rotated. – Alnitak Mar 22 '17 at 18:07
  • No, but as we discovered today, the problem seems to occur only on 10 Gb Ethernet networks. Otherwise, when we mount the same volumes, using the same NFS server and the same DNS server, on the same clients with the same OS, the NFS mounts are rotated as expected. – AkosPrime Mar 24 '17 at 16:28
  • Either way, this question is completely off topic here. You should close it and try on serverfault.com - you might get more useful answers there. – Alnitak Mar 24 '17 at 16:31
  • Also, the point here is NFS. The DNS server IS rotating its responses, as demonstrated via 'nslookup' and 'ping'. So the DNS protocol is not the issue. – AkosPrime Mar 24 '17 at 16:32
  • How is an issue with NFS behavior not appropriate? – AkosPrime Mar 24 '17 at 16:33
  • Because this site is for computer programming issues, not server / networking admin ones – Alnitak Mar 24 '17 at 18:09