
I have an EMC VNXe 3100 and I tested both iSCSI and NFS connections from it to my hosts, with NFS appearing to be the better of the two. So I set it up, and in my testing with one VM everything appeared to work fine. Now I am trying to put it into production, and after migrating two VMs to that unit it has stopped working. When I try to migrate any other VM to the VNXe 3100, the datastore disconnects from the ESXi 5.5 host performing the migration. It reconnects once I cancel the migration (which hangs at 23%) and let it time out. I do have the EMC VAAI NAS plugin installed on the hosts, and the VNXe 3100 NFS datastores report that hardware acceleration is working. As part of troubleshooting I have also changed the NFS queue depth to 64 on each host.
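
For reference, here is roughly how I applied the queue depth change on each host from the ESXi shell (a sketch of the usual approach; the same setting can also be changed through the vSphere Client advanced settings, and a host reboot is typically required for it to take effect):

# Set the NFS queue depth on the host (ESXi advanced setting)
esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64

# Verify the configured value
esxcli system settings advanced list -o /NFS/MaxQueueDepth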

Any ideas?

Litzner
  • Interesting. Did you modify any of the ESXi hosts' advanced settings for NFS prior to this? There are some tunables that should help. Also, please show the ESXi host networking settings. Something like a [screenshot like this](http://i.stack.imgur.com/gXYm2.png). – ewwhite Oct 28 '14 at 12:31
  • In my troubleshooting I have modified NFS.MaxQueueDepth to 64, NFS.ReceiveBufferSize to 512, and NFS.SendBufferSize to 512. – Litzner Oct 28 '14 at 12:33
  • It ended up being a networking issue. I am not sure why it was an issue at all, but changing the configuration fixed it. For some reason the traffic for two of the datastores was hopping from one of the redundant switches to the other over the trunk between them in order to reach the hosts. I removed the VLAN these connections use from the trunk, reworked the networking to still provide failover in this configuration, and I am no longer experiencing the issue. – Litzner Nov 19 '14 at 15:56

1 Answer


This answer might help someone in the future:

I had a similar problem in my infrastructure with a Storage vMotion (SVMotion) from a local datastore to an NFS share on a NetApp. The datastore mounted successfully on all ESXi hosts, but every SVMotion attempt failed.

In my case the root cause was a wrong MTU value in the switch config: I had configured both the NetApp LIF and the ESXi VMkernel adapters to use MTU 9000, but I had not set that value correctly on the switches.

You can quickly validate this root cause using vmkping:

# Check default MTU 1500 (or lower):
vmkping -I vmk<X> -s 1500 <YOUR_NFS_SERVER_IP_ADDRESS>
PING <IPADDR> (<IPADDR>): 1500 data bytes
1508 bytes from <IPADDR>: icmp_seq=0 ttl=64 time=0.356 ms
1508 bytes from <IPADDR>: icmp_seq=1 ttl=64 time=0.264 ms
1508 bytes from <IPADDR>: icmp_seq=2 ttl=64 time=0.246 ms

--- <IPADDR> ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.246/0.289/0.356 ms

And then check echo replies with Jumbo Frames:

vmkping -I vmk<X> -s 9000 <YOUR_NFS_SERVER_IP_ADDRESS>
PING <IPADDR> (<IPADDR>): 9000 data bytes

--- <IPADDR> ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
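
Note that vmkping allows fragmentation by default, so a stricter test adds the -d (don't fragment) flag and subtracts the 28 bytes of IP and ICMP headers from the payload size, e.g. for an MTU of 9000:

vmkping -I vmk<X> -d -s 8972 <YOUR_NFS_SERVER_IP_ADDRESS>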

In such a case, the MTU must be increased on your Layer 2 switch(es).
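
How you do that depends on the switch platform. As a rough sketch only (exact commands, maximum MTU values and whether a reload is needed vary by vendor and model), on a Cisco Catalyst-style switch it could look like:

! Globally enable jumbo frames (often requires a reload)
system mtu jumbo 9198

! Or per interface on platforms that support it (e.g. Nexus)
interface Ethernet1/1
  mtu 9216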

Kosiek