
I installed CentOS 7 x64 on VMware Workstation 12.0 and built a WordPress website on it. After my colleague finished his design work, I used VMware vCenter Converter Standalone Client 6.0 to move the VM to an ESXi 5.5 host. Since the move, the server loses network connectivity most of the time. When I ping it, I get "Request timed out". I have to keep turning the NIC off and on to regain connectivity. When I open the website, I get an empty white page. I cannot SSH into the server either.

I have many other servers that were installed directly on ESXi 5.5 from the start, unlike this server, which I built on Workstation 12.0 and then moved to ESXi 5.5. The VM hardware version is 8, if I remember correctly.

I removed the network card from the VM, deleted the NIC's configuration file from the network-scripts directory, rebooted, and added a new NIC, but the issue remains. How can I troubleshoot this kind of problem? Any clues about what is going on with this server?

MadHatter
elekgeek

2 Answers


If the NIC works at all, I would rule out the driver, or at least put it lower on the list.

Next, you could be looking at hardware problems, but in a virtual environment a bad physical NIC should affect every virtual machine connected to it.

With intermittent connectivity, the best place to start is to verify that the IP address is not already in use elsewhere on the network.

One method is to shut down your cloned box and see whether its IP address still responds on the network, using a tool such as nmap.

nmap <ip_address>

If nothing is returned, nmap has options such as -Pn, -sN and -sS that might let you see a host that a local firewall would otherwise hide.
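As a rough sketch, the probes mentioned above could be collected in a small helper that prints the commands for review before anything is run. The address 192.0.2.10 is only a placeholder for the clone's IP, and the function name is made up for illustration. Note that -sS and -sN normally need root privileges.

```shell
#!/bin/sh
# Placeholder address of the powered-off clone; replace with the real one.
TARGET="${TARGET:-192.0.2.10}"

# Print the nmap probes discussed above, one per line, without running them.
probe_commands() {
    printf 'nmap %s\n' "$1"          # default host discovery plus port scan
    printf 'nmap -Pn %s\n' "$1"      # skip host discovery and scan anyway
    printf 'nmap -Pn -sS %s\n' "$1"  # TCP SYN scan (needs root)
    printf 'nmap -Pn -sN %s\n' "$1"  # TCP Null scan (needs root)
}

probe_commands "$TARGET"
```

If any of these reports open or closed ports while the clone is shut down, some other device is answering on that address.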

If you are on a corporate network that has a DHCP server, managed network switches or a hardware firewall appliance, you can check each of those as well to see whether the IP address you want to use is claimed by two different devices, typically shown as two different MAC addresses on a single IP.
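The same "two MAC addresses on one IP" check can also be approximated from another Linux machine on the VLAN. The sketch below assumes the iputils version of arping, whose reply lines put the answering MAC in square brackets; the function name and the sample addresses are made up for illustration.

```shell
#!/bin/sh
# Count the distinct MAC addresses found in arping-style reply lines on stdin.
# Assumes each reply line contains the MAC in square brackets.
count_macs() {
    sed -n 's/.*\[\([0-9A-Fa-f:]\{17\}\)].*/\1/p' |
        tr 'abcdef' 'ABCDEF' |   # normalise case so duplicates collapse
        sort -u |
        grep -c .
}

# Demonstration on canned replies (sample data, not real output).
count_macs <<'EOF'
Unicast reply from 192.0.2.10 [00:50:56:AA:BB:CC]  0.7ms
Unicast reply from 192.0.2.10 [00:0C:29:11:22:33]  0.9ms
EOF
```

In real use the input would come from something like `arping -c 3 -I eth0 192.0.2.10 | count_macs`, where eth0 and the address are placeholders. A count above 1 suggests that two devices are answering for the same IP.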

Jim
  • It is version 8 on ESXi – elekgeek Oct 28 '15 at 21:57
  • There is no IP address conflict. Sorry, by intermittent I meant that it works for a few minutes, then stops. I have to turn the NIC off and on to regain network connectivity. While the problem is occurring, I can ping 127.0.0.1 but not the gateway of the VLAN. The strange thing is that I already have a server working on the same VLAN! – elekgeek Oct 28 '15 at 22:09
  • BTW, VMware Tools are not installed. I've never had to install them on any server anyway. – elekgeek Oct 29 '15 at 07:44
  • As a test, plug two computers together into a different switch, but disconnect that switch from your network so that these two computers can only talk to each other. If your computer stays online during this test, the problem is not your CentOS 7 install. – Jim Oct 29 '15 at 20:09
  • Actually, I don't really have to do this. My install is fine; this is not my first server of this type, and its settings match those of other servers already running fine on the same host and the same vSwitch. When I ping the loopback interface (127.0.0.1) I get constant replies, which means my TCP/IP stack is OK. Since the interface works, disconnects a few minutes later, and only comes back after I turn it off and on, it appears to be a problem in ESXi itself. – elekgeek Oct 30 '15 at 09:47
  • The question becomes how can I troubleshoot such ESXi issues. I did create a new VM yesterday and built a new wordpress, it is the same thing for all new VMs. Now I have two new VMs that are losing connections, and 3 VMs of the same OS and configuration running OK. It is just insane.. These VMs are all running on the same VLAN. – elekgeek Oct 30 '15 at 09:47
  • Finally, I got something. When I ping other servers on the same VLAN from the ill server, I get replies. When I ping the VLAN's gateway from the ill server, I get nothing: no timeout, no destination unreachable. The servers with no problems can ping the VLAN gateway. What does this mean? – elekgeek Oct 30 '15 at 10:13
  • I think you are on the right track by trying to isolate when it works and when it doesn't. Based on your new information, this could be a setting inside of the ESXi for each of the non-working clients. The areas I would look at include the network adapter assigned to the non-working clients compared to the working clients. I am specifically interested to know if you are using multiple NICs that are not teamed on the same host connected to the same VLAN. The ping could come from one NIC and be received by the other. – Jim Oct 30 '15 at 15:05
  • The NICs are all the same, E1000, I tried VMXNET3, no success. The host has two NICs teamed, the other two NICs are down. Today I discovered that one of the blade chassis switches had issues and I removed it. So I no longer have teamed NICs on this host. I thought the problem came out of this switch, but no, I installed another new VM, but still no success. – elekgeek Oct 30 '15 at 21:23
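The isolation steps discussed in these comments (loopback, a peer on the same VLAN, the VLAN gateway) can be run as one repeatable check inside the guest. This is only a triage sketch: the function names and diagnosis strings are made up, and `-W` is the per-reply timeout of the iputils ping shipped with CentOS.

```shell
#!/bin/sh
# Map three results (loopback, VLAN peer, gateway), each "ok" or "fail",
# to a rough hint about where to look next.
classify() {
    case "$1:$2:$3" in
        fail:*)       echo "loopback fails: local TCP/IP stack problem" ;;
        ok:ok:ok)     echo "all reachable" ;;
        ok:ok:fail)   echo "peers answer, gateway silent: suspect the path beyond the vSwitch" ;;
        ok:fail:ok)   echo "gateway answers, peer silent: check the peer itself" ;;
        ok:fail:fail) echo "only loopback answers: check the vNIC, port group and uplink" ;;
        *)            echo "unexpected input" ;;
    esac
}

# Send one ping with a 2-second timeout and report ok/fail.
probe() {
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then echo ok; else echo fail; fi
}

# usage: check_once PEER_IP GATEWAY_IP
check_once() {
    printf '%s ' "$(date '+%F %T')"
    classify "$(probe 127.0.0.1)" "$(probe "$1")" "$(probe "$2")"
}
```

Running `check_once` in a loop with a `sleep` between rounds and appending the output to a file gives a timestamped record of when the link drops, which can then be compared with the ESXi host's logs.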

Here is the thing:

I managed to solve this issue by installing Update 2 for ESXi 5.5; all is smooth now. The update also brought updated drivers. I installed a new machine and left it running overnight to see whether its network would get disconnected.

Unfortunately, I could not get the converted machine to work without the network problems. I tried installing VMware Tools and open-vm-tools, with no success. I had to install a new machine to get things working. I am lucky it was a single unimportant machine!

elekgeek