I have manually configured an email server with a very simple config and no fancy networking. I only opened the relevant ports via iptables.

Once every 10-15 days, however, eth0 simply disappears and all networking stops. The default logs give me no indication of what happens. When it happens, the only fix I have found is rebooting the server via KVM access.

My question is: what monitoring/logging tools can I install to see what goes wrong? I would like to do my best before reporting a hardware problem to the hosting company.

I am running a CentOS 6 server.

2 Answers

If it isn't in /var/log/messages or in dmesg from the kernel, then I'm not sure which utility will get you the info to figure it out. I would try setting logging to debug in /etc/rsyslog.conf: look for the line with /var/log/messages and change info to debug, then restart rsyslog.
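
As a rough sketch of that rsyslog change, assuming the stock CentOS 6 line that starts with `*.info;` and ends in `/var/log/messages` (your file may differ), something like this prints an edited copy for review instead of touching the live config. The function name `to_debug` and the `/tmp` path are just illustrative.

```shell
#!/bin/sh
# Sketch only: rewrite the selector on the /var/log/messages line from
# info to debug. Assumes the line starts with "*.info;" (stock CentOS 6).
# to_debug is a hypothetical helper name, not part of rsyslog.
to_debug() {
    sed '/\/var\/log\/messages[[:space:]]*$/ s/^\*\.info;/*.debug;/'
}

# Only write anything when explicitly asked to, e.g.: sh script.sh preview
if [ "${1:-}" = "preview" ]; then
    # Write the edited copy next to the original for a manual diff.
    to_debug < /etc/rsyslog.conf > /tmp/rsyslog.conf.new
    diff /etc/rsyslog.conf /tmp/rsyslog.conf.new
fi
```

If the diff looks right, copy the new file into place and run `service rsyslog restart`. Expect /var/log/messages to grow faster while debug is on.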

In my experience, eth0 disappearing in CentOS is related to NetworkManager. Make sure you disable NetworkManager and then enable the network service. I've seen this happen when I manually configured networking by creating/modifying /etc/sysconfig/network-scripts/ifcfg-eth0 but forgot to run `chkconfig NetworkManager off` followed by `chkconfig network on`.
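
A minimal sketch of that switch, plus a quick check of the result. It assumes the usual `chkconfig --list` output format (`name 0:off 1:off 2:on ...`); `enabled_runlevels` is a helper name I made up for the example.

```shell
#!/bin/sh
# Sketch only: print the runlevels in which a service is enabled, given
# one line of `chkconfig --list <service>` output on stdin.
# enabled_runlevels is a hypothetical helper name.
enabled_runlevels() {
    awk '{
        out = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-6]:on$/) {
                split($i, a, ":")
                out = out (out == "" ? "" : " ") a[1]
            }
        }
        print out
    }'
}

# Only change anything when explicitly asked to, e.g.: sh script.sh apply
if [ "${1:-}" = "apply" ]; then
    chkconfig NetworkManager off
    service NetworkManager stop
    chkconfig network on
    echo "NetworkManager on in runlevels: $(chkconfig --list NetworkManager | enabled_runlevels)"
    echo "network on in runlevels: $(chkconfig --list network | enabled_runlevels)"
fi
```

After this, the NetworkManager line should list no runlevels and the network line should list the ones you boot into. Run it from the KVM console rather than over SSH, since stopping NetworkManager can drop the connection.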

The best thing is probably to troubleshoot live via KVM the next time it happens. One other thing I might try is a loop that runs something like ethtool to get the link status every 30-60 seconds, just to get a time frame for when the failure happens. Reading logs is much easier when you have a timestamp or a really small time frame to focus on. :)
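
Such a loop could look roughly like this. It is only a sketch: the log path, the 30 second default, and the `link_state` helper are my own choices, and it relies on ethtool printing a `Link detected:` line.

```shell
#!/bin/sh
# Sketch only: log the link state of an interface at a fixed interval.
# IFACE, INTERVAL and LOGFILE are illustrative defaults.
IFACE="${IFACE:-eth0}"
INTERVAL="${INTERVAL:-30}"
LOGFILE="${LOGFILE:-/var/log/eth0-link.log}"

# Read ethtool output on stdin and print the value of the
# "Link detected:" line, or "unknown" if there is no such line
# (for example when the interface has disappeared).
link_state() {
    awk -F': ' '/Link detected:/ { print $2; found = 1 }
                END { if (!found) print "unknown" }'
}

# Loop forever only when explicitly asked to, e.g.: sh script.sh run &
if [ "${1:-}" = "run" ]; then
    while true; do
        state=$(ethtool "$IFACE" 2>&1 | link_state)
        echo "$(date '+%Y-%m-%d %H:%M:%S') $IFACE link=$state" >> "$LOGFILE"
        sleep "$INTERVAL"
    done
fi
```

Start it as root in the background and, after the next outage, look for the first line where the state changes from `yes`. That timestamp is where to start reading /var/log/messages.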

Last, if something is really happening to /etc/sysconfig/network-scripts/ifcfg-eth0, then you could create an audit policy to watch that file for any changes.
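
A sketch of such a watch with auditd, assuming the audit package is installed and auditd is running. The key name `ifcfg-eth0-watch` and the `watch_rule` helper are made up for the example.

```shell
#!/bin/sh
# Sketch only: build and load an audit watch rule for one file.
WATCH_FILE="/etc/sysconfig/network-scripts/ifcfg-eth0"
AUDIT_KEY="ifcfg-eth0-watch"   # hypothetical key, any label works

# Print a watch rule: -w <path>, -p wa (writes and attribute changes),
# -k <key> so matching events can be searched for later.
# watch_rule is a hypothetical helper name.
watch_rule() {
    printf '%s\n' "-w $1 -p wa -k $2"
}

# Only load the rule when explicitly asked to, e.g.: sh script.sh apply
if [ "${1:-}" = "apply" ]; then
    # Word splitting is intended here: the rule expands into
    # separate auditctl arguments.
    auditctl $(watch_rule "$WATCH_FILE" "$AUDIT_KEY")
    auditctl -l
fi
```

Later, `ausearch -k ifcfg-eth0-watch` lists any events recorded for the file. A rule loaded with auditctl is gone after a reboot, so to keep it you would add the same line to /etc/audit/audit.rules.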

Ryan Davies
  • Try `mii-tool --watch eth0`, maybe add `--log` to syslog instead and run it in the background. Check if the NIC driver supports a `debug` flag that you can load it with: confirm the driver with `basename $(readlink /sys/class/net/eth0/device/driver/module)`, then inspect `modinfo /lib/modules/$(uname -r)/kernel/drivers/net/$DRIVER.ko` for the supported `parm`s. – mr.spuratic May 23 '14 at 09:08
  • Thanks for the answer and the comment! I will try the suggestions out and report back. They all seem like very good suggestions! – Michael Tremante May 23 '14 at 09:47

Just for future reference, I finally discovered what the problem was.

This was actually caused by a CentOS kernel bug: the timesync tx control register was not set as expected.

References: https://groups.google.com/forum/#!topic/springdale-users/bBqrE545sYo http://bugs.centos.org/view.php?id=6810

In the end this problem simply resolved itself after I upgraded to a new server.