
I'm running Debian Squeeze on a PowerEdge R410, and the server is experiencing sporadic loss of connectivity. When it happens, my website and SSH become completely unavailable. As far as I can tell, everything is configured correctly: when I boot the server, all IP addresses are correct in `ifconfig` and the routing table is properly populated. If the server is already running, `ifdown -a && ifup -a` likewise puts all parameters in their right place, after which the site is available and SSH works as expected.

The last time the outage occurred (today), I visited the machine and logged in at the console. Pinging the gateway resulted in 100% packet loss, so I ran `route` to make sure that the gateway was in the routing table. There was a 5-10 second delay between `route` printing the first and second lines below:

    192.168.0.0    *             255.255.255.0   U  0  0  0 eth0
    default        192.168.0.1   0.0.0.0         UG 0  0  0 eth0

Once `route` returned this information, I pinged the gateway again and got 0% packet loss. I immediately checked my website and it was also back up and running. `ping` and `route` were the only commands I ran before the site was back online.

That was this morning, and now the server is down again. I have cron configured to run `ifdown -a && ifup -a` at 4am tomorrow morning, so we'll see if that works. In the meantime, does anyone have any ideas about what might be causing this problem?

BTW, there's no DHCP, everything is static.

/etc/network/interfaces:

    auto lo
    iface lo inet loopback

    allow-hotplug eth0
    iface eth0 inet static
     address 192.168.0.121
     netmask 255.255.255.0
     network 192.168.0.0
     broadcast 192.168.0.255
     gateway 192.168.0.1
     dns-nameservers 192.168.0.10
     dns-search mysite.com

Andrew Parker
  • To clarify, the delay in the `route` command happened only that once. I could not get the delay to happen again once the site was back online. – Andrew Parker Mar 06 '12 at 15:11

2 Answers


If `route` is being slow, try using `route -n`. This switches off the reverse DNS lookups that `route` otherwise performs on each address; those are slow at the best of times and, during network issues, take ages to time out.
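As a rough sketch of checking the gateway entry without waiting on DNS, the numeric output can be filtered directly. `default_gateway` is a hypothetical helper name, not a standard tool, and it assumes the usual net-tools column layout (Destination, Gateway, Genmask, Flags, ...):

```shell
# Hypothetical helper: print the default gateway found in `route -n`
# output supplied on stdin. Assumes the net-tools column order
# Destination Gateway Genmask Flags ...
default_gateway() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2; exit }'
}

# On the server this would be used as:
#   route -n | default_gateway
```

If this prints the gateway immediately while plain `route` stalls, the delay is most likely name resolution rather than the routing table itself.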

As for your connection issues, check that the link is correctly brought up. I've seen intermittent connectivity caused by the server and the router (or switch) incorrectly negotiating duplex operation. If one side is set to auto and the other is manually set (to full or half duplex), the autonegotiating side typically falls back to half duplex, and the resulting mismatch can lead to massive packet loss and finally the full-on collapse of the link. See here for details.

Update: Here is a better link explaining duplex mismatch, as the wiki page doesn't cover all of this:

https://learningnetwork.cisco.com/thread/4506
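To see what the server's side of the link actually settled on, something like the following might help. It is only a sketch: it assumes `ethtool` is installed and prints its usual `Speed:`, `Duplex:` and `Auto-negotiation:` lines, and `link_summary` is a made-up helper name:

```shell
# Hypothetical helper: condense `ethtool <iface>` output (on stdin)
# into a single line showing speed, duplex and autonegotiation.
link_summary() {
    awk -F': *' '
        /^[ \t]*Speed:/            { speed = $2 }
        /^[ \t]*Duplex:/           { duplex = $2 }
        /^[ \t]*Auto-negotiation:/ { autoneg = $2 }
        END { printf "speed=%s duplex=%s autoneg=%s\n", speed, duplex, autoneg }
    '
}

# On the server (usually needs root):
#   ethtool eth0 | link_summary
```

A result such as `duplex=Half` with `autoneg=on` while the other end is forced to full duplex would be consistent with a mismatch; the other end's settings still have to be checked on that device.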

webtoe
  • The article you linked seemed to indicate that `ping` and perhaps also `ssh` should work even with a duplex mismatch... but I get 100% packet loss on `ping` and 'Connection refused' with `ssh`. – Andrew Parker Mar 06 '12 at 15:47
  • `ping` would work temporarily/initially on a slow link. But as traffic increases, so do the collisions on the link. Collisions force packets to be retransmitted, which leads to _more_ packets and more collisions, _ad infinitum_. Here's a better explanation: https://learningnetwork.cisco.com/thread/4506 – webtoe Mar 06 '12 at 16:20
  • This is obviously just what I think could be happening. It may be something else. – webtoe Mar 06 '12 at 16:25
  • I met with the administrator of the network and we changed both machines from autonegotiate to a forced 100/full. There's some distance between the machines (and the cable is unshielded) and he was concerned that perhaps the 1000/full was being too finicky. It's working now, we'll have to see how long it lasts... – Andrew Parker Mar 06 '12 at 17:06
  • ... about two days. It's back down. See my [new question](http://serverfault.com/questions/367788/connectivity-restored-with-ping). – Andrew Parker Mar 08 '12 at 21:14

If you are experiencing slow responses from the gateway, you should consider that the problem may be caused by the router rather than your server.
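One way to gather evidence either way is to record how the gateway responds over time. A minimal sketch, assuming the iputils-style `ping` summary line (`... N% packet loss ...`); `packet_loss` is a hypothetical helper name and 192.168.0.1 is the gateway from the question:

```shell
# Hypothetical helper: extract the loss percentage from the summary
# line of `ping` output supplied on stdin.
packet_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# On the server, e.g. run periodically and appended to a log:
#   echo "$(date) loss=$(ping -c 5 192.168.0.1 | packet_loss)%"
```

Running the same check from a second machine on the same network would show whether the gateway is unresponsive for everyone or only for this server.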

Lucas Kauffman