I've run out of ideas with this problem, so I thought an SF question might help.
We have a number of Ubuntu 9.10 servers that we've recently switched from single NICs to bonded NICs using the standard kernel bonding driver.
This setup works as planned (and as it has done in the past on various Linux machines), but we've had some boxes simply drop off the network hours after enabling bonding.
The boxes stop responding on the network entirely, yet a simple /etc/init.d/networking restart via the KVM brings the connection back online.
My first thoughts were along the lines of: 1) the upstream connection stopped, 2) something local on the box blew away the network configuration (e.g. network-manager), or 3) the bonding somehow crashed.
However, I quickly hit a wall trying to investigate this across all four servers:
The event is not logged locally on any of the servers (/var/log/*, dmesg, etc.); I expected to see a change in link status or similar.
The upstream switches all log centrally via syslog, and they recorded no change in link state and no MAC flapping.
/proc/net/bonding/bond0 reported no issues.
Nothing along the lines of network-manager is running on these boxes.
The only thing logged is the change in network state caused by running the service restart. (The checks I ran on each box are sketched below.)
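For reference, these are roughly the checks run on each box after a failure (the exact grep patterns here are illustrative):

dmesg | grep -iE 'bond|eth0|eth1|link'              # kernel ring buffer: link/bonding events
grep -iE 'bond|link' /var/log/syslog /var/log/kern.log   # local syslog
cat /proc/net/bonding/bond0                         # bonding driver's view of the bond and its slaves
ps aux | grep -i network-manager                    # confirm nothing is managing the interfaces behind our backs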
Originally we used mode=0 (balance-rr), but after a suggestion that having the same MAC present in two places was causing network confusion, we switched to mode=1 (active-backup). This made no difference and the servers failed again a few hours later. (The config below still shows the original mode=0 line.)
It's like the network just "stops". Any ideas, folks?
Configuration
/etc/modprobe.d/bonding.conf
alias bond0 bonding
options bonding mode=0 miimon=100
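For reference, the mode=1 test only required changing the options line; with the same miimon interval it would read:

options bonding mode=1 miimon=100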
/etc/network/interfaces
auto bond0
iface bond0 inet static
address 192.168.1.10
gateway 192.168.1.1
netmask 255.255.255.0
slaves eth0 eth1
up /sbin/ifenslave bond0 eth0 eth1
down /sbin/ifenslave -d bond0 eth0 eth1
auto eth0
iface eth0 inet manual
auto eth1
iface eth1 inet manual
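For completeness, the recovery from the KVM console is just the standard restart. Capturing state beforehand (something we haven't done yet, so this is only a sketch) might at least give a before/after to compare:

cat /proc/net/bonding/bond0 > /root/bond0.before    # bonding driver state at the time of the failure
ip addr show > /root/ipaddr.before                  # interface/address state
/etc/init.d/networking restart                      # brings the box back online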