
I'm setting up a pair of CentOS 6.3 servers that will run a couple of KVM VMs, and I've come across a problem setting up a bridge on a bond.

I am using mode 4 (802.3ad) bonding on a pair of stacked Dell PowerConnect 5524 switches connected to R320 servers. There are 2 links (1 to each switch) that form a Link Aggregation Group (802.3ad / LACP bonding). On top of the bond I have VLAN tagging.

I've verified that the problem also occurs with several other bonding modes, so it isn't just a mode 4 issue.

I am testing what happens when 1 link is dropped (i.e. a switch dies, a cable breaks, etc.).

If I don't have a bridge (for KVM), everything works fine and failover happens as expected.

If I have the bridge enabled, it works fine until failover (unplugging a cable). When failover happens /var/log/messages shows the slave link going down, followed within a second by:

kernel: br1: port 1(bond0.8) entering disabled state

The thing is /proc/net/bonding/bond0 shows the link is up as expected (simply with only 1 slave instead of 2). If I plug the cable back in it recovers and brings the bridge back to an enabled state.
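To confirm from a script that the bond is still running on one slave, the status file can be summarized. This is a minimal sketch, assuming the `/proc/net/bonding/bond0` layout shown in this thread; the function name `count_up_slaves` is made up for illustration.

```shell
# Count bond slaves whose MII status is "up", reading bonding status text on stdin.
# The first "MII Status:" line describes the bond itself, so only the status
# line that follows a "Slave Interface:" header is counted.
count_up_slaves() {
    awk '
        /^Slave Interface:/ { in_slave = 1; next }
        in_slave && /^MII Status:/ {
            if ($3 == "up") up++
            in_slave = 0
        }
        END { print up + 0 }
    '
}

# Usage on a live host (bond0 is the bond name used in this post):
#   count_up_slaves < /proc/net/bonding/bond0
```

With one cable unplugged this should print 1, while the bond-level `MII Status` line stays `up`.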

I have tested this while a ping is running, and if the timing is right a packet will actually leave the system after the link is lost but before the disabled message appears.

I assumed the disabled state was caused by STP, but I have disabled STP in the bridge configuration and the issue still occurs.

brctl showstp br1 

still shows the port as disabled when the bond is running with one slave down.
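Since the bridge also disables a port when the port device reports no carrier, independently of STP, it may help to compare the carrier flag of each layer with the bridge port state during a failover test. The sketch below reads both from sysfs; the helper names `bridge_state_name` and `show_port` are hypothetical, and `br1` / `bond0.8` are the names from the log line above.

```shell
# Translate the numeric bridge port state exposed in sysfs into its name.
bridge_state_name() {
    case "$1" in
        0) echo disabled ;;
        1) echo listening ;;
        2) echo learning ;;
        3) echo forwarding ;;
        4) echo blocking ;;
        *) echo unknown ;;
    esac
}

# Print the carrier flag and bridge port state for one port.
# Arguments: bridge name, port name. Unreadable values are shown as "?".
show_port() {
    bridge=$1
    port=$2
    carrier=$(cat "/sys/class/net/$port/carrier" 2>/dev/null || echo "?")
    state=$(cat "/sys/class/net/$bridge/brif/$port/state" 2>/dev/null || echo "?")
    echo "$port carrier=$carrier bridge_state=$(bridge_state_name "$state")"
}

# Usage: show_port br1 bond0.8
```

If `bond0.8` shows `carrier=0` while `/sys/class/net/bond0/carrier` still reads 1, that would point at carrier propagation between the bond and the VLAN device rather than at STP.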

I also switched between the NICs in the server (I have 2x Broadcom and 4x Intel). It doesn't matter which combination I use.

Does anyone know of a way to force the bridge to stay enabled, or why it's detecting the bond as disabled when it isn't?

mgorven
jlawer

1 Answer

I've run into exactly the same issue with Fedora 16 on 2 x Dell R410s and a stacked pair of PowerConnect 6448s: a bridged interface on top of an 802.3ad bond, with exactly the same symptoms.

Here are the config files:

cat /etc/modprobe.d/bonding.conf

alias netdev-bond0 bonding

alias netdev-bond1 bonding

alias netdev-bond2 bonding

cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation

Transmit Hash Policy: layer3+4 (1)

MII Status: up

MII Polling Interval (ms): 100

Up Delay (ms): 0

Down Delay (ms): 0

802.3ad info

LACP rate: fast

Min links: 0

Aggregator selection policy (ad_select): stable

Active Aggregator Info:

Aggregator ID: 23

Number of ports: 2

Actor Key: 17

Partner Key: 629

Partner Mac Address: 00:21:9b:b2:08:40

Slave Interface: em1

MII Status: up

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 0

Permanent HW addr: 00:1e:c9:fd:f1:5e

Aggregator ID: 23

Slave queue ID: 0

Slave Interface: em2

MII Status: up

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 0

Permanent HW addr: 00:1e:c9:fd:f1:60

Aggregator ID: 23

Slave queue ID: 0

cat /etc/sysconfig/network-scripts/ifcfg-br0

DEVICE=br0

ONBOOT=yes

TYPE=Bridge

BOOTPROTO=none

IPADDR=10.100.100.101

NETMASK=255.255.255.0

IPV6INIT=no

IPV6_AUTOCONF=no

DHCPV6=no

IPV6ADDR=fe80::21e:c9ff:fefd:f15e/64

cat /etc/sysconfig/network-scripts/ifcfg-bond0

DEVICE=bond0

USERCTL=no

BOOTPROTO=none

ONBOOT=yes

BONDING_OPTS="miimon=100 mode=4 lacp_rate=1 xmit_hash_policy=1"

BRIDGE=br0
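For readability, the numeric codes in `BONDING_OPTS` can also be written with the bonding driver's named values. The line below is intended as an equivalent spelling of the options above, matching what `/proc/net/bonding/bond0` reports (802.3ad, LACP rate fast, hash policy layer3+4); it is a sketch, not a change to the configuration.

```shell
# Same options as above with named values instead of numeric codes:
# mode 4 = 802.3ad, lacp_rate 1 = fast, xmit_hash_policy 1 = layer3+4.
BONDING_OPTS="miimon=100 mode=802.3ad lacp_rate=fast xmit_hash_policy=layer3+4"
```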

cat /etc/sysconfig/network-scripts/ifcfg-em1

DEVICE=em1

HWADDR=00:1E:C9:FD:F1:5E

ONBOOT=yes

MASTER=bond0

SLAVE=yes