
I've got four data interfaces on my server: eno5, eno6, ens3f0 and ens3f1. I need to bond these four interfaces together, with VLAN ID 101 and the bond name data0.

More info:

  • RHEL 7.6
  • Interface names checked. (When I unplug a cable, the matching interface goes down.)
  • I'll most probably use mode 4 for this setup. (Not a network guy..)
  • The UUIDs were already in the config files. I've changed nothing.

EDIT: Further info:

  • There is no virtualization. We're talking about physical machines here.
  • Switch configuration is all set.
  • This is a fresh, minimal install. Are there any necessary packages, kernel modules or system configurations?

First try: I followed this RHEL document exactly: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/sec-configuring_a_vlan_over_a_bond

except that the document has two interfaces and I've got four. I replaced the interface and bond names with my own values. And of course the IP, gateway and subnet were also my own.

Result: systemctl restart network was OK. But the interface can't even ping its own gateway...

About this try:

  • The mode options were exactly the same as in the document

Second try: I followed this document exactly: http://villasyslog.net/rhel-bonding-and-vlan-tagging/

Yet again, I've got four interfaces, so I changed the values.

Result: systemctl restart network failed. My bond didn't get the IPv4 address. Instead, it showed me some silly IPv6 stuff.

About this try:

  • There was no /etc/modprobe.d/bonding.conf file.

On the first try, I had my files under /etc/sysconfig/network-scripts/ifcfg-*: one for each of my four interfaces plus a bond config file (5 in total).

The second document suggested more files: four for the interfaces, one for the bond and an extra one for VLAN tagging. Unfortunately, I no longer have the first set of config files, but I do have the second. Here they are:

ifcfg-data0

DEVICE=data0
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
BONDING_MODULE_OPTS="mode=4 miimon=100"
BONDING_SLAVE0=ens3f1
BONDING_SLAVE1=ens3f0
BONDING_SLAVE2=eno6
BONDING_SLAVE3=eno5
VLAN=yes
IPV6INIT=no

ifcfg-data0.101

DEVICE=data0.101
BOOTPROTO=none
ONBOOT=yes
IPADDR=host IP
NETMASK=netmask
GATEWAY=gateway
NETWORK=first IP of network
BROADCAST=broadcast IP
USERCTL=no
BONDING_MODULE_OPTS="mode=4 miimon=100"
BONDING_SLAVE0="ens3f1"
BONDING_SLAVE1="ens3f0"
BONDING_SLAVE2="eno6"
BONDING_SLAVE3="eno5"
VLAN=yes
IPV6INIT=no

The other four interfaces:

TYPE=Ethernet
BOOTPROTO=none
UUID=device uuid
DEVICE=eno5
ONBOOT=yes
MASTER=data0
SLAVE=yes
NM_CONTROLLED=no

TYPE=Ethernet
BOOTPROTO=none
UUID=device uuid
DEVICE=eno6
ONBOOT=yes
MASTER=data0
SLAVE=yes
NM_CONTROLLED=no

TYPE=Ethernet
BOOTPROTO=none
UUID=device uuid
DEVICE=ens3f0
ONBOOT=yes
MASTER=data0
SLAVE=yes
NM_CONTROLLED=no

TYPE=Ethernet
BOOTPROTO=none
UUID=device uuid
DEVICE=ens3f1
ONBOOT=yes
MASTER=data0
SLAVE=yes
NM_CONTROLLED=no

/proc/net/bonding, as requested:

I see two files under /proc/net/bonding. One of them is bond0 and I've no idea what that is:

Bond0

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Data

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: ens3f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: (MAC is here)
Slave queue ID: 0

Slave Interface: ens3f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: (MAC is here)
Slave queue ID: 0

Slave Interface: eno6
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: (MAC is here)
Slave queue ID: 0

Slave Interface: eno5
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: (MAC is here)
Slave queue ID: 0

systemctl restart network:

Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.

systemctl status network:

● network.service - LSB: Bring up/down networking
   Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2020-02-28 13:48:49 +03; 32s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 37887 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=1/FAILURE)

Feb 28 13:48:49 (host name here) network[37887]: RTNETLINK answers: File exists
Feb 28 13:48:49 (host name here) network[37887]: RTNETLINK answers: File exists
Feb 28 13:48:49 (host name here) network[37887]: RTNETLINK answers: File exists
Feb 28 13:48:49 (host name here) network[37887]: RTNETLINK answers: File exists
Feb 28 13:48:49 (host name here) network[37887]: RTNETLINK answers: File exists
Feb 28 13:48:49 (host name here) network[37887]: RTNETLINK answers: File exists
Feb 28 13:48:49 (host name here) systemd[1]: network.service: control process exited, code=exited status=1
Feb 28 13:48:49 (host name here) systemd[1]: Failed to start LSB: Bring up/down networking.
Feb 28 13:48:49 (host name here) systemd[1]: Unit network.service entered failed state.
Feb 28 13:48:49 (host name here) systemd[1]: network.service failed.

At first, NetworkManager was running. I've disabled it but still, systemctl restart network fails. Output of systemctl status NetworkManager:

● NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled; vendor preset: enabled)
   Active: inactive (dead) since Fri 2020-02-28 13:46:58 +03; 2min 13s ago
     Docs: man:NetworkManager(8)
 Main PID: 35612 (code=exited, status=0/SUCCESS)

Feb 27 16:26:51 (host name here) NetworkManager[35612]: <info>  [1582810011.3824] agent-manager: req[0x56187f15d3c0, :1.936/nmcli-connect/0]: agent registered
Feb 27 16:26:51 (host name here) NetworkManager[35612]: <info>  [1582810011.3830] audit: op="connection-activate" uuid="09bce14a-449a-3065-8d1b-d4bcde243bd8" name="Vlan data0.744" result="fail" reason="Failed to find a compatible device for this connection"
Feb 28 13:46:58 (host name here) systemd[1]: Stopping Network Manager...
Feb 28 13:46:58 (host name here) NetworkManager[35612]: <info>  [1582886818.8800] caught SIGTERM, shutting down normally.
Feb 28 13:46:58 (host name here) NetworkManager[35612]: <info>  [1582886818.8846] device (ens3f0): released from master device data.744
Feb 28 13:46:58 (host name here) NetworkManager[35612]: <info>  [1582886818.8851] device (ens3f1): released from master device data.744
Feb 28 13:46:58 (host name here) NetworkManager[35612]: <info>  [1582886818.8856] device (eno5): released from master device data.744
Feb 28 13:46:58 (host name here) NetworkManager[35612]: <info>  [1582886818.8860] device (eno6): released from master device data.744
Feb 28 13:46:58 (host name here) NetworkManager[35612]: <info>  [1582886818.8890] exiting (success)
Feb 28 13:46:58 (host name here) systemd[1]: Stopped Network Manager.
Ali1928
  • The VLAN config file shouldn't have the BONDING_ parameters. Did you verify that stuff works without the bonding? (I.e. test each NIC individually with the VLAN configuration) – Mat Feb 27 '20 at 16:00
  • You say you're not a network guy.. in case you overlooked this: mode 4 (aka 802.3ad, LACP) needs the switch on the other end to be configured appropriately as well. –  Feb 27 '20 at 20:26
  • @Mat you mean the ifcfg-data.101 file right? – Ali1928 Feb 28 '20 at 06:09
  • @yoonix thanks for the heads up. The network team told me that the switches are already configured. – Ali1928 Feb 28 '20 at 06:09
  • I do hope ifcfg-data.101 is a typo for the file name and you really named it ifcfg-data0.101 It's also very helpful to post the contents of /proc/net/bonding/ to see the results of the configuration. – Brandon Xavier Feb 28 '20 at 08:32
  • Hi @BrandonXavier I've checked to be sure and yes it was a typo on question. I've modified it. Thanks for the heads up. – Ali1928 Feb 28 '20 at 10:47
  • I've tried @Mat's way and deleted the bonding params from data0.101; restart failed. I've added the same lines again and deleted the ones from data0; again, restart failed. All configuration files are as in the question right now. – Ali1928 Feb 28 '20 at 11:04
  • `RTNETLINK answers: File exists` usually occurs if you're trying to bring up an interface that's already up - possibly as a result of troubleshooting efforts. If it's feasible, you might try rebooting (yes, I hate that advice but sometimes it is simpler). Otherwise, you could try manually shutting down each interface, including data0 (ifconfig down), removing the bonding module (modprobe -r bonding) and then restarting the network. (BTW, I agree with Mat that the BONDING_ options shouldn't be in the ifcfg-data0.101 file) – Brandon Xavier Feb 28 '20 at 12:53
  • @BrandonXavier all right, I'll remove that line from the data0.101 file. By rebooting you meant the server itself, right? That's not a problem for me, I'm not in production yet. If I go with rebooting, will I have to remove the bonding module anyway? – Ali1928 Feb 28 '20 at 14:36
  • Correct, reboot the server if possible. That should also clear the bonding module so it can be loaded based on your config files which basically look correct. Sometimes after some failed configurations the state of the interfaces and modules get into an inconsistent state and it's easier to reboot than spend hours trying to figure out why a module, such as 'bonding', won't unload. – Brandon Xavier Feb 28 '20 at 19:31
  • the documentation for RHEL 7 is https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/ch-configure_network_bonding – natxo asenjo Feb 29 '20 at 20:01
  • @BrandonXavier I'll try as soon as I go to datacenter. Thanks again. – Ali1928 Mar 02 '20 at 06:25
  • @natxoasenjo Thanks for that URL. Somehow I'd already found it, but I need to be sure: does this guide cover VLANs? – Ali1928 Mar 02 '20 at 06:26
  • https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/networking_guide/index#ch-Configure_802_1Q_VLAN_Tagging – natxo asenjo Mar 02 '20 at 16:18
  • Thanks @natxoasenjo Network Manager is a big no for us. Therefore I'm not able to use its tools. The trick was "VLAN=yes" for the tag file and removing it from the bond itself. I've added some extra lines for my VLAN and boom, tetris for Ali. Thanks for all of your comments. In a few days, I'll write up my configuration files and try to explain the situation in an answer. – Ali1928 Mar 03 '20 at 08:52
  • yes, that's in the link I posted when you asked about vlans ;-). You have example for ip, nmcli and ifcfg files. – natxo asenjo Mar 03 '20 at 11:43
  • @natxoasenjo yes you're right but in the future, people may want to look for "VLAN_ID=" tag and also "Type=Vlan" for bond.xxx files :) – Ali1928 Mar 03 '20 at 12:54
  • ah, yes, the lost art of reading the manuals – natxo asenjo Mar 03 '20 at 13:01
  • @natxoasenjo and the lost art of "don't touch my switch config!!!" cuz after I did everything I could, a switch problem occurred... I guess our customer likes to walk around in config files. – Ali1928 Mar 03 '20 at 13:26

1 Answer


First of all, thanks for all the comments on my question.

As far as I see, port bonding is kinda like regex. Everyone writes something in a guide and somehow it works for them. Well, not for me.

What I needed to achieve wasn't clear to me until recently. But now I've got it working; only the failover test remains.

What's the point of all this?

  1. I've got four 10 Gbit Ethernet ports.
  2. I want these four to work together and handle 40 Gbit of traffic (in theory).
  3. Therefore, I need port bonding in LACP mode.
  4. The given ports must be configured on the switch side for LACP bonding, with the given port channels. This is the network team's job, not mine.

So, how to achieve this? First, I need to make sure that NetworkManager is not running and disabled:

systemctl stop NetworkManager
systemctl disable NetworkManager

Then, check whether the interfaces are up. For this, you need to make sure that the network service is running:

systemctl status network #check if working
systemctl start network #start if not working

List all the interfaces:

ip a

If any interface has an IP address on it, make sure it doesn't conflict with your bonding setup.
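To spot interfaces that already carry an IPv4 address, the listing can be filtered. This is only a minimal sketch: list_ipv4 is a hypothetical helper name, and it assumes the one-line-per-address format printed by `ip -o -4 addr show` from iproute2.

```shell
#!/bin/sh
# list_ipv4 (hypothetical helper): read `ip -o -4 addr show` output on stdin
# and print "<interface> <address/prefix>" for everything except loopback.
list_ipv4() {
    # In the one-line format, field 2 is the interface, field 3 is "inet"
    # and field 4 is the address with its prefix length.
    awk '$3 == "inet" && $2 != "lo" { print $2, $4 }'
}

# Typical use on the server:
#   ip -o -4 addr show | list_ipv4
```

Any of the four slave interfaces showing up in that output still has an address that should be removed before it joins the bond.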

Before putting your own configuration in place, stop the network service:

systemctl stop network

Your config files should be under the /etc/sysconfig/network-scripts/ directory.

Slave example:

ifcfg-eno5

DEVICE=eno5
NAME=bond0-slave3
TYPE=Ethernet
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no

Let's go line by line.

  1. DEVICE is the interface name as seen by RHEL itself.
  2. NAME is optional; it is used by TUI or GUI tools. This is slave number 3 of the bond named bond0, hence bond0-slave3. (Remember, I have four slaves and this is the last one; the others are bond0-slave2, bond0-slave1 and bond0-slave0.)

The rest is straightforward: this is a classic Ethernet interface, its MASTER is bond0, and yes, it is a slave. Network Manager shouldn't control this interface.
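Since the four slave files differ only in the interface name and the slave number, they can be generated in a loop. The following is a sketch, not the method I actually used: write_slave_cfgs is a hypothetical helper, and the scratch directory in the usage comment is an assumption so the files can be reviewed before being copied into place.

```shell
#!/bin/sh
# write_slave_cfgs DIR BOND IFACE... (hypothetical helper)
# Writes one ifcfg-<iface> file per interface into DIR, numbering the
# slaves from 0 in the order given.
write_slave_cfgs() {
    dir=$1
    bond=$2
    shift 2
    n=0
    for ifc in "$@"; do
        cat > "$dir/ifcfg-$ifc" <<EOF
DEVICE=$ifc
NAME=$bond-slave$n
TYPE=Ethernet
ONBOOT=yes
MASTER=$bond
SLAVE=yes
NM_CONTROLLED=no
EOF
        n=$((n + 1))
    done
}

# Example (scratch directory is an assumption; review, then copy the files):
#   mkdir -p /tmp/ifcfg-draft
#   write_slave_cfgs /tmp/ifcfg-draft bond0 ens3f1 ens3f0 eno6 eno5
```

With the interfaces given in that order, eno5 ends up as bond0-slave3, matching the example above.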

Bond example:

ifcfg-bond0

DEVICE=bond0
NAME=bond0
TYPE=Bond
BONDING_MASTER=yes
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=4 miimon=100 lacp_rate=slow"
NM_CONTROLLED=no

  1. DEVICE is the name of the bond. I've called it bond0.
  2. NAME has the same meaning as in the slave file.
  3. TYPE is not Ethernet. This is a Bond, with a capital B.
  4. This virtual device is the bonding master.
  5. It starts on boot.
  6. BOOTPROTO=none means no boot-time protocol such as DHCP is used on this device.
  7. Bonding options differ between kinds of bonding. This is LACP bonding with the slow rate; LACP is also known as bonding mode 4.
  8. Network Manager shouldn't control this virtual interface.
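Before restarting anything, it can help to confirm which options the bond file really requests. A small sketch, assuming the file is plain KEY=value shell syntax like the example above; bond_opt is a hypothetical helper name.

```shell
#!/bin/sh
# bond_opt FILE KEY (hypothetical helper): print the value of KEY from the
# BONDING_OPTS line of an ifcfg file. Pass FILE with a directory component
# (for example ./ifcfg-bond0), because `.` searches PATH for bare names.
bond_opt() {
    # Source the file in a subshell so its variables do not leak out.
    (
        BONDING_OPTS=
        . "$1"
        # BONDING_OPTS is a space-separated list of key=value pairs.
        for kv in $BONDING_OPTS; do
            case $kv in
                "$2"=*) printf '%s\n' "${kv#*=}" ;;
            esac
        done
    )
}

# Example:
#   bond_opt /etc/sysconfig/network-scripts/ifcfg-bond0 mode
```

If this prints nothing for mode, the file has no usable BONDING_OPTS line. Note that the files in the question used BONDING_MODULE_OPTS, while this working setup uses BONDING_OPTS.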

The port bonding itself is done at this point (except that we still need to define the IP, subnet and gateway). As I heard, what I had configured up to now corresponds to "access mode", which means only one VLAN can work on the switch port. But this is not what we want, and the switch doesn't expect this kind of configuration. So I need to define a VLAN ID and pass it to the switch as a tag. This type of configuration is called "trunk mode".

Let's say my VLAN ID is 111.

VLAN Tagging Example:

ifcfg-bond0.111

DEVICE=bond0.111
TYPE=Vlan
NAME=vlan-bond0.111
BOOTPROTO=none
ONPARENT=yes
IPADDR=IP address
NETMASK=subnet mask
GATEWAY=gateway
VLAN=yes
VLAN_ID=111
NM_CONTROLLED=no

I've given my virtual device a name. The name must be BOND_NAME.VLAN_ID, therefore I've used bond0.111.

TYPE is essential. This is neither Ethernet nor Bond; this is a Vlan. ONPARENT=yes means this interface is not tied to the server boot itself: if its parent (I mean bond0) is up, it should come up too.

The IP, netmask and gateway lines are self-explanatory.

VLAN=yes: the RHEL documentation says to set this, and it is essential. But I don't know why. TYPE is already Vlan, so why should I set a yes flag on VLAN too? Whatever.

I hadn't seen VLAN_ID in any document so far. This is the ID of my VLAN.
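Because the VLAN ID appears twice, once as the suffix of DEVICE and once as VLAN_ID, a typo in either place is easy to make. A quick consistency check could look like this sketch; check_vlan_cfg is a hypothetical helper, and it assumes both values are written unquoted as in the example above.

```shell
#!/bin/sh
# check_vlan_cfg FILE (hypothetical helper): succeed only if VLAN_ID is set
# and equals the part of DEVICE after the last dot.
check_vlan_cfg() {
    dev=$(sed -n 's/^DEVICE=//p' "$1")
    id=$(sed -n 's/^VLAN_ID=//p' "$1")
    [ -n "$id" ] && [ "${dev##*.}" = "$id" ]
}

# Example:
#   check_vlan_cfg /etc/sysconfig/network-scripts/ifcfg-bond0.111 \
#       && echo "DEVICE and VLAN_ID agree"
```

The file is read with sed rather than sourced, so placeholder lines such as the IPADDR one do not get in the way.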

Once all the configuration files are in place, start the network service:

systemctl start network

This should work fine. Worked for me.
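To confirm that the bond really came up in LACP mode, the status file under /proc/net/bonding can be summarized. In the question it reported "load balancing (round-robin)", which showed that mode 4 had not been applied; with mode 4 the kernel reports "IEEE 802.3ad Dynamic link aggregation". A minimal sketch, where bond_summary is a hypothetical helper name:

```shell
#!/bin/sh
# bond_summary FILE (hypothetical helper): print the bonding mode and the
# number of slaves listed in a /proc/net/bonding/<bond> style file.
bond_summary() {
    awk -F': ' '
        /^Bonding Mode:/    { mode = $2 }
        /^Slave Interface:/ { slaves++ }
        END { printf "mode=%s slaves=%d\n", mode, slaves }
    ' "$1"
}

# Example on the server:
#   bond_summary /proc/net/bonding/bond0
```

For the setup described here, the summary should show the 802.3ad mode and four slaves.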

Ali1928