
I am doing experiments on my Proxmox server. The purpose of the experiments is to establish reliable and fault-tolerant communication between two PCs that control industrial equipment. But I am puzzled by the results of the experiments.

Experiment 1

The network layout is:

+-------------------------------------------+       
|           ens21   x                       |   SRV1
|                   |                       |   172.16.1.1
|           br0                             |
|                   |                       |
|bond0.10.  * - - - + - - - - - *   bond0.20|
|           |                   |           |
|   ens19   x...................x   ens20   |
+-------------------------------------------+
            |                   |
    vlan10  |                   |   vlan20
            |                   |
+-------------------------------------------+       
|   eth3.10 x                   x   eth4.20 |   SW1
|                                           |
|   eth1.10 x                   x   eth2.20 |
+-------------------------------------------+
            |                   |
            |                   |
            other               |
    vlan10  bridges             |   vlan20
            or                  |
            switches            |
            |                   |
+-------------------------------------------+       
|   eth3.10 x                   x   eth4.20 |   SW2
|                                           |
|   eth1.10 x                   x   eth2.20 |
+-------------------------------------------+               
            |                   |
    vlan10  |                   |   vlan20
            |                   |
+-------------------------------------------+   SRV2
|   ens19   x...................x   ens20   |   172.16.1.2
|           |                   |           |
|bond0.10.  * - - - + - - - - - *   bond0.20|
|                   |                       |
|           br0                             |
|                   |                       |
|           ens21   x                       |
+-------------------------------------------+

Note:
x: NIC
*: Bonding interface
....: Bonding connection
- or | separated by space: Bridging connection
  1. SRV1 is a Debian VM. It has three interfaces: ens19, ens20 and ens21. ens21 is reserved for other VMs. I bond ens19 and ens20 into bond0. bond0 is in broadcast mode. br0, with an IP of 172.16.1.1, is a bridge over bond0.10, bond0.20 and ens21. SRV2 is similar to SRV1; the IP of its br0 is 172.16.1.2.

Here is my /etc/network/interfaces configuration on SRV1:

auto bond0
iface bond0 inet manual
        up ifconfig $IFACE promisc
        up ifconfig bond0 0.0.0.0 up
        bond-slaves ens19 ens20
        #bond-miimon 100
        bond-downdelay 200
        bond-updelay 200
        #arp_interval 100
        #arp_ip_target 172.16.1.2
        #bond-mode active-backup
        bond-mode broadcast
        #bond-mode balance-alb
        #pre-up echo 100 > /sys/class/net/bond0/bonding/arp_interval
        #pre-up echo +172.16.1.2 > /sys/class/net/bond0/bonding/arp_ip_target

auto bond0.10
iface bond0.10 inet manual
#iface bond0.10 inet static
#       address 192.168.100.11
#       netmask 255.255.255.0
#       vlan-raw-device bond0

auto bond0.20
iface bond0.20 inet manual
#iface bond0.20 inet static
#       address 192.168.200.12
#       netmask 255.255.255.0
#       vlan-raw-device bond0

auto ens21
iface ens21 inet manual
        up ifconfig $IFACE promisc

auto br0
iface br0 inet static
        #bridge_ports bond0 ens21
        bridge_ports bond0.10 bond0.20 ens21
        address 172.16.1.1
        broadcast 172.16.255.255
        netmask 16
        bridge_stp off
        bridge_fd 0
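
To sanity-check that the bond, the VLAN sub-interfaces and the bridge come up as intended, a few read-only commands can be run on SRV1 (a sketch, assuming iproute2 and the bonding driver's /proc interface are available in the Debian VM):

# Bonding driver status: should report the broadcast mode and per-slave link state
cat /proc/net/bonding/bond0

# VLAN sub-interfaces: -d shows the 802.1Q VLAN id stacked on top of bond0
ip -d link show bond0.10
ip -d link show bond0.20

# Bridge membership: bond0.10, bond0.20 and ens21 should be listed as ports of br0
bridge link show
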
  2. SW1 is an OpenWrt VM. SW1 has four ports (eth1-eth4). I create two bridges: br-lan10 is over eth1.10 and eth3.10, and br-lan20 is over eth2.20 and eth4.20. SW2 is similar to SW1.

/etc/config/network on SW1:

config interface 'eth1_10'
        option proto 'none'
        option ifname 'eth1.10'
        option auto '1'

config interface 'eth2_20'
        option proto 'none'
        option ifname 'eth2.20'
        option auto '1'

config interface 'eth3_10'
        option proto 'none'
        option ifname 'eth3.10'
        option auto '1'

config interface 'eth4_20'
        option proto 'none'
        option ifname 'eth4.20'
        option auto '1'

config interface 'lan10'
        option proto 'static'
        option type 'bridge'
        option ifname 'eth1.10 eth3.10'

config interface 'lan20'
        option type 'bridge'
        option proto 'none'
        option auto '1'
        option ifname 'eth2.20 eth4.20'
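
To check that OpenWrt builds the two bridges as intended, the configuration can be reloaded and the resulting bridges listed (a sketch, assuming the busybox brctl applet is present on the OpenWrt VM):

# Apply /etc/config/network and list the software bridges and their ports;
# br-lan10 should contain eth1.10 and eth3.10, br-lan20 should contain eth2.20 and eth4.20
/etc/init.d/network reload
brctl show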

  3. Between SW1 and SW2, there may be other VMs acting as switches.

When I ping from SRV1 to SRV2, I get delays of about 40 ms and no duplicate packets:

root@SRV1:~# ping 172.16.1.2 -c 5
PING 172.16.1.2 (172.16.1.2) 56(84) bytes of data.
64 bytes from 172.16.1.2: icmp_seq=1 ttl=64 time=37.7 ms
64 bytes from 172.16.1.2: icmp_seq=2 ttl=64 time=44.0 ms
64 bytes from 172.16.1.2: icmp_seq=3 ttl=64 time=36.9 ms
64 bytes from 172.16.1.2: icmp_seq=4 ttl=64 time=46.1 ms
64 bytes from 172.16.1.2: icmp_seq=5 ttl=64 time=45.8 ms

--- 172.16.1.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 14ms
rtt min/avg/max/mdev = 36.864/42.085/46.071/3.986 ms

I also find that the CPU usage of the Proxmox host and of SRV1 is almost 98% and 86% respectively. The monitored traffic increases rapidly from 4 KB to about 120 MB.
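
This looks like a broadcast storm. One way to confirm it (a sketch, assuming tcpdump is installed on SRV1) is to watch the interface counters and capture a few frames on the bond:

# Interface statistics: in a storm the RX/TX packet counters climb rapidly
# even though no traffic is being generated on purpose
ip -s link show ens19
ip -s link show ens20

# A short capture on the bond shows the same broadcast/ARP frames circulating over and over
tcpdump -eni bond0 -c 20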

Experiment 2

I make the following changes:

  1. On SRV1 and SRV2, br0 is now over bond0 and ens21 (instead of bond0.10, bond0.20 and ens21).

/etc/network/interfaces on SRV1:

auto br0
iface br0 inet static
        bridge_ports bond0 ens21
        #bridge_ports bond0.10 bond0.20 ens21
        address 172.16.1.1
        broadcast 172.16.255.255
        netmask 16
        bridge_stp off
        bridge_fd 0
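
With this change br0 carries bond0 itself as a port, so any VLAN tags pass through the bridge untouched. This can be confirmed on SRV1 (a sketch using standard iproute2 commands):

# br0 should now list bond0 and ens21 (not bond0.10/bond0.20) as its ports
bridge link show

# A Linux bridge has vlan_filtering 0 by default, i.e. it floods tagged frames
# without interpreting the 802.1Q tags
ip -d link show br0
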
  2. On SW1, br-lan10 is over eth1 and eth3.10, and br-lan20 is over eth2 and eth4.20.

SW2 has a similar configuration.

Here is the /etc/config/network on SW1:

config interface 'lan10'
        option proto 'static'
        option type 'bridge'
        option ifname 'eth1 eth3.10'

config interface 'lan20'
        option type 'bridge'
        option proto 'none'
        option auto '1'
        option ifname 'eth2 eth4.20'

This time, the whole system works fine: I get triple packets and low latency:

root@SRV1:~# ping 172.16.1.2 -c 5
PING 172.16.1.2 (172.16.1.2) 56(84) bytes of data.
64 bytes from 172.16.1.2: icmp_seq=1 ttl=64 time=0.989 ms
64 bytes from 172.16.1.2: icmp_seq=1 ttl=64 time=1.00 ms (DUP!)
64 bytes from 172.16.1.2: icmp_seq=1 ttl=64 time=1.05 ms (DUP!)
64 bytes from 172.16.1.2: icmp_seq=1 ttl=64 time=1.06 ms (DUP!)
<Other output omitted here>
64 bytes from 172.16.1.2: icmp_seq=5 ttl=64 time=0.825 ms

--- 172.16.1.2 ping statistics ---
5 packets transmitted, 5 received, +12 duplicates, 0% packet loss, time 10ms
rtt min/avg/max/mdev = 0.811/1.022/1.310/0.143 ms
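
To see which NIC each duplicate reply arrives on, ICMP can be captured on the two bond slaves while the ping runs (a sketch, one capture per terminal, assuming tcpdump is installed on SRV1):

# Capture echo replies on each bond slave separately to see where the duplicates come in
tcpdump -ni ens19 icmp
tcpdump -ni ens20 icmp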

Question

What I expected before doing these two experiments was:

  1. Experiment 1: no broadcast storm will occur, because the interfaces and connections from ens19 of SRV1 to ens19 of SRV2 are all in vlan10, while the interfaces and connections from ens20 of SRV1 to ens20 of SRV2 are all in vlan20.

  2. Experiment 2: there will be a broadcast storm, because there is a loop (ens19@SRV1 -- ens19@SRV2 -- ens20@SRV2 -- ens20@SRV1 -- ens19@SRV1). But I get the opposite result in both experiments.

Could anyone please tell me why the network has a broadcast storm in Experiment 1, but not in Experiment 2?

Thanks a lot!

Jet

1 Answer


Maybe I am a little bit lost in the description... Let me try ;-).

SRV1/ens19 is access VLAN 10 on switch (untagged)
SRV1/ens20 is access VLAN 20 on switch (untagged)

You are creating a bond interface over connections / ports with different settings. This looks to me... let's say, not usual. On top of that, once you create this bond, you are bridging the VLANs together... I am not sure what you are trying to do exactly. You are logically patching the VLANs together and creating loops.

I would think about a trunk setting on the switch ports: tagged VLANs and the same config on both ports. Then create the bond over these ports. Then you can play with bond0.10 and bond0.20.
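
As a rough, untested sketch of that idea on the server side (the addresses and the bond mode below are placeholders; both switch ports would be identical trunks carrying VLANs 10 and 20 tagged):

auto bond0
iface bond0 inet manual
        bond-slaves ens19 ens20
        # bond mode here is a placeholder; pick whatever failover behaviour you need
        bond-mode active-backup
        bond-miimon 100

auto bond0.10
iface bond0.10 inet static
        # placeholder addressing, one subnet per VLAN
        address 192.168.100.11
        netmask 255.255.255.0
        vlan-raw-device bond0

auto bond0.20
iface bond0.20 inet static
        address 192.168.200.11
        netmask 255.255.255.0
        vlan-raw-device bond0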

Btw, if you put a bridge on both ends there is a nice logical loop - are you sure you want this kind of setup? Is there some STP in place to eliminate this?
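
For reference, turning STP on would be a one-line change in each of the two config formats used above (just a sketch of where the knob lives):

# Debian /etc/network/interfaces, inside the br0 stanza:
        bridge_stp on

# OpenWrt /etc/config/network, inside each bridge interface section:
        option stp '1'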

Good luck.

Kamil J
  • Yes. What I am doing is very strange and unusual. My purpose is to let two PCs behind SRV1 and SRV2 communicate with each other at any time. So I use two links working at the same time. Once one link fails, the other link is still sending/receiving packets. No communication interruption or latency of more than 50 ms is allowed. All the PCs/switches/bridges can be accessed and configured easily in the LAN. I will try it the way you advised. No Spanning Tree Protocol is enabled on SW1 and SW2. – Jet Jan 09 '20 at 05:14
  • Hmm, but now you are "reinventing the wheel"... I think you will not do it better than using the stuff designed for it. I would stop this design... Set up an LACP group of 2+ physical connections between all the nodes (instead of the two patches in VLAN 10 and VLAN 20, one logical group). You will need just one VLAN and it will do what you want with an easier setup. On the switch level it is an LACP group / etherchannel, and on the Linux system it is a bond. There will be two connections, both active, and interrupting either of the pair will not cause an interruption (in the really bad case a packet is retransmitted); a minimal sketch of such a bond follows these comments. – Kamil J Jan 09 '20 at 07:19
  • ... and at the OS level it will be a bond, so just one single IP / setting. A setup that is as simple as possible will help you prevent typos and other unwanted stuff that may happen ;-). – Kamil J Jan 09 '20 at 07:22
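
A minimal /etc/network/interfaces sketch of the LACP bond suggested in the comments above, reusing the bridge and address from the question (untested; the switch side needs a matching LACP group / etherchannel on the two ports, and the option values are only illustrative):

auto bond0
iface bond0 inet manual
        bond-slaves ens19 ens20
        # 802.3ad = LACP; the peer ports must form a matching LACP group / etherchannel
        bond-mode 802.3ad
        bond-miimon 100
        bond-lacp-rate fast
        bond-xmit-hash-policy layer3+4

auto br0
iface br0 inet static
        bridge_ports bond0 ens21
        address 172.16.1.1
        netmask 16
        bridge_stp off
        bridge_fd 0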