
I have a transparent firewall (running VyOS) that passes BGP traffic between the routers on each side. When the link on one side of the bridge goes down, I want to bring down the link on the other side so that the router there clears its BGP information immediately instead of waiting 2 minutes 30 seconds for the timer to expire.

(More background information)

This is my script:

#!/bin/bash

## This script will bounce a br interface if a member interface goes down.
## This will cause router BGP timers to reset, making outages last only seconds instead of minutes.
##
## This script is called by netplug on VyOS:
## /etc/netplug/linkdown.d/my-brdown
##
## Version History
## 1.0 - Initial version
##

LOCKDIR=/var/run/my-bridge-ctl

# Since we only have one br, not going to implement this right now.
#IGNORE_BRIDGES=()

IFACE=$1

#Remove the lock directory
function cleanup {
    if rmdir "$LOCKDIR"; then
        logger -is -t "my-bridge-ctl" -p "kern.info" "Finished"
    else
        logger -is -t "my-bridge-ctl" -p "kern.error" "Failed to remove lock directory '$LOCKDIR'"
        exit 1
    fi
}

if mkdir "$LOCKDIR"; then
    #Ensure that if we "grabbed a lock", we release it
    #Works for SIGTERM and SIGINT(Ctrl-C)
    trap "cleanup" EXIT

    logger -is -t "my-bridge-ctl" -p "kern.info" "Acquired lock, running"

    # Processing starts here

    IFACE_DESC=$(<"/sys/class/net/${IFACE}/ifalias")
    IFACE_BR_DIR="/sys/class/net/${IFACE}/brport"

    if [ ! -d "$IFACE_BR_DIR" ]; then
        logger -is -t "my-bridge-ctl" -p "kern.warning" "Interface ${IFACE} (${IFACE_DESC:-no desc}) went down. Not a member of a bridge. Skipping."
    else
        IFACE_BR_LINK=$(realpath "/sys/class/net/${IFACE}/master")
        IFACE_BR_NAME=$(basename "$IFACE_BR_LINK")
        IFACE_BR_DESC=$(<"${IFACE_BR_LINK}/ifalias")
        logger -is -t "my-bridge-ctl" -p "kern.warning" "Interface ${IFACE} (${IFACE_DESC:-no desc}) went down. Member of bridge ${IFACE_BR_NAME} (${IFACE_BR_DESC:-no desc})."

        # TODO: Insert IGNORE_BRIDGE check here

        find "${IFACE_BR_LINK}/brif" -type l -print0 | while IFS= read -r -d $'\0' IFACE_BR_MEMBER_LINK; do
            IFACE_BR_MEMBER_NAME=$(basename "$IFACE_BR_MEMBER_LINK")
            logger -is -t "my-bridge-ctl" -p "kern.info" "Handling ${IFACE_BR_NAME} member interface ${IFACE_BR_MEMBER_NAME} (${IFACE_BR_MEMBER_LINK})."

            # Actually do the bounce
            ip link set dev "${IFACE_BR_MEMBER_NAME}" down && sleep 2 && ip link set dev "${IFACE_BR_MEMBER_NAME}" up

            logger -is -t "my-bridge-ctl" -p "kern.info" "Interface ${IFACE_BR_MEMBER_NAME} bounced."
        done
    fi

    sleep 5
else
    logger -is -t "my-bridge-ctl" -p "kern.info" "Could not create lock directory '$LOCKDIR'"
    exit 1
fi

When I run my script manually, it works fine. When netplugd runs it, it causes netplugd to crash. I ran netplugd in the foreground to make sure I captured all the output:

root@firewall00:~# netplugd -F           
/etc/netplug/netplug bond0 probe -> pid 10277
/etc/netplug/netplug bond1 probe -> pid 10278
/etc/netplug/netplug bond2 probe -> pid 10279
/etc/netplug/netplug bond3 probe -> pid 10280
/etc/netplug/netplug bond4 probe -> pid 10281
/etc/netplug/netplug bond5 probe -> pid 10282
/etc/netplug/netplug bond6 probe -> pid 10283
/etc/netplug/netplug bond7 probe -> pid 10284
/etc/netplug/netplug bond8 probe -> pid 10285
/etc/netplug/netplug bond9 probe -> pid 10286
/etc/netplug/netplug bond10 probe -> pid 10287
/etc/netplug/netplug bond11 probe -> pid 10288
/etc/netplug/netplug bond12 probe -> pid 10289
/etc/netplug/netplug bond13 probe -> pid 10290
/etc/netplug/netplug bond14 probe -> pid 10291
/etc/netplug/netplug bond15 probe -> pid 10292
/etc/netplug/netplug br0 probe -> pid 10293
/etc/netplug/netplug br1 probe -> pid 10294
/etc/netplug/netplug br2 probe -> pid 10295
/etc/netplug/netplug br3 probe -> pid 10296
/etc/netplug/netplug br4 probe -> pid 10297
/etc/netplug/netplug br5 probe -> pid 10298
/etc/netplug/netplug br6 probe -> pid 10299
/etc/netplug/netplug br7 probe -> pid 10300
/etc/netplug/netplug br8 probe -> pid 10301
/etc/netplug/netplug br9 probe -> pid 10302
/etc/netplug/netplug br10 probe -> pid 10303
/etc/netplug/netplug br11 probe -> pid 10304
/etc/netplug/netplug br12 probe -> pid 10305
/etc/netplug/netplug br13 probe -> pid 10306
/etc/netplug/netplug br14 probe -> pid 10307
/etc/netplug/netplug br15 probe -> pid 10308
/etc/netplug/netplug eth0 probe -> pid 10309
/etc/netplug/netplug eth1 probe -> pid 10310
/etc/netplug/netplug eth2 probe -> pid 10311
/etc/netplug/netplug eth3 probe -> pid 10312
/etc/netplug/netplug eth4 probe -> pid 10313
/etc/netplug/netplug eth5 probe -> pid 10314
/etc/netplug/netplug eth6 probe -> pid 10315
/etc/netplug/netplug eth7 probe -> pid 10316
/etc/netplug/netplug eth8 probe -> pid 10317
/etc/netplug/netplug eth9 probe -> pid 10318
/etc/netplug/netplug eth10 probe -> pid 10319
/etc/netplug/netplug eth11 probe -> pid 10320
/etc/netplug/netplug eth12 probe -> pid 10321
/etc/netplug/netplug eth13 probe -> pid 10322
/etc/netplug/netplug eth14 probe -> pid 10323
/etc/netplug/netplug eth15 probe -> pid 10324
/etc/netplug/netplug eth3 in -> pid 10325
/etc/netplug/netplug eth0 in -> pid 10326
/etc/netplug/netplug eth1 in -> pid 10327
/etc/netplug/netplug br0 in -> pid 10328
br0: state INNING pid 10328 exited status 0
eth3: state INNING pid 10325 exited status 0
eth0: state INNING pid 10326 exited status 0
eth1: state INNING pid 10327 exited status 0
eth2: state DOWN flags 0x00001003 UP,BROADCAST,MULTICAST -> 0x00011043 UP,BROADCAST,RUNNING,MULTICAST,10000
/etc/netplug/netplug eth2 in -> pid 10337
eth2: state INNING pid 10337 exited status 0
eth2: state ACTIVE flags 0x00011043 UP,BROADCAST,RUNNING,MULTICAST,10000 -> 0x00001003 UP,BROADCAST,MULTICAST
/etc/netplug/netplug eth2 out -> pid 10340
my-bridge-ctl[10344]: Acquired lock, running
my-bridge-ctl[10349]: Interface eth2 (br0 inside - net1138a) went down. Member of bridge br0 (no desc).
my-bridge-ctl[10353]: Handling br0 member interface eth2 (/sys/devices/virtual/net/br0/brif/eth2).
eth2: state OUTING flags 0x00001003 UP,BROADCAST,MULTICAST -> 0x00001002 BROADCAST,MULTICAST
eth2: state DOWNANDOUT flags 0x00001002 BROADCAST,MULTICAST -> 0x00001003 UP,BROADCAST,MULTICAST
Error: eth2: unexpected state DOWNANDOUT for UP
root@firewall00:~# my-bridge-ctl[10357]: Interface eth2 bounced.
my-bridge-ctl[10359]: Handling br0 member interface eth3 (/sys/devices/virtual/net/br0/brif/eth3).
my-bridge-ctl[10386]: Interface eth3 bounced.
my-bridge-ctl[10389]: Finished

The error is "Error: eth2: unexpected state DOWNANDOUT for UP".
I can't figure out what is causing netplugd to get into that state.

yakatz

2 Answers


This may be a bug in netplugd, reported in Debian in December 2011. The following patch was proposed in January 2013, accepted (in 1.2.9.2-2) in November 2014, and released in May 2015. (That is one very slow bugfix process.)

--- a/if_info.c
+++ b/if_info.c
@@ -186,6 +186,7 @@
         if (newflags & IFF_UP) {
             switch(info->state) {
             case ST_DOWN:
+            case ST_DOWNANDOUT:
                 info->state = ST_INACTIVE;
                 break;
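The patch changes a single transition. As a deliberately simplified model, here is a bash sketch of only the transitions that appear in the log above and in the patch context. The function name next_state and the event names admin-down (IFF_UP cleared) and admin-up (IFF_UP set) are hypothetical; the real logic lives in if_info.c and handles many more states and flags.

```shell
#!/bin/bash
# Hypothetical, simplified model of the netplugd transitions visible in the
# log and in the patch context. This is NOT netplugd's real state machine.
# Usage: next_state STATE EVENT PATCHED
#   EVENT:   "admin-down" (IFF_UP cleared) or "admin-up" (IFF_UP set)
#   PATCHED: "yes" models the Debian patch; anything else models stock VyOS.
next_state() {
    local state=$1 event=$2 patched=$3
    case "$state:$event" in
        OUTING:admin-down)
            # Logged: the out script is still running when the link is set down.
            echo DOWNANDOUT
            ;;
        DOWN:admin-up)
            # Existing "case ST_DOWN:" branch in the patch context.
            echo INACTIVE
            ;;
        DOWNANDOUT:admin-up)
            if [ "$patched" = yes ]; then
                # The added "case ST_DOWNANDOUT:" line falls through to INACTIVE.
                echo INACTIVE
            else
                # Stock netplugd reports this and exits (see the log).
                echo "unexpected state DOWNANDOUT for UP" >&2
                return 1
            fi
            ;;
        *)
            echo "not modeled: $state $event" >&2
            return 2
            ;;
    esac
}
```

For example, next_state OUTING admin-down no prints DOWNANDOUT, and next_state DOWNANDOUT admin-up no then fails the way netplugd did; the same call with yes prints INACTIVE, so the bounce is simply accepted rather than avoided.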

The VyOS netplug repository does not have this patch.

I suggest you talk to the VyOS people about adding that patch.

Moshe Katz
That patch only hides the problem. I want to know why it is getting there and how to stop it. – yakatz Mar 19 '18 at 15:04

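Moshe's answer made me look more carefully at the netplugd source. netplugd uses the state ST_DOWNANDOUT when it is still running the link-down (out) script for an interface and that interface is brought down again. That is exactly what my script did: in the log above, it bounces eth2 itself while netplugd is still running the out script for eth2. The "ip link set down" moves eth2 from OUTING to DOWNANDOUT, and the "ip link set up" two seconds later arrives in a state that stock netplugd does not handle. The mentioned patch just hides that condition. I added per-interface locks to my script and it works fine now.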
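A minimal sketch of what per-interface locks could look like, assuming the same mkdir-based locking as the original script but with one lock directory per interface under a hypothetical base directory /var/run/my-bridge-ctl.d. LOCK_BASE, lock_iface, unlock_iface, iface_locked and members_to_bounce are illustrative names, not taken from the gist. Each run locks the interface that triggered it and skips any bridge member whose lock exists, so it never bounces an interface whose link-down handling (this same script) is still in progress, including itself.

```shell
#!/bin/bash
# Hypothetical sketch: one mkdir-based lock per interface instead of one
# global lock. LOCK_BASE is an assumed location, not from the original script.
LOCK_BASE="${LOCK_BASE:-/var/run/my-bridge-ctl.d}"

# Take the lock for one interface. mkdir is atomic, so only one caller wins.
lock_iface() {
    mkdir -p "$LOCK_BASE" && mkdir "${LOCK_BASE}/$1" 2>/dev/null
}

# Release the lock for one interface.
unlock_iface() {
    rmdir "${LOCK_BASE}/$1" 2>/dev/null
}

# Succeed if some run currently holds the lock for this interface.
iface_locked() {
    [ -d "${LOCK_BASE}/$1" ]
}

# Print the bridge members (given as arguments, e.g. the names found under
# /sys/class/net/BRIDGE/brif) that are safe to bounce: any member whose
# link-down handling is still in progress is skipped.
members_to_bounce() {
    local member
    for member in "$@"; do
        if iface_locked "$member"; then
            continue
        fi
        printf '%s\n' "$member"
    done
}
```

In the original script, the global mkdir/trap block would become something like lock_iface "$IFACE" || exit 0 followed by trap 'unlock_iface "$IFACE"' EXIT, and the find loop would bounce only the names printed by members_to_bounce. Whether this alone avoids every DOWNANDOUT transition depends on netplugd's timing, so treat it as a starting point rather than the code in the gist.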

The final-ish code is here: https://gist.github.com/yakatz/f16824ea3a4597d35737463c612eb8a3#file-my-bridge-ctl-sh

yakatz