I'm trying to create an HA firewall/router pair using keepalived (VRRP only), conntrackd, and OSPF under quagga, all installed from the standard CentOS packages. This is similar to the setup described in Solving an asymetric HA Firewall/router, but I'm using OSPF instead of BGP and my subnets are all internal.
To avoid the issue of asymmetric routing and stateful firewalls, I'm trying a suggestion mentioned by the OP but then not referred to again, "Ahh, would it simply work to stop the BGP daemon when in the backup state and start it in the master state?" I have added some systemctl commands into the primary-backup.sh script called by keepalived during a state change:
#!/usr/bin/bash
SYSTEMCTL_BIN=/usr/bin/systemctl
OSPFD=/usr/sbin/ospfd
CONNTRACKD_BIN=/usr/sbin/conntrackd
CONNTRACKD_LOCK=/var/lock/conntrack.lock
CONNTRACKD_CONFIG=/etc/conntrackd/conntrackd.conf
case "$1" in
  primary)
    #
    # start OSPF daemon
    #
    command="${SYSTEMCTL_BIN} start $OSPFD"
    output=$(command)
    rc=$?
    if [ ${rc} -eq 0 ]
    then
        logger "DEBUG: keepalived successfully invoked '${command}', output <${output}>"
    else
        logger "ERROR: keepalived failed to invoke '${command}'; return code ${rc}, output <${output}>"
    fi
    ...
    ;;
  backup)
    #
    # stop OSPF daemon
    #
    command="${SYSTEMCTL_BIN} stop $OSPFD"
    output=$(command)
    rc=$?
    if [ ${rc} -eq 0 ]
    then
        logger "DEBUG: keepalived successfully invoked '${command}', output <${output}>"
    else
        logger "ERROR: keepalived failed to invoke '${command}'; return code ${rc}, output <${output}>"
    fi
    ...
When I force a state change on the master from primary to backup, for example, I see an entry like this in /var/log/messages on the original master:
Apr 29 13:18:12 xxxxxx-a logger: DEBUG: keepalived successfully invoked '/usr/bin/systemctl stop /usr/sbin/ospfd', output <>
and like this on the original backup:
Apr 29 13:18:12 xxxxxx-b logger: DEBUG: keepalived successfully invoked '/usr/bin/systemctl start /usr/sbin/ospfd', output <>
Everything else works as expected (VIP addresses move from the original master to the original backup and conntrack entries are synchronized). However, despite the DEBUG log messages, ospfd is still running on the original master and is still not running on the original backup.
What am I doing wrong? Where should I look for more detailed information about what's actually happening? All suggestions are welcome.
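One thing worth checking is whether the logged command is the one that actually ran. In the script, output=$(command) has no $ in front of the variable name, so bash runs its builtin named command with no arguments, which does nothing and returns 0. That would be consistent with the empty output <> and the "successfully invoked" message in the logs. The sketch below is a minimal illustration of that difference, not a tested fix for the script: "echo hello" is a hypothetical stand-in for the systemctl call so that it runs anywhere. Separately, systemctl normally takes a unit name rather than the path to a binary; the unit name ospfd used in the comments is an assumption and can be checked with systemctl list-unit-files.

```shell
#!/usr/bin/bash
# Hypothetical stand-in for "${SYSTEMCTL_BIN} start ospfd" (unit name assumed)
# so that this sketch runs on any machine.
cmd="echo hello"

# As written in the script: this invokes the bash builtin called "command"
# with no arguments, not the command line stored in a variable.
builtin_out=$(command)
builtin_rc=$?

# With the variable expanded: this runs the command line stored in cmd.
expanded_out=$($cmd)
expanded_rc=$?

echo "builtin:  rc=${builtin_rc}, output=<${builtin_out}>"
echo "expanded: rc=${expanded_rc}, output=<${expanded_out}>"
```

If this does turn out to be the cause, systemctl status ospfd and journalctl -u ospfd (again assuming that unit name) should show whether the daemon was really started or stopped after a state change.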