
We have BIG-IP version 13.1.0.2 deployed in Azure using the Auto Scale BIG-IP WAF (LTM + ASM) - VM Scale Set template. It worked fine until recently, when one of the 5 instances started showing as (cfg-sync Disconnected)(Offline). When I check the logs of a healthy device, I see entries like:

Sep 27 03:38:31 waf-vmss_0 crit tmm5[10398]: 01010201:2: Inet port exhaustion on 10.0.0.9 to 10.0.0.13:4353 (proto 6)
Sep 27 03:38:32 waf-vmss_0 crit tmm5[10398]: 01010201:2: Inet port exhaustion on 10.0.0.9 to 10.0.0.13:4353 (proto 6)
Sep 27 03:38:32 waf-vmss_0 crit tmm5[10398]: 01010201:2: Inet port exhaustion on 10.0.0.9 to 10.0.0.13:4353 (proto 6)
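
To see whether the exhaustion messages all point at one peer, the entries can be tallied by source, destination, and port. This is a minimal sketch that assumes the 01010201 message format shown above; the regular expression and the `count_exhaustion` helper are illustrative, not part of any F5 tooling.

```python
import re
from collections import Counter

# Matches the "Inet port exhaustion" message format seen in the log excerpt.
# The pattern is an assumption based on those three lines only.
PATTERN = re.compile(
    r"01010201:2: Inet port exhaustion on (?P<src>\S+) "
    r"to (?P<dst>[\d.]+):(?P<port>\d+) \(proto (?P<proto>\d+)\)"
)

def count_exhaustion(lines):
    """Return a Counter keyed by (source, destination, port, protocol)."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            key = (m["src"], m["dst"], int(m["port"]), int(m["proto"]))
            counts[key] += 1
    return counts

# The three entries quoted in the question.
sample = [
    "Sep 27 03:38:31 waf-vmss_0 crit tmm5[10398]: 01010201:2: Inet port exhaustion on 10.0.0.9 to 10.0.0.13:4353 (proto 6)",
    "Sep 27 03:38:32 waf-vmss_0 crit tmm5[10398]: 01010201:2: Inet port exhaustion on 10.0.0.9 to 10.0.0.13:4353 (proto 6)",
    "Sep 27 03:38:32 waf-vmss_0 crit tmm5[10398]: 01010201:2: Inet port exhaustion on 10.0.0.9 to 10.0.0.13:4353 (proto 6)",
]

summary = count_exhaustion(sample)
for (src, dst, port, proto), n in summary.items():
    print(f"{src} -> {dst}:{port} (proto {proto}): {n} events")
```

In this excerpt every entry targets 10.0.0.13 on port 4353 over protocol 6 (TCP), which suggests the problem is specific to one peer rather than to general traffic volume.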

While following the ConfigSync guide, I ran tmsh load sys config verify on the disconnected device and got:

Validating configuration...
  /config/bigip_base.conf
  /config/bigip_user.conf
  /config/bigip.conf
  /config/bigip_script.conf
  /config/partitions/CloudLibsLocal/bigip.conf
There were warnings:
/Common/f5.service_discovery definition:71: warning: [use curly braces to avoid double substitution][[string first , $orderPath]]

01071747:3: ASM/DOS must be provisioned when a Virtual Server is using a DoS profile (/Common/misc.prod.dos) with Application Security enabled.
Unexpected Error: Validating configuration process failed.
username@(waf-vmss_2)(cfg-sync Disconnected)(Offline)(/Common)(tmos)#

I have already tried restarting the device, restarting the VMSS VM, and revoking and re-assigning the license, but none of those had any effect. I even manually cleaned up the /config/ files enough to get the config to validate, and removed the device from all groups and the trust group. That caused it to come online in an active state as a standalone instance, but as soon as I try to add it back it returns to being disconnected and offline.

The VMs are all part of the same VMSS and use the same subnet, with full access in their NSG to the other devices. There isn't much traffic right now (just a few health checks), so I doubt the SNAT port exhaustion is caused by request volume. I can also ping or curl IPADDRESS:8443 just fine.
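
The checks above cover ICMP and port 8443, while the log entries name TCP port 4353. A quick way to test plain TCP reachability of that port from another host on the subnet is sketched below. It is only a sketch: the `tcp_port_open` helper is hypothetical, and a successful connection shows that the port is reachable, not that the BIG-IP's own tmm process can allocate a source port for it.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `tcp_port_open("10.0.0.13", 4353)` would test the destination that appears in the exhaustion messages.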

Is there any way to reset the config and/or get the VM assigned to a different IP address?

Greg Bray

1 Answer


We had a stress test scheduled and needed to get the WAF back to full capacity, so I decided to just delete the VM that was having issues from the VMSS. It took ~15 minutes, and after being deleted it was recreated with a different name. Azure shows waf-vmss_0 through 4, but the waf-vmss_2 device is missing from device management and there is a new waf-vmss_5 instance instead.

Once the new instance was provisioned, it was able to sync (using a different IP this time). I still have no idea what the issue was, or whether the VMSS / instance name difference will cause any problems. Before deleting the VM I removed it from the device groups and revoked the license, since we have had issues with those not being cleaned up when VMs were deleted.

Greg Bray