I have a lab environment with 7 48-port switches (Ubiquiti ES-48-500W).

All of them are connected to 3 16-port aggregation switches (Ubiquiti ES-16-XG) via fiber.

All of the switches are brand new, and all of them are in the same room. Fiber modules are brand new, as are the fiber cables.

The only computers plugged into this network are:

  • one Windows Server 2016 domain controller with DNS (but no internet access)
  • one Linux VM running UNMS (Ubiquiti's free switch management software)
  • one Windows 10 workstation running a third-party program called EMCO Ping Monitor

Before deploying these switches to my production environment, I configured each switch and joined them all to UNMS.

I have the latest firmware installed on each switch.

With the switches just sitting there idle, UNMS reports outages all throughout the day. Sometimes an outage lasts a few seconds; sometimes it lasts more than 20 seconds. The outages are not tied to specific switches, either. Switch 104 will go down for a few seconds. Then switch 114. Then switch 100. It's all over the place.

Each 48-port switch is connected to 2 ES-16-XGs for redundancy, using Spanning Tree Protocol.

Should I be concerned about this? Are intermittent drops "normal" for switches? Or does this indicate a problem?

Thinking that UNMS may not be 100% accurate (since it is still in beta), I installed a third-party program called EMCO Ping Monitor on a Windows 10 workstation, plugged the workstation into a random port on one of the 48-port switches, and configured the program to monitor every switch.

After letting it run and collect data for a few days, I found the same results. Intermittent drops all throughout the day, every day, with no apparent pattern.
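To check whether the drops really have no pattern, the raw ping results can be grouped into outages with a start time and a duration. The following is only a minimal sketch, not how EMCO Ping Monitor or UNMS counts outages; the `(timestamp, reply_received)` sample format and the name `group_outages` are assumptions made for illustration.

```python
from typing import List, Tuple


def group_outages(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Group consecutive failed pings into (start, duration) outages.

    samples: (timestamp_seconds, reply_received) pairs, sorted by time.
    An outage runs from the first failed ping to the next successful one.
    An outage still open at the end of the data is closed at the last sample.
    """
    outages = []
    start = None  # timestamp of the first failed ping in the current outage
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # outage begins
        elif ok and start is not None:
            outages.append((start, ts - start))  # outage ends
            start = None
    if start is not None:
        outages.append((start, samples[-1][0] - start))
    return outages


# Hypothetical one-ping-per-second data for a single switch.
samples = [(0, True), (1, False), (2, False), (3, True),
           (4, True), (5, False), (6, True)]
outages = group_outages(samples)

# Outages longer than 20 seconds (none in this tiny sample).
long_outages = [o for o in outages if o[1] > 20]
```

Sorting the resulting outages by start time across all switches makes it easier to see whether drops on different switches line up with each other.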

My concern is whether deploying these switches to my production environment is going to cause network issues. I expect these switches to work 100% since they are brand new.

Here is a screenshot of a sample report from one of the switches. (Ignore where it says it was down for 1 day; I had the computer turned off.) But in the report you can clearly see there are 1,000 "outages" on just one switch, between July 26 and August 6.

[Image: network ping report]

E C
  • This is not normal. You should be concerned. Two guesses: hardware failure or STP misconfiguration. Look for interface errors or STP events on the switches. Try running a smaller setup to reduce scope. – Mark Wagner Aug 06 '18 at 23:50
  • Topology change events appear scattered all throughout the log. What's causing these switches to change their topology? For the 48-port switches, my settings are MSTP (802.1s) with priority 32768. For the 16-port aggregation switches, my settings are MSTP (802.1s) with priority 16384. (I chose MSTP because each switch will have 8 VLANs.) Beyond what I have mentioned above, I am really not sure what else I should be doing to configure STP. Maybe my settings are what's causing the network drops. I don't know enough about STP. – E C Aug 24 '18 at 22:39

1 Answer

I was able to resolve the issue by switching to RSTP. After updating all of the switches, there were 0 drops. MSTP is not something you can just 'turn on'; it requires further configuration beyond switch priority.
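As an illustration of that further configuration: under 802.1s, switches only belong to the same MST region when their region name, revision level, and VLAN-to-instance mapping all match. The sketch below groups switches by those three values to show which ones would end up in the same region. The switch names and settings are hypothetical, not read from the switches in this thread, and `MstConfig` and `group_by_region` are illustrative names rather than any Ubiquiti API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MstConfig:
    region_name: str
    revision: int
    vlan_to_instance: Dict[int, int]  # VLAN ID -> MST instance ID


def group_by_region(switches: Dict[str, MstConfig]) -> List[List[str]]:
    """Group switch names whose MST region settings are identical.

    Real switches compare a digest of the VLAN-to-instance table;
    comparing the sorted mapping directly is equivalent for this sketch.
    """
    groups: Dict[tuple, List[str]] = {}
    for name, cfg in switches.items():
        key = (cfg.region_name, cfg.revision,
               tuple(sorted(cfg.vlan_to_instance.items())))
        groups.setdefault(key, []).append(name)
    return list(groups.values())


# Hypothetical settings: agg1 has a different revision level,
# so it would not be in the same region as the other two.
vlans = {10: 1, 20: 1, 30: 2}
switches = {
    "sw100": MstConfig("LAB", 0, dict(vlans)),
    "sw104": MstConfig("LAB", 0, dict(vlans)),
    "agg1": MstConfig("LAB", 1, dict(vlans)),
}
regions = group_by_region(switches)
```

If a check like this returns more than one group, the switches do not agree on a single region, which is worth ruling out before relying on MSTP.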

E C