
We have a set of devices connected in a ring that use STP (currently via https://github.com/mstpd/mstpd) to avoid problems with the loop. What we have seen is that while STP adapts the network when a link is fully broken, it ignores a link that suffers high packet loss.

Is this something STP supports? That is, can it factor the packet loss on a link into the cost associated with a given interface and adapt the network accordingly? If not, are there any good alternatives?

We ran into this because some USB-to-Ethernet adapters failed in some cases after a reboot or after the Ethernet cable was manually reconnected. While the cause is likely driver- or OS-related, it showed very clearly that STP keeps choosing the partially failing link, leaving us with a permanent 50% packet loss instead of the 0% we would get by avoiding that link.

eglasius

1 Answer


> What we have seen is that while STP adapts the network when a link is fully broken, it ignores a link that suffers high packet loss. Is this something STP supports?

Link quality is not something STP cares about, nor is it designed to handle it.

STP runs between switches. If more than just a very few packets are lost, the link is bad and needs to be fixed.

The only way to avoid such bad links would be to monitor them and shut down the port once the error rate crosses some threshold. Since links are normally practically error-free, that's not something you'd commonly find in a switch.
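A minimal sketch of that kind of monitoring, assuming a Linux host where the kernel exposes per-interface counters under `/sys/class/net/<iface>/statistics/`. The threshold, the number of consecutive bad intervals, and the names `LinkMonitor` and `read_counters` are illustrative assumptions, not part of mstpd or any switch firmware. Loss caused by a misbehaving adapter may not show up in these counters at all, in which case an active probe between neighbours would be needed instead.

```python
from pathlib import Path


def read_counters(iface, root="/sys/class/net"):
    """Return cumulative (rx_packets, rx_errors) for iface from Linux sysfs."""
    stats = Path(root) / iface / "statistics"
    packets = int((stats / "rx_packets").read_text())
    errors = int((stats / "rx_errors").read_text())
    return packets, errors


class LinkMonitor:
    """Flags a port once its error ratio stays too high for several samples."""

    def __init__(self, max_error_ratio=0.01, bad_intervals_needed=3):
        # Both defaults are hypothetical values chosen for illustration.
        self.max_error_ratio = max_error_ratio
        self.bad_intervals_needed = bad_intervals_needed
        self._last = None
        self._bad_streak = 0

    def update(self, packets, errors):
        """Feed cumulative counters; return True once the port should be shut down."""
        if self._last is None:
            # First sample only establishes the baseline.
            self._last = (packets, errors)
            return False
        d_packets = packets - self._last[0]
        d_errors = errors - self._last[1]
        self._last = (packets, errors)
        # Approximation: treats good and errored frames as disjoint counts,
        # which depends on the driver.
        total = d_packets + d_errors
        ratio = d_errors / total if total > 0 else 0.0
        if ratio > self.max_error_ratio:
            self._bad_streak += 1
        else:
            self._bad_streak = 0
        return self._bad_streak >= self.bad_intervals_needed
```

A loop would call `monitor.update(*read_counters("eth0"))` once per sampling interval and, when it returns `True`, take the port down (for example with `ip link set dev eth0 down`) so that STP treats it as a failed link. Requiring several bad intervals in a row is a design choice to avoid flapping on a short burst of errors.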

As a workaround, you could use port priorities or port path costs to steer STP away from a known-bad link while still keeping it as a failover.
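To see how such steering plays out in a three-device ring, here is a toy model of how a non-root bridge picks its root port. It is a simplification of the spanning-tree comparison (total root path cost first, then tie-breakers), and it uses port path cost rather than port priority as the knob, because priority only breaks ties between otherwise equal paths. The port names, cost values, and the function `pick_root_port` are made up for illustration.

```python
def pick_root_port(offers):
    """Pick the root port from (port, advertised_root_cost, port_cost, port_prio) tuples.

    Simplified model: the lowest total root path cost wins; ties fall to the
    lowest priority value, then the port name. Real STP also compares the
    sender's bridge ID and port ID before the local port ID.
    """
    return min(offers, key=lambda o: (o[1] + o[2], o[3], o[0]))[0]


# Bridge B in a ring A-B-C-A, with A as the root bridge.
# eth0 faces A directly (A advertises cost 0); eth1 faces C,
# which advertises its own cost of one link to reach A.
LINK = 20000  # hypothetical per-link cost

default = [("eth0", 0, LINK, 128), ("eth1", LINK, LINK, 128)]
prio_only = [("eth0", 0, LINK, 240), ("eth1", LINK, LINK, 128)]
steered = [("eth0", 0, 3 * LINK, 128), ("eth1", LINK, LINK, 128)]

print(pick_root_port(default))    # eth0: the direct link is cheapest
print(pick_root_port(prio_only))  # eth0: priority alone does not outweigh cost
print(pick_root_port(steered))    # eth1: the raised cost pushes traffic via C
```

In the steered case the suspect link on `eth0` is no longer the preferred path, but it remains available if the path through C goes down.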

Zac67
  • My confusion is that STP adapts to a cable being disconnected by using the other links, so it does care about keeping the network working when a link becomes unavailable. If one builds redundancy into a network to keep things up, I wonder whether a physical failure is always a clean, full disconnect (we happened to hit the opposite during early testing because of those adapter issues). – eglasius Dec 08 '20 at 08:33
  • @eglasius xSTP just uses a link-up/link-down type detection - no quality monitoring whatsoever. Yes, if your cabling is OK then a failure is most likely a clean disconnect (cable disconnection, transceiver failure, switch offline, ...). If you need quality monitoring you'd have to add it to your stack. However: for a decent network you should never use rings but connect your devices to a decent, central switch (or a hierarchical structure of switches). – Zac67 Dec 08 '20 at 09:35
  • In our context, the devices in the network are all part of a bigger standardized system and need to talk to each other: 3 devices in this version, with potentially a couple more later on. A single switch is viewed as a single point of failure, so it's really 2 switches in different physical locations + 2 * [device count] cables vs. [device count] cables for the ring. The main point against the ring so far is that it doesn't adapt to lossy links. When you say never to use rings, do you have other concerns in mind? – eglasius Dec 09 '20 at 08:03
  • Viewing a decent switch as SPOF but at the same time trying to use some kind of workaround for substandard cabling is - strange... With standard Ethernet, rings generally don't behave very well (unless you're using the appropriate Ethernet options). You might want to check the [relevant discussions](https://networkengineering.stackexchange.com/search?q=ring) on [Network Engineering](https://networkengineering.stackexchange.com). – Zac67 Dec 09 '20 at 10:32