Our office has been experiencing rolling internet outages over the last couple of weeks, and it is driving everyone crazy. After isolating a few individual components, my current theory is that the cause is a failing Dell PowerConnect 5448 managed switch, but my experience with these is limited, hence this post.

Topology:

  1. DSL router with Shaw Business 100 & static IP address ->
  2. Linksys WRT1900AC for NAT and DHCP ->
  3. Dell PowerConnect 5448 managed switch ->
  4. A whole slew of computers, printers, laptops, smartphones, etc., totalling approx. 100 devices

Things I've Tried:

  1. Plugging an ISP uptime monitoring server directly into the DSL router, bypassing the rest of the network, to determine whether it's the ISP that's failing. It doesn't seem to be: the monitoring server retains its connection while the rest of the network cannot reach the internet.

  2. Replacing the Linksys WRT1900AC with an Asus AC3200, as the Linksys has had bad reviews; I also replaced the Linksys under warranty with a new unit.

  3. The usual stuff: power cycling, rebooting, etc.
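One way to extend the isolation steps above is to probe each hop from a machine behind the switch and log which hop fails first during an outage. The following is only a sketch: the IP addresses are hypothetical placeholders, the hop names are made up for illustration, and the `ping` flags assume a Linux (iputils) `ping`.

```python
import subprocess
import time

# Hypothetical addresses, ordered from nearest to farthest hop.
# Substitute the real management/LAN addresses for your network.
HOPS = [
    ("switch", "192.168.1.2"),
    ("router", "192.168.1.1"),
    ("internet", "8.8.8.8"),
]


def ping(host, timeout_s=2):
    """Return True if a single ICMP echo gets a reply (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failing_hop(results):
    """Given ordered (name, reachable) pairs, return the first unreachable name, or None."""
    for name, reachable in results:
        if not reachable:
            return name
    return None


def monitor(interval_s=30):
    """Probe every hop forever and print the first failing one, with a timestamp."""
    while True:
        results = [(name, ping(addr)) for name, addr in HOPS]
        failed = first_failing_hop(results)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, "all hops OK" if failed is None else f"first failure at: {failed}")
        time.sleep(interval_s)
```

Calling `monitor()` on a wired workstation during a bad day would show whether outages begin at the switch, at the router, or beyond it. If the switch itself stops answering while it is crashing, that would support the failing-switch theory.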

Current Diagnosis:

The managed switch is at least 5 years old and seems to be exhibiting strange behaviour. Its most recent uptime was only 6 days (before that I'm not sure), and its internal clock reads Jan 2000. I tried resetting it, but then the system crashed fatally 3 times today per the log messages below, and the clock reset to Jan 2000 again:

1   2147480173  01~Jan~2000 02:04:24    Emergency   %OS-F-MEMORY: OSMEMG_rn_free: Memory address is NULL ***** FATAL ERROR *****  Reporting Task: GOAH. Software Version: 1.0.2.7 (date  17-Jun-2008 time  20:04:29) 0x1439f8 0x141074 0x4e1058 0x33078c 0x334114 0x4e4748 0x188620 0x1888e0 0x188948 0x189450 0x17a9dc 0x17abd4 0x17af90 0x17b620 0x17c04c 0x17c184 0x17edfc ***** END OF FATAL ERROR *****
2   2147481910  01~Jan~2000 00:03:00    Emergency   %OS-F-MEMORY: OSMEMG_rn_free: Memory address is NULL ***** FATAL ERROR *****  Reporting Task: GOAH. Software Version: 1.0.2.7 (date  17-Jun-2008 time  20:04:29) 0x1439f8 0x141074 0x4e1058 0x33078c 0x334114 0x4e4748 0x188620 0x1888e0 0x188948 0x189450 0x17a9dc 0x17abd4 0x17af90 0x17b620 0x17c04c 0x17c184 0x17edfc ***** END OF FATAL ERROR *****
3   2147483647  26~May~2015 12:53:57    Emergency   %OS-F-MEMORY: OSMEMG_rn_free: Memory address is NULL ***** FATAL ERROR *****  Reporting Task: GOAH. Software Version: 1.0.2.7 (date  17-Jun-2008 time  20:04:29) 0x1439f8 0x141074 0x4e1058 0x33078c 0x334114 0x4e4748 0x188620 0x1888e0 0x188948 0x189450 0x17a9dc 0x17abd4 0x17af90 0x17b620 0x17c04c 0x17c184 0x17edfc ***** END OF FATAL ERROR *****
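If the log is long, a small parser can tally entries by severity and message mnemonic, and flag entries stamped before the firmware build date shown in the log (17-Jun-2008), which can only come from an unset clock. This is a rough sketch that assumes the whitespace-separated column layout of the entries above; the function names are made up for illustration, and month parsing with `%b` assumes an English locale.

```python
import re
from collections import Counter
from datetime import datetime

# Columns as they appear above: index, sequence number, date, time, severity, message.
LINE_RE = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<seq>\d+)\s+"
    r"(?P<date>\d{2}~[A-Za-z]{3}~\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<severity>\w+)\s+(?P<message>.*)$"
)

# Build date taken from the "Software Version" text in the fatal error entries.
FIRMWARE_BUILT = datetime(2008, 6, 17)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(
        f"{entry['date']} {entry['time']}", "%d~%b~%Y %H:%M:%S"
    )
    return entry


def clock_was_unset(entry):
    """True if the entry is stamped earlier than the firmware build date."""
    return entry["timestamp"] < FIRMWARE_BUILT


def summarize(lines):
    """Count entries per (severity, mnemonic), e.g. ('Emergency', '%OS-F-MEMORY')."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            mnemonic = entry["message"].split(":", 1)[0]
            counts[(entry["severity"], mnemonic)] += 1
    return counts
```

Feeding the exported log through `summarize` gives a quick count of how many `%OS-F-MEMORY` crashes occurred, and `clock_was_unset` separates entries logged after a reboot (Jan 2000) from those logged with a valid clock.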

Another log shows a bunch of spanning tree related up and down events for the Linksys (port g43), per the entry below; does this mean I should enable RSTP?

17  2147483549  01~Jan~2000 00:01:01    Warning %STP-W-PORTSTATUS: g43: STP status Forwarding   

This is combined with numerous consecutive g43 link up/link down warning and info messages.

Is this switch buggered? Should I replace it? Thanks!

Mark Henderson
Reece
  • The STP status warning is simply telling you that the port is in the STP Forwarding state. That doesn't strike me as being a problem. Of the 5 available port states, there are essentially two that you really care about: Blocking and Forwarding. A port in the Blocking state does not forward traffic (because doing so would create a switching loop). A port in the Forwarding state does forward traffic (because doing so would not create a switching loop). – joeqwerty May 26 '15 at 23:38
  • Are all of those log entries from the Ram Log? That fatal error looks... fatal. I agree with @EEAA in that you probably ought to replace this switch. – joeqwerty May 26 '15 at 23:39
  • The RAM log has more innocuous-looking entries, such as the port status entries and sequences of link up and link down warnings and info messages for the Linksys router. The 'log' (i.e. not the RAM log), which is a more general log, is showing the Emergency severity entries for the fatal crashes. – Reece May 27 '15 at 00:11
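The port-state point in the first comment can be sketched in a few lines. The five names below are the classic IEEE 802.1D spanning tree port states (the comment only names Blocking and Forwarding; the others are filled in from the standard), and the parsing helper assumes the state is the last word of the message, as in the g43 entry quoted in the question.

```python
from enum import Enum


class StpState(Enum):
    """Classic 802.1D spanning tree port states."""
    DISABLED = "Disabled"
    BLOCKING = "Blocking"
    LISTENING = "Listening"
    LEARNING = "Learning"
    FORWARDING = "Forwarding"


def forwards_traffic(state):
    """Only a Forwarding port passes user traffic; every other state drops it."""
    return state is StpState.FORWARDING


def state_from_log(message):
    """Extract the state from a message such as 'g43: STP status Forwarding'."""
    last_word = message.rsplit(None, 1)[-1]
    return StpState(last_word)  # raises ValueError for an unknown state name
```

A single `Forwarding` message is therefore the healthy case. What deserves attention is the port repeatedly leaving and re-entering that state, which is what the link up/link down sequences suggest.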

1 Answer

Is this switch buggered?

Maybe.

Should I replace it?

Yup. :)

It's a fairly inexpensive piece of equipment, and with the instability you've seen, it seems like a no-brainer to replace it. Before replacing it, though, it might be worth re-flashing it with the latest firmware.

EEAA
  • Thanks for the quick reply! Were there any specific reasons that led you to this conclusion, or is it just all of the above / eliminating another possibility? – Reece May 26 '15 at 22:46
  • See my edit for a few more details... – EEAA May 26 '15 at 22:47