I'm implementing a monitoring system for an existing, modest-sized data center deployment.
So far I've only covered the host/application side of the monitoring equation, but I'm already noticing what I consider an alarming number of Ethernet errors on various hosts. To me, alarming is 3 or 4 per day per host (some have none). When I look at the SNMP counters on the switches, I see lots of errors there too, but I'm not graphing those (yet).
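For context, here's roughly how I'm sampling the host side. This is a minimal sketch assuming Linux hosts, where the kernel exposes per-interface counters under `/sys/class/net/<iface>/statistics/`; the 3-per-day threshold is just the level I personally find alarming, not anything official:

```python
# Sketch: extrapolate Ethernet error-counter growth to a daily rate and
# flag interfaces that exceed a (hypothetical) alarm threshold.
# Assumes Linux sysfs counters: /sys/class/net/<iface>/statistics/{rx,tx}_errors

ALARM_PER_DAY = 3  # my own "this worries me" threshold; adjust to taste


def read_errors(iface: str) -> int:
    """Sum the rx and tx error counters for one interface."""
    total = 0
    for name in ("rx_errors", "tx_errors"):
        with open(f"/sys/class/net/{iface}/statistics/{name}") as f:
            total += int(f.read())
    return total


def errors_per_day(prev: int, curr: int, interval_s: float) -> float:
    """Extrapolate a counter delta over interval_s seconds to a daily rate."""
    return (curr - prev) * 86400 / interval_s
```

In practice I sample each interface on an interval (say, hourly), feed consecutive samples through `errors_per_day`, and alert when the rate crosses the threshold, so a one-off blip doesn't page anyone but a steady trickle does.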
In my prior environments, with many more ports, the error rate was approximately zero except on hosts with genuine problems such as duplex mismatches.
None of these interfaces is saturated; they're pushing roughly 40-50 megabytes/sec over gigabit links.
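The back-of-the-envelope arithmetic behind that "not saturated" claim (payload bytes only; frame overhead would nudge these numbers up slightly):

```python
# 40-50 MB/s of payload on a 1 Gb/s link, expressed as utilization.
LINK_BPS = 1_000_000_000  # gigabit link

for mbytes_per_sec in (40, 50):
    utilization = mbytes_per_sec * 1_000_000 * 8 / LINK_BPS
    print(f"{mbytes_per_sec} MB/s -> {utilization:.0%} of a gigabit link")
# 40 MB/s -> 32%, 50 MB/s -> 40%
```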
My gut feeling is that there shouldn't be any errors at all on any interface if everything is working properly, but I'm worried that if I pick a fight over resolving these problems I'll just alienate everyone else who believes "it works fine; it's been working that way for years".
Does anyone have good stories/studies/statistics for when to be alarmed by Ethernet errors? Or something indicating how even a small volume of errors would affect, say, an iSCSI volume?
Thanks!