I'm using Zabbix 3.0 to monitor our main switch in which the port from the firewall as well as other ports show what appear to be drops of either sending or receiving. One would conclude that if this traffic was dropping from the switch could be over utilized that the drops would be consistent across all ports, but it's not. Very intermittently we do have external communication issues on a VM downloading data from a client that get's hung but the timings between that symptom and the data being read by Zabbix in which I would say hte possibility of our source for downloading could be experiencing issues, or us but of course our ISP denies any outages and has shown that our drops coming in have been solid during those time periods.
The first image shows Drops on incoming port to switch from the ASA. NOTE: from 4:55AM-5:01AM it shows nothing outgoing and the value should be a straight 0. At this same time the random port also shows this loss, However from 5:34AM-5:41AM the incoming shows loss from the ASA to the switch, however the same random port shows no loss. Finally from a VM running the Zabbix client it shows communication never going down at all.
This image shows random loss and is from a random port on the Cisco SG200-50 switch.
A random VM shows no loss at all during this time period.
Cisco support is baffled because if it was a bad switch they believe it would be a recognizable pattern, however have not ruled out possible CPU spikes in the switch when this occurs since I was told that can't pull that data from the switch since it's a "small business class" switch.
Other considerations:
ports are not configured in LAG since the Hyper-V hosts are configured NIC teams are configured as they are configured in Switch independent mode on a Windows Server 2012 R2 host. Below is a screenshot of the configuration using powershell running the Get-NetLbfoTeam cmdlet.
During the same time I see no drop in communication over a VPN tunnel to a remote site monitoring a wireless access point and the Zabbix server has to talk through the port on the switch that is connected to the ASA to reach the WAP.
When I asked Cisco if this should be changed since this switch is LAG capable they said it shouldn't make a difference to how I'm seeing the possible data drops within Zabbix. I looked within various Zabbix forums but was unable to find anything to try changing but I did tune the server for more concurrent connections to possibly eliminate something with Zabbix that could be causing innacurrate readings. Within Zabbix I'm currently monitoring less than a dozen nodes. I am confident that VM is sufficiently powered with 4 cores and 4GB of ram on a underutilized host with low disk I/O. When I look at the Zabbix server utilization it appears very underutilized below is a breief snapshot of the Zabbix 3.0 Server. NOTE: I am runnign the appliance instead of buildign from scratch for two reasons. One Just to test drive the product, and two with a low number of items being monitored it should have no issue monitoring 5 items.
CPU is idle 99.4% of the time
CPU spikes are less than 1%
Memory usage is roughly 70-75%
Network traffic us usually below 50Kbps with small spikes to around 240Kbps
100% free swap space
Zabbix value cache hits 7.21-7.46
No cache misses during the SNMP losses
Below is my Zabbix server config file /etc/zabbix/zabbix_server.conf
# This is a configuration file for Zabbix server daemon
# To get more information about Zabbix, visit http://www.zabbix.com
############ GENERAL PARAMETERS #################
### Option: ListenPort
# Listen port for trapper.
#
# Mandatory: no
# Range: 1024-32767
# Default:
# ListenPort=10051
### Option: SourceIP
# Source IP address for outgoing connections.
#
# Mandatory: no
# Default:
# SourceIP=
### Option: LogType
# Specifies where log messages are written to:
# system - syslog
# file - file specified with LogFile parameter
# console - standard output
#
# Mandatory: no
# Default:
# LogType=file
### Option: LogFile
# Log file name for LogType 'file' parameter.
#
# Mandatory: no
# Default:
# LogFile=
LogFile=/var/log/zabbix/zabbix_server.log
### Option: LogFileSize
# Maximum size of log file in MB.
# 0 - disable automatic log rotation.
#
# Mandatory: no
# Range: 0-1024
# Default:
# LogFileSize=1
LogFileSize=0
### Option: DebugLevel
# Specifies debug level:
# 0 - basic information about starting and stopping of Zabbix processes
# 1 - critical information
# 2 - error information
# 3 - warnings
# 4 - for debugging (produces lots of information)
# 5 - extended debugging (produces even more information)
#
# Mandatory: no
# Range: 0-5
# Default:
# DebugLevel=3
### Option: PidFile
# Name of PID file.
#
# Mandatory: no
# Default:
# PidFile=/tmp/zabbix_server.pid
PidFile=/var/run/zabbix/zabbix_server.pid
### Option: DBHost
# Database host name.
# If set to localhost, socket is used for MySQL.
# If set to empty string, socket is used for PostgreSQL.
#
# Mandatory: no
# Default:
# DBHost=localhost
### Option: DBName
# Database name.
# For SQLite3 path to database file must be provided. DBUser and DBPassword are ignored.
#
# Mandatory: yes
# Default:
# DBName=
DBName=zabbix
### Option: DBSchema
# Schema name. Used for IBM DB2 and PostgreSQL.
#
# Mandatory: no
# Default:
# DBSchema=
### Option: DBUser
# Database user. Ignored for SQLite.
#
# Mandatory: no
# Default:
# DBUser=
DBUser=zabbix
### Option: DBPassword
# Database password. Ignored for SQLite.
# Comment this line if no password is used.
#
# Mandatory: no
# Default:
# DBPassword=
DBPassword=gv5aLv2OKy
### Option: DBSocket
# Path to MySQL socket.
#
# Mandatory: no
# Default:
# DBSocket=/tmp/mysql.sock
### Option: DBPort
# Database port when not using local socket. Ignored for SQLite.
#
# Mandatory: no
# Range: 1024-65535
# Default (for MySQL):
# DBPort=3306
############ ADVANCED PARAMETERS ################
### Option: StartPollers
# Number of pre-forked instances of pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollers=5
StartPollers=10
### Option: StartIPMIPollers
# Number of pre-forked instances of IPMI pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartIPMIPollers=0
### Option: StartPollersUnreachable
# Number of pre-forked instances of pollers for unreachable hosts (including IPMI and Java).
# At least one poller for unreachable hosts must be running if regular, IPMI or Java pollers
# are started.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollersUnreachable=1
StartPollersUnreachable=2
### Option: StartTrappers
# Number of pre-forked instances of trappers.
# Trappers accept incoming connections from Zabbix sender, active agents and active proxies.
# At least one trapper process must be running to display server availability and view queue
# in the frontend.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartTrappers=5
StartTrappers=10
### Option: StartPingers
# Number of pre-forked instances of ICMP pingers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPingers=1
StartPingers=2
### Option: StartDiscoverers
# Number of pre-forked instances of discoverers.
#
# Mandatory: no
# Range: 0-250
# Default:
# StartDiscoverers=1
StartDiscoverers=8
### Option: StartHTTPPollers
# Number of pre-forked instances of HTTP pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartHTTPPollers=1
StartHTTPPollers=10
### Option: StartTimers
# Number of pre-forked instances of timers.
# Timers process time-based trigger functions and maintenance periods.
# Only the first timer process handles the maintenance periods.
#
# Mandatory: no
# Range: 1-1000
# Default:
# StartTimers=1
### Option: StartEscalators
# Number of pre-forked instances of escalators.
#
# Mandatory: no
# Range: 0-100
# Default:
# StartEscalators=1
### Option: JavaGateway
# IP address (or hostname) of Zabbix Java gateway.
# Only required if Java pollers are started.
#
# Mandatory: no
# Default:
# JavaGateway=
JavaGateway=127.0.0.1
### Option: JavaGatewayPort
# Port that Zabbix Java gateway listens on.
#
# Mandatory: no
# Range: 1024-32767
# Default:
# JavaGatewayPort=10052
### Option: StartJavaPollers
# Number of pre-forked instances of Java pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartJavaPollers=0
StartJavaPollers=5
### Option: StartVMwareCollectors
# Number of pre-forked vmware collector instances.
#
# Mandatory: no
# Range: 0-250
# Default:
# StartVMwareCollectors=0
### Option: VMwareFrequency
# How often Zabbix will connect to VMware service to obtain a new data.
#
# Mandatory: no
# Range: 10-86400
# Default:
# VMwareFrequency=60
### Option: VMwarePerfFrequency
# How often Zabbix will connect to VMware service to obtain performance data.
#
# Mandatory: no
# Range: 10-86400
# Default:
# VMwarePerfFrequency=60
### Option: VMwareCacheSize
# Size of VMware cache, in bytes.
# Shared memory size for storing VMware data.
# Only used if VMware collectors are started.
#
# Mandatory: no
# Range: 256K-2G
# Default:
# VMwareCacheSize=8M
### Option: VMwareTimeout
# Specifies how many seconds vmware collector waits for response from VMware service.
#
# Mandatory: no
# Range: 1-300
# Default:
# VMwareTimeout=10
### Option: SNMPTrapperFile
# Temporary file used for passing data from SNMP trap daemon to the server.
# Must be the same as in zabbix_trap_receiver.pl or SNMPTT configuration file.
#
# Mandatory: no
# Default:
# SNMPTrapperFile=/tmp/zabbix_traps.tmp
SNMPTrapperFile=/var/log/zabbix/snmptrapfmt.log
### Option: StartSNMPTrapper
# If 1, SNMP trapper process is started.
#
# Mandatory: no
# Range: 0-1
# Default:
# StartSNMPTrapper=0
StartSNMPTrapper=1
### Option: ListenIP
# List of comma delimited IP addresses that the trapper should listen on.
# Trapper will listen on all network interfaces if this parameter is missing.
#
# Mandatory: no
# Default:
# ListenIP=0.0.0.0
# ListenIP=127.0.0.1
### Option: HousekeepingFrequency
# How often Zabbix will perform housekeeping procedure (in hours).
# Housekeeping is removing outdated information from the database.
# To prevent Housekeeper from being overloaded, no more than 4 times HousekeepingFrequency
# hours of outdated information are deleted in one housekeeping cycle, for each item.
# To lower load on server startup housekeeping is postponed for 30 minutes after server start.
# With HousekeepingFrequency=0 the housekeeper can be only executed using the runtime control option.
# In this case the period of outdated information deleted in one housekeeping cycle is 4 times the
# period since the last housekeeping cycle, but not less than 4 hours and not greater than 4 days.
#
# Mandatory: no
# Range: 0-24
# Default:
# HousekeepingFrequency=1
### Option: MaxHousekeeperDelete
# The table "housekeeper" contains "tasks" for housekeeping procedure in the format:
# [housekeeperid], [tablename], [field], [value].
# No more than 'MaxHousekeeperDelete' rows (corresponding to [tablename], [field], [value])
# will be deleted per one task in one housekeeping cycle.
# SQLite3 does not use this parameter, deletes all corresponding rows without a limit.
# If set to 0 then no limit is used at all. In this case you must know what you are doing!
#
# Mandatory: no
# Range: 0-1000000
# Default:
# MaxHousekeeperDelete=5000
### Option: SenderFrequency
# How often Zabbix will try to send unsent alerts (in seconds).
#
# Mandatory: no
# Range: 5-3600
# Default:
# SenderFrequency=30
### Option: CacheSize
# Size of configuration cache, in bytes.
# Shared memory size for storing host, item and trigger data.
#
# Mandatory: no
# Range: 128K-8G
# Default:
# CacheSize=8M
CacheSize=1G
### Option: CacheUpdateFrequency
# How often Zabbix will perform update of configuration cache, in seconds.
#
# Mandatory: no
# Range: 1-3600
# Default:
# CacheUpdateFrequency=60
### Option: StartDBSyncers
# Number of pre-forked instances of DB Syncers.
#
# Mandatory: no
# Range: 1-100
# Default:
# StartDBSyncers=4
### Option: HistoryCacheSize
# Size of history cache, in bytes.
# Shared memory size for storing history data.
#
# Mandatory: no
# Range: 128K-2G
# Default:
# HistoryCacheSize=16M
HistoryCacheSize=256M
### Option: HistoryIndexCacheSize
# Size of history index cache, in bytes.
# Shared memory size for indexing history cache.
#
# Mandatory: no
# Range: 128K-2G
# Default:
# HistoryIndexCacheSize=4M
### Option: TrendCacheSize
# Size of trend cache, in bytes.
# Shared memory size for storing trends data.
#
# Mandatory: no
# Range: 128K-2G
# Default:
# TrendCacheSize=4M
TrendCacheSize=512M
### Option: ValueCacheSize
# Size of history value cache, in bytes.
# Shared memory size for caching item history data requests.
# Setting to 0 disables value cache.
#
# Mandatory: no
# Range: 0,128K-64G
# Default:
# ValueCacheSize=8M
ValueCacheSize=32M
### Option: Timeout
# Specifies how long we wait for agent, SNMP device or external check (in seconds).
#
# Mandatory: no
# Range: 1-30
# Default:
# Timeout=3
Timeout=15
### Option: TrapperTimeout
# Specifies how many seconds trapper may spend processing new data.
#
# Mandatory: no
# Range: 1-300
# Default:
# TrapperTimeout=300
### Option: UnreachablePeriod
# After how many seconds of unreachability treat a host as unavailable.
#
# Mandatory: no
# Range: 1-3600
# Default:
# UnreachablePeriod=45
### Option: UnavailableDelay
# How often host is checked for availability during the unavailability period, in seconds.
#
# Mandatory: no
# Range: 1-3600
# Default:
# UnavailableDelay=60
### Option: UnreachableDelay
# How often host is checked for availability during the unreachability period, in seconds.
#
# Mandatory: no
# Range: 1-3600
# Default:
# UnreachableDelay=15
### Option: AlertScriptsPath
# Full path to location of custom alert scripts.
# Default depends on compilation options.
#
# Mandatory: no
# Default:
# AlertScriptsPath=${datadir}/zabbix/alertscripts
AlertScriptsPath=/usr/lib/zabbix/alertscripts
### Option: ExternalScripts
# Full path to location of external scripts.
# Default depends on compilation options.
#
# Mandatory: no
# Default:
# ExternalScripts=${datadir}/zabbix/externalscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
### Option: FpingLocation
# Location of fping.
# Make sure that fping binary has root ownership and SUID flag set.
#
# Mandatory: no
# Default:
# FpingLocation=/usr/sbin/fping
FpingLocation=/usr/bin/fping
### Option: Fping6Location
# Location of fping6.
# Make sure that fping6 binary has root ownership and SUID flag set.
# Make empty if your fping utility is capable to process IPv6 addresses.
#
# Mandatory: no
# Default:
# Fping6Location=/usr/sbin/fping6
Fping6Location=/usr/bin/fping6
### Option: SSHKeyLocation
# Location of public and private keys for SSH checks and actions.
#
# Mandatory: no
# Default:
# SSHKeyLocation=
### Option: LogSlowQueries
# How long a database query may take before being logged (in milliseconds).
# Only works if DebugLevel set to 3, 4 or 5.
# 0 - don't log slow queries.
#
# Mandatory: no
# Range: 1-3600000
# Default:
# LogSlowQueries=0
LogSlowQueries=3000
### Option: TmpDir
# Temporary directory.
#
# Mandatory: no
# Default:
# TmpDir=/tmp
### Option: StartProxyPollers
# Number of pre-forked instances of pollers for passive proxies.
#
# Mandatory: no
# Range: 0-250
# Default:
# StartProxyPollers=1
### Option: ProxyConfigFrequency
# How often Zabbix Server sends configuration data to a Zabbix Proxy in seconds.
# This parameter is used only for proxies in the passive mode.
#
# Mandatory: no
# Range: 1-3600*24*7
# Default:
# ProxyConfigFrequency=3600
### Option: ProxyDataFrequency
# How often Zabbix Server requests history data from a Zabbix Proxy in seconds.
# This parameter is used only for proxies in the passive mode.
#
# Mandatory: no
# Range: 1-3600
# Default:
# ProxyDataFrequency=1
### Option: AllowRoot
# Allow the server to run as 'root'. If disabled and the server is started by 'root', the server
# will try to switch to the user specified by the User configuration option instead.
# Has no effect if started under a regular user.
# 0 - do not allow
# 1 - allow
#
# Mandatory: no
# Default:
# AllowRoot=0
### Option: User
# Drop privileges to a specific, existing user on the system.
# Only has effect if run as 'root' and AllowRoot is disabled.
#
# Mandatory: no
# Default:
# User=zabbix
### Option: Include
# You may include individual files or all files in a directory in the configuration file.
# Installing Zabbix will create include directory in /usr/local/etc, unless modified during the compile time.
#
# Mandatory: no
# Default:
# Include=
# Include=/usr/local/etc/zabbix_server.general.conf
# Include=/usr/local/etc/zabbix_server.conf.d/
# Include=/usr/local/etc/zabbix_server.conf.d/*.conf
### Option: SSLCertLocation
# Location of SSL client certificates.
# This parameter is used only in web monitoring.
#
# Mandatory: no
# Default:
# SSLCertLocation=${datadir}/zabbix/ssl/certs
### Option: SSLKeyLocation
# Location of private keys for SSL client certificates.
# This parameter is used only in web monitoring.
#
# Mandatory: no
# Default:
# SSLKeyLocation=${datadir}/zabbix/ssl/keys
### Option: SSLCALocation
# Override the location of certificate authority (CA) files for SSL server certificate verification.
# If not set, system-wide directory will be used.
# This parameter is used only in web monitoring and SMTP authentication.
#
# Mandatory: no
# Default:
# SSLCALocation=
####### LOADABLE MODULES #######
### Option: LoadModulePath
# Full path to location of server modules.
# Default depends on compilation options.
#
# Mandatory: no
# Default:
# LoadModulePath=${libdir}/modules
### Option: LoadModule
# Module to load at server startup. Modules are used to extend functionality of the server.
# Format: LoadModule=<module.so>
# The modules must be located in directory specified by LoadModulePath.
# It is allowed to include multiple LoadModule parameters.
#
# Mandatory: no
# Default:
# LoadModule=
####### TLS-RELATED PARAMETERS #######
### Option: TLSCAFile
# Full pathname of a file containing the top-level CA(s) certificates for
# peer certificate verification.
#
# Mandatory: no
# Default:
# TLSCAFile=
### Option: TLSCRLFile
# Full pathname of a file containing revoked certificates.
#
# Mandatory: no
# Default:
# TLSCRLFile=
### Option: TLSCertFile
# Full pathname of a file containing the server certificate or certificate chain.
#
# Mandatory: no
# Default:
# TLSCertFile=
### Option: TLSKeyFile
# Full pathname of a file containing the server private key.
#
# Mandatory: no
# Default:
# TLSKeyFile=
Through deduction I arrive back at the switch, however how do I get enough data for Cisco to replace the switch or address whatever the root cause is in their next update. The switch is running Firmware 1.4.0.88 and Boot 1.3.5.06 and Cisco support did not state that any specific update(s) would specifically address the accuracy of the SNMP data from the switch. Besides updating the firmware which I'm going to do during our next scheduled outage for updates to other systems anyways I'll be doing this as well. Any suggestions to help determine this root cause?