Zabbix version: 3.0.3 (zabbix-server-mysql)

OS: Ubuntu 14.04 Trusty

Number of hosts (enabled/disabled/templates): 28 / 0 / 57

Number of items (enabled/disabled/not supported): 1349 / 161 / 47

Number of triggers (enabled/disabled): 902 / 39

Required server performance, new values per second: 22.86

Zabbix server config:

StartPollers=5
StartPollersUnreachable=2
StartTrappers=5
StartDiscoverers=3
StartHTTPPollers=5

I have a template with 3 items like this: net.tcp.port[<IP>,3128]. The template is applied to 10 servers.

Here is the problem: when I enable these items, events like "zabbix-agent on <hostname> is not available for 2 minutes" start to appear randomly on the 10 hosts where the template is applied. On the "Zabbix Server Performance" graph, the values of zabbix[wcache,values] start dropping from 19-19.5 to 16-17. The values of zabbix[queue] stay at 0, as before.

When I disable the items, the problem disappears.

The Zabbix server is not overloaded by I/O or CPU, and there is plenty of free memory, so this does not look like a hardware performance issue. The Zabbix agents on the hosts are reachable; I checked with nc -vz <hostname> 10050.

Nothing abnormal appears in the server log or in the agent logs on these 10 hosts.

I tried increasing ulimit -n for the Zabbix server process. The change took effect: cat /proc/<zabbix_worker_pid>/limits now shows Max open files 10240 10240 files. It didn't help.

I tried increasing StartPollers to 10 and 15; that didn't help either.

What is happening to the server?

UPD:

Items type: Zabbix agent

All systems are running Linux (Ubuntu 14.04 Trusty).

Agents on hosts run 3 listeners, 1 collector and 1 active checks process.

For 7 of these 10 hosts, zabbix_get -s <host> -k net.tcp.port[<IP>,3128] returns instantly for all 3 items. On the other 3 hosts it takes about 3 seconds and returns 0 (the monitored IPs are not reachable from those 3 hosts).

Selivanov Pavel
  • What's the item type? What's the busy rate for pollers and unreachable pollers? How many listeners are started on each agent? – Richlv May 31 '16 at 13:01
  • If you use `zabbix_get`, how long does it take for agents to process `net.tcp.port[]` items? Are these agents running on Windows or a Unix-like system? – asaveljevs May 31 '16 at 13:06
  • @asaveljevs: updated description. – Selivanov Pavel May 31 '16 at 17:05
  • @Richlv: updated description. How can I check busy rate for pollers and unreachable pollers? – Selivanov Pavel May 31 '16 at 17:08
  • To check process busyness, use `zabbix[process,poller,avg,busy]` and `zabbix[process,unreachable poller,avg,busy]` internal items (see https://www.zabbix.com/documentation/3.0/manual/config/items/itemtypes/internal). They are included in the default "Template App Zabbix Server" that ships with Zabbix. – asaveljevs Jun 01 '16 at 05:08
  • I reported the problem in the Zabbix tracker: https://support.zabbix.com/browse/ZBX-10868 – Selivanov Pavel Jun 01 '16 at 16:16

1 Answer

Finally:

If:

  • the Timeout on the agent and on the server is the same (default: Timeout=3)
  • there is an item net.tcp.port[<IP>,<port>] and a trigger using it
  • the [<IP>,<port>] pair is unreachable, so the TCP connection attempt runs into the timeout

Then:

the trigger "Zabbix-agent on {HOST.NAME} is unavailable" (trigger expression: {agent.ping.nodata(2m)} = 1) starts firing on hosts with this item. It is not the trigger for the specific item that fires, but the trigger for agent availability. This is a bug, but the Zabbix developers do not seem to agree:

https://support.zabbix.com/browse/ZBX-10868

Zabbix version 3.0.3 for both server and agent.
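
The first condition above can be checked by comparing the Timeout lines of the two config files. The following is a minimal sketch, not an official Zabbix tool: the function names `get_timeout` and `compare_timeouts` are hypothetical, and the fallback of 3 assumes the default mentioned above applies when no uncommented Timeout line is present. Since zabbix_agentd.conf lives on the monitored host, either copy it to the server first or run `get_timeout` on each machine separately.

```shell
#!/bin/sh
# Print the Timeout value found in a Zabbix config file.
# Uses the last uncommented "Timeout=<number>" line; prints 3
# (the default assumed here) when no such line exists.
get_timeout() {
    value=$(sed -n 's/^[[:space:]]*Timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    echo "${value:-3}"
}

# Compare the server Timeout ($1 = server config file) with the
# agent Timeout ($2 = agent config file) and print a one-line verdict.
compare_timeouts() {
    server=$(get_timeout "$1")
    agent=$(get_timeout "$2")
    if [ "$server" -gt "$agent" ]; then
        echo "ok: server Timeout=$server > agent Timeout=$agent"
    else
        echo "at risk: server Timeout=$server <= agent Timeout=$agent"
    fi
}
```

For example, `compare_timeouts /etc/zabbix/zabbix_server.conf ./zabbix_agentd.conf` (both paths are assumptions) prints an "at risk" line when the two values are equal, which is the situation described above.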

Possible workarounds:

  • set Timeout in zabbix_server.conf higher than in zabbix_agentd.conf
  • use a UserParameter like this: UserParameter=tcp_connect_check[*], /bin/nc -z "$1" "$2" -w "$3"; echo $? (it prints the exit status of nc, so 0 means the port is reachable) and create items with a connect timeout lower than the Timeout in zabbix_agentd.conf. To avoid security problems, do not enable UnsafeUserParameters in zabbix_agentd.conf
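
The second workaround can also be written as a small function with an explicit time limit. This is only a sketch under a few assumptions: it swaps nc for bash's built-in /dev/tcp redirection plus the coreutils `timeout` command (so both bash and `timeout` must be installed on the agent host), and unlike the one-liner above it prints 1 when the connection succeeds and 0 otherwise, matching the convention of net.tcp.port. The function name is reused from the UserParameter key purely for illustration.

```shell
#!/bin/sh
# Try to open a TCP connection to host $1, port $2, giving up after $3 seconds.
# Prints 1 if the connection was established, 0 if it failed or timed out.
tcp_connect_check() {
    host=$1
    port=$2
    limit=$3
    # bash opens the socket through its /dev/tcp pseudo-device;
    # timeout kills the attempt once the limit is reached.
    if timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo 1
    else
        echo 0
    fi
}
```

Saved as a script that calls the function with its three arguments, it can be referenced from a UserParameter line in the same way as the nc command. Keeping the limit argument below the agent Timeout means the check returns 0 on its own before the agent gives up on it.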
Selivanov Pavel