I've set up a Proxmox VE cluster with three nodes. Each node has a number of VMs running on it. I'm using the PVE Monitor Plugin to set up the hosts and services, which works fine.
My issue is that Nagios's email notification behavior is somewhat odd. Ideally, I would like a check once per minute, both for the nodes and for all services running on each node.
My configuration file looks like this:
# Define the cluster itself as a host
# the command check_pve_cluster_nodes gives us info
# on the members' cluster state
define host {
host_name pve-cluster
max_check_attempts 10
check_command check_pve_cluster_nodes
contact_groups admins
check_interval 1
notifications_enabled 1
}
# define openvz, qemu and storages as services of the cluster
define service{
use generic-service
host_name pve-cluster
service_description OpenVZ VMs
check_command check_pve_cluster_openvz
check_interval 1
contact_groups admins
notifications_enabled 1
}
define service{
use generic-service
host_name pve-cluster
service_description Qemu VMs
check_command check_pve_cluster_qemu
check_interval 1
contact_groups admins
notifications_enabled 1
}
define service{
use generic-service
host_name pve-cluster
service_description Storages
check_command check_pve_cluster_storage
check_interval 1
contact_groups admins
notifications_enabled 1
}
I haven't changed the time unit settings, so those should be once-per-minute checks. The Nagios Web UI is showing that a host is offline, but email notifications are sent only a couple of minutes later. Furthermore, the email content is missing the most important piece of information - which node/service exactly is in critical state:
Node down
***** Nagios *****
Notification Type: PROBLEM
Host: pve-cluster
State: DOWN
Address: pve-cluster
Info: NODES CRITICAL 2 / 3 working nodes
Date/Time: Fri Mar 6 10:48:25 CET 2015
VM down
***** Nagios *****
Notification Type: PROBLEM
Service: Qemu VMs
Host: pve-cluster
Address: pve-cluster
State: CRITICAL
Date/Time: Fri Mar 6 10:40:44 CET 2015
Additional Info:
QEMU CRITICAL 2 / 3 working VMs
How can I configure Nagios so that hosts and services (i.e. VMs) are checked at one-minute intervals? Ideally, repeat notifications for an ongoing problem should then be sent at 15-minute intervals.
Is this even the best workflow? Or is there a better way to schedule notifications and acknowledge them?
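To make the goal concrete, here is a rough sketch of what I imagine one of the service definitions would need to look like. The retry_interval, max_check_attempts and notification_interval values are my own guesses and untested. The generic-service template may already set some of them, and the intervals are in minutes only if interval_length in nagios.cfg is still at its default of 60 seconds.

```
# Sketch only: the three timing values marked below are assumptions
define service{
    use                     generic-service
    host_name               pve-cluster
    service_description     Qemu VMs
    check_command           check_pve_cluster_qemu
    check_interval          1     ; regular check every minute
    retry_interval          1     ; assumed: re-check every minute while in a soft problem state
    max_check_attempts      3     ; assumed: hard state (and first notification) on the 3rd failed check
    notification_interval   15    ; assumed: repeat the notification every 15 minutes
    contact_groups          admins
    notifications_enabled   1
}
```

If I understand the soft/hard state logic correctly, this would send the first email roughly two minutes after the first failed check, whereas max_check_attempts 10 in my host definition would delay it considerably longer.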