I need to monitor (with Zabbix) whether an rsync job failed to execute.

I thought about writing the exit code to a file at the source and monitoring that file, but I haven't found a good way of doing this.

Does anyone know a method for performing this kind of monitoring?

hotzst
AlimaSP

2 Answers

I solved this by doing three things.

1 - Create a script that runs rsync from cron

#!/bin/bash
# Put your own rsync command on the line below
rsync -rlptv --delete-after root@serverA:/some_dir/ /another_dir/ > /lalla_dir/my.log

# Check whether rsync finished successfully
if [ $? -eq 0 ]; then
    # On success, append a random number and a "Status = OK" message to the log file
    echo $(( 1 + RANDOM % 1000 )) >> /lalla_dir/my.log
    echo "Status = OK" >> /lalla_dir/my.log
else
    # On failure, append a random number and a "Status = ERROR" message to the log file
    echo $(( 1 + RANDOM % 1000 )) >> /lalla_dir/my.log
    echo "Status = ERROR" >> /lalla_dir/my.log
fi
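
The same pattern can be wrapped in a small function so that it works for any command, not only rsync. This is a minimal sketch rather than the script above: `run_and_log` and the temporary log path are hypothetical names used for illustration, `true` and `false` stand in for the real rsync call, and unlike the original it also captures stderr in the log.

```shell
#!/bin/bash
# Hypothetical helper: run a command, capture its output in a log file,
# then append a random number and a status line based on the exit code.
run_and_log() {
    local log="$1"
    shift
    local rc=0
    # Overwrite the log with the command's output (stdout and stderr).
    "$@" > "$log" 2>&1 || rc=$?
    # The random number is there so the file content differs between runs.
    echo $(( 1 + RANDOM % 1000 )) >> "$log"
    if [ "$rc" -eq 0 ]; then
        echo "Status = OK" >> "$log"
    else
        echo "Status = ERROR" >> "$log"
    fi
    return "$rc"
}

# Demo with stand-in commands instead of rsync.
demo_log=$(mktemp)
run_and_log "$demo_log" true
tail -n 1 "$demo_log"
run_and_log "$demo_log" false || true
tail -n 1 "$demo_log"
```

In the cron script, the stand-in would be replaced by the real command, for example `run_and_log /lalla_dir/my.log rsync -rlptv --delete-after root@serverA:/some_dir/ /another_dir/`.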

2 - Create two items in Zabbix

A - Check the checksum of my.log. This is why the script writes a random number: that way you can be sure the log file has been modified since the last check.

Zabbix key

vfs.file.cksum[]

B - Check the log file for the OK message.

Zabbix key

vfs.file.regmatch[/lalla_dir/my.log,Status = OK]

3 - Create the trigger.

{my-server:vfs.file.cksum[/lalla_dir/my.log].change()}=0
or
{my-server:vfs.file.regmatch[/lalla_dir/my.log,Status = OK].last()}=0

So, if your log file has not changed or does not contain the "Status = OK" message, it means that rsync either ran with an error (failed) or did not run at all (maybe a cron problem).
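
To sanity-check this logic before relying on it in Zabbix, the two item values and the trigger condition can be approximated locally. This is only a sketch under assumptions: `check_log` and the state file are hypothetical, `cksum` is used because, as far as I know, vfs.file.cksum is documented as using the UNIX cksum algorithm, and `grep -q` stands in for vfs.file.regmatch.

```shell
#!/bin/bash
# Hypothetical local approximation of the two items and the trigger.
# $1 = log file, $2 = file that remembers the checksum from the last check.
check_log() {
    local log="$1" state="$2"
    local current previous=""
    # Rough equivalent of vfs.file.cksum: first field of the cksum output.
    current=$(cksum < "$log" | cut -d ' ' -f 1)
    # Load the checksum seen at the previous check, if there was one.
    [ -f "$state" ] && previous=$(cat "$state")
    echo "$current" > "$state"
    # Rough equivalent of vfs.file.regmatch[...,Status = OK]: 1 on match, else 0.
    local matched=0
    grep -q 'Status = OK' "$log" && matched=1
    # Trigger condition: checksum unchanged OR no OK message in the log.
    if [ "$current" = "$previous" ] || [ "$matched" -eq 0 ]; then
        echo "PROBLEM"
    else
        echo "OK"
    fi
}

# Demo on a throwaway directory.
dir=$(mktemp -d)
log="$dir/my.log"
state="$dir/last.cksum"
printf '123\nStatus = OK\n' > "$log"
check_log "$log" "$state"    # log is new and contains the OK message
check_log "$log" "$state"    # log did not change since the last check
printf '456\nStatus = ERROR\n' > "$log"
check_log "$log" "$state"    # log changed but reports an error
```

The second and third calls are the two failure cases described above: a job that did not run leaves the file untouched, and a job that failed leaves no "Status = OK" line.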

Sorry for the bad English - use of has, have, they ... still leaves me confused

Joao Vitorino

I'm trying to use this technique to monitor some backup logs. The backup script writes a log file like this one:

897
Status=OK,Message=

The Zabbix trigger is defined like this:

{svr1.xxxx.com:vfs.file.exists[/data/logs/db-backup.log].change()}=0 or {svr1.xxxx.com:vfs.file.cksum[/data/logs/db-backup.log].change()}=0 or {svr1.xxxx.com:vfs.file.regmatch[/data/logs/db-backup.log,Status=ERROR].last()}=1

The backup script works fine and runs every day at 4:10 am:

jbaptiste@svr1:/data/logs$ ls -lth
total 12K
-rw-r--r-- 1 root root  23 Mar 20 04:10 db-backup.log

and Zabbix checks the log file every day at 5 am, but the trigger fires as if there was something wrong with the backup:

Trigger: DB - Check backup last run status 
Trigger status: PROBLEM 
Trigger severity: Warning 
Trigger URL: 

Item values: 

1. Backup file exists check (svr1.xxxx.com:vfs.file.exists[/data/logs/db-backup.log]): 1 
2. Backup file checksum (svr1.xxxx.com:vfs.file.cksum[/data/logs/db-backup.log]): 1864703203 
3. Backup run status code (svr1.xxxx.com:vfs.file.regmatch[/data/logs/db-backup.log,Status=ERROR]): 0 

As you can see in the item values, each check has the expected value for a successful run. As I see it, none of the trigger conditions is met, so I don't think the trigger should have fired.

The other issue is that when there really has been something wrong with the backup, the trigger isn't cleared after the problem is fixed and the backup runs fine again.
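
One way to narrow this down is to evaluate the expression by hand with the values from the notification. The sketch below is an illustration under assumptions, not Zabbix itself: `eval_trigger` is a hypothetical helper, `change()` is modelled as "current value minus previous value", and the previous values are guesses (the file already existed at the previous check, and the previous checksum 111 is made up).

```shell
#!/bin/bash
# Hypothetical helper that mimics the three-part trigger expression.
# Arguments: previous and current exists value, previous and current checksum,
# and the current regmatch value for Status=ERROR.
eval_trigger() {
    local exists_prev="$1" exists_cur="$2"
    local cksum_prev="$3" cksum_cur="$4"
    local error_match="$5"
    local reasons=""
    # vfs.file.exists[...].change()=0 : the exists value did not change.
    [ $(( exists_cur - exists_prev )) -eq 0 ] && reasons="$reasons exists.change()=0"
    # vfs.file.cksum[...].change()=0 : the checksum did not change.
    [ $(( cksum_cur - cksum_prev )) -eq 0 ] && reasons="$reasons cksum.change()=0"
    # vfs.file.regmatch[...,Status=ERROR].last()=1 : the error text was found.
    [ "$error_match" -eq 1 ] && reasons="$reasons regmatch.last()=1"
    # The three terms are joined with "or", so any one of them is enough.
    if [ -n "$reasons" ]; then
        echo "PROBLEM:$reasons"
    else
        echo "OK"
    fi
}

# Values from the notification; the previous checksum is an assumption.
eval_trigger 1 1 111 1864703203 0
```

With these inputs the first term is already true, because an exists value that stays at 1 has a change of 0. If this reading of change() matches the Zabbix version in use, that term would hold on every check while the file exists, which would be consistent with both the unexpected PROBLEM state and the trigger never clearing.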

Does anyone see something wrong with it?

Juancho