At my workplace, we set up alerting using rake tasks run from crontab. We did this before the alerting API in graylog2-server (the plugin directory) became available. We still use the rake tasks because they let us use the Rails models and controllers.
The following is added to general.yaml so that we can find the stream ids.
# section in general.yaml
streamalarms:
  error_stream: 50ef145471de3516b900000d
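To illustrate how a nested key like this can be looked up, here is a minimal sketch in plain Ruby using only the standard YAML library. The `streamalarm_config` function below is a standalone stand-in for the convenience method described in the rake task. It is not graylog2's own code, and the real `Configuration` class may behave differently.

```ruby
require 'yaml'

# Hypothetical stand-in for the general.yaml section shown above.
GENERAL_YAML = <<~YAML
  streamalarms:
    error_stream: 50ef145471de3516b900000d
YAML

# Assumption: a plain-Ruby approximation of the nested lookup.
# Returns the default when the section or the key is missing.
def streamalarm_config(config, key, default)
  section = config['streamalarms'] || {}
  section.fetch(key, default)
end

CONFIG = YAML.safe_load(GENERAL_YAML)

puts streamalarm_config(CONFIG, 'error_stream', '')
puts streamalarm_config(CONFIG, 'missing_stream', 'none')
```

Returning a default instead of raising keeps the task from crashing on a missing key, at the cost of having to check for an empty stream id afterwards.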
The following is the actual rake task:
namespace :gl2rake do
  # Helper method for recording how long jobs took, which is used to debug / tune the jobs.
  # task_starting and task_completed are not part of Rake; they are presumably custom helpers defined elsewhere.
  def monitoring_wrapper(task)
    btime = Time.now
    task_name = task.name
    task_starting(task_name)
    if block_given?
      yield
    else
      puts "No block given to monitoring_wrapper!"
    end
    etime = Time.now
    duration = (etime - btime)
    puts "Time elapsed: #{duration} seconds"
    task_completed(task_name, duration)
  end
  desc "Send an email if a job is written to the error queue. If there are more than 5 errored jobs in the last 6 minutes, also send an SMS"
  task :error_queue => :environment do |task|
    monitoring_wrapper(task) do
      # the stream to check
      # I have customised the configuration class so that all of the stream ids are available. This can be automated.
      stream_id = Configuration.streamalarm_config('error_stream', '')
      # this method has been added to app/models/configuration.rb as a convenience.
      # def self.streamalarm_config(key, default)
      #   nested_general_config :streamalarms, key, default
      # end
      # time 6 minutes ago (a time object from ActiveSupport, interpolated into the date filter below)
      six_mins_ago = 6.minutes.ago
      filters = {
        # optionally apply a message filter to the stream
        :message => "\"Writing job to error queue.\"",
        :date => "from #{six_mins_ago}"
      }
      # get the stream
      stream = Stream.find_by_id(stream_id)
      if !stream
        $stderr.puts "Invalid stream id #{stream_id}"
        # leave the block early; monitoring_wrapper still records the duration
        next
      end
      messages = MessageGateway.all_by_quickfilter(filters, nil, {:stream_id => stream_id})
      if messages.size > 0
        # alert - jobs written to error queue
        if messages.size > 5
          # send_sms_for_stream is a custom method we wrote that hooks into an sms api.
          send_sms_for_stream("There are #{messages.size} errored job(s) in the last 6 minutes. Check email for details", 'error_queue', stream.title)
        end
        message = "There are #{messages.size} errored job(s) in the last 6 minutes. (Stream #{stream.title})\n"
        messages.each do |m|
          message += "\t#{m.message}\n"
        end
        # sends an email to our designated alerting email
        send_mail("There are #{messages.size} errored job(s)", message, 'error_queue', stream.title)
      end
    end
  end
end
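The escalation rule in the task (email for any match, plus SMS above 5 matches) can be checked in isolation. The following is a minimal sketch, not part of the original task: `alert_channels` is a hypothetical helper name, and the default threshold of 5 is simply the value used in the task above.

```ruby
# Hypothetical helper: returns the channels to notify for a given number
# of matching messages. No matches means no alert at all.
def alert_channels(message_count, sms_threshold = 5)
  return [] if message_count <= 0

  channels = [:email]
  # SMS is only added when the count is strictly above the threshold.
  channels << :sms if message_count > sms_threshold
  channels
end

[0, 1, 5, 6].each do |count|
  puts "#{count} message(s): #{alert_channels(count).inspect}"
end
```

Keeping the decision in a small pure function like this would make the thresholds testable without a running Graylog2 instance or an SMS gateway.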
This can now be called from a cron job, e.g.:
3-59/5 * * * * sudo rake -f /opt/graylog2-web-interface/Rakefile gl2rake:error_queue RAILS_ENV=production
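The minute field `3-59/5` runs the task every fifth minute starting at minute 3. As a rough illustration (the constant names are only for this sketch), the snippet below lists those minutes and compares the 5 minute interval with the 6 minute look-back window used in the task.

```ruby
# Minutes of each hour matched by the cron minute field "3-59/5".
RUN_MINUTES = (3..59).step(5).to_a

INTERVAL_MINUTES = 5 # gap between consecutive runs
WINDOW_MINUTES   = 6 # look-back window used by the rake task
OVERLAP_MINUTES  = WINDOW_MINUTES - INTERVAL_MINUTES

puts RUN_MINUTES.inspect
puts "Consecutive runs overlap by #{OVERLAP_MINUTES} minute(s)"
```

Because the window is one minute longer than the interval, no period goes unchecked, but a message logged during the overlapping minute can be reported by two consecutive runs.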