
My client has a Graylog2 server set up to aggregate our log files. We have several streams defined.

I'd like daily email notifications to be sent out: at a minimum "System received x errors in the last 24 hours", and ideally a list of the ten most frequent errors.

Has anyone implemented anything like this before? Can you provide any tips or suggestions? I saw a REST API mentioned in some forum posts, but haven't been able to find much more information...

laura

1 Answer


At my workplace, we have configured alerting using rake tasks run from crontab. We set this up before the alerting API in graylog2-server (in the plugin directory) became available. We still use the rake tasks because they let us use the Rails models and controllers.

We added the following to general.yaml so that the task can look up the stream IDs.

# section in general.yaml

streamalarms:
  error_stream: 50ef145471de3516b900000d
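The task below reads this section through a `streamalarm_config` helper on Graylog2's `Configuration` class. As a standalone illustration of the same nested lookup, here is a minimal sketch using only Ruby's standard `yaml` library. The method is a hypothetical stand-in, not the graylog2-web-interface code, and it reads the YAML from a string instead of the real general.yaml file.

```ruby
require "yaml"

# Hypothetical copy of the general.yaml section shown above.
GENERAL_YAML = <<~YAML
  streamalarms:
    error_stream: 50ef145471de3516b900000d
YAML

# Look up a key under the "streamalarms" section, falling back to a default.
# This only imitates the idea of Configuration.streamalarm_config; it is not
# the graylog2-web-interface implementation.
def streamalarm_config(key, default, yaml_text = GENERAL_YAML)
  config = YAML.safe_load(yaml_text) || {}
  section = config["streamalarms"] || {}
  section.fetch(key.to_s, default)
end

puts streamalarm_config("error_stream", "")        # prints 50ef145471de3516b900000d
puts streamalarm_config("missing_stream", "none")  # prints none
```

Keeping the stream ID in configuration rather than in the task means a stream can be recreated (and get a new ID) without editing code.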

The following is the actual rake task:

namespace :gl2rake do

  # Helper method for recording how long jobs took, which is used to debug / tune the jobs.
  # task_starting and task_completed are not shown here; they are assumed to be custom helpers.
  def monitoring_wrapper(task)
    btime = Time.now
    task_name = task.name
    task_starting(task_name)

    if block_given?
      yield
    else
      puts "No block given to monitoring_wrapper!"
    end

    etime = Time.now
    duration = (etime - btime)
    puts "Time elapsed: #{duration} seconds"
    task_completed(task_name, duration)
  end

  desc "Send an email if a job is written to the error queue. If there are more than 5 errored jobs in the last 6 minutes, also send an SMS."
  task :error_queue => :environment do |task|
    monitoring_wrapper(task) do

      # the stream to check
      # I have customised the Configuration class so that all of the stream ids are available. This can be automated.
      stream_id = Configuration.streamalarm_config('error_stream', '')

      # this method has been added to app/models/configuration.rb as a convenience.
      # def self.streamalarm_config(key, default)
      #   nested_general_config :streamalarms, key, default
      # end

      # the time 6 minutes ago (a time object, not a Unix epoch timestamp)
      six_mins_ago = 6.minutes.ago

      filters = {
        # optionally apply a message filter to the stream
        :message => "\"Writing job to error queue.\"",
        :date => "from #{six_mins_ago}" 
      }

      # get the stream
      stream = Stream.find_by_id(stream_id)

      if !stream
        $stderr.puts "Invalid stream id #{stream_id}"
        next
      end

      messages = MessageGateway.all_by_quickfilter(filters, nil, {:stream_id => stream_id})

      if messages.size > 0

        # alert: jobs written to error queue
        if messages.size > 5
          # send_sms_for_stream is a custom method we wrote that hooks into an SMS API.
          send_sms_for_stream("There are #{messages.size} errored job(s) in the last 6 minutes. Check email for details", 'error_queue', stream.title)
        end

        message = "There are #{messages.size} errored job(s) in the last 6 minutes. (Stream #{stream.title})\n"

        messages.each do |m|
          message += "\t#{m.message}\n"
        end

        # send_mail is another custom helper; it sends an email to our designated alerting address
        send_mail("There are #{messages.size} errored job(s)", message, 'error_queue', stream.title)
      end
    end
  end
end
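The decision logic inside the task does not depend on Graylog2 itself, so it can be pulled out and tested on plain data. Below is a minimal sketch, assuming the message texts are already available as an array of strings. The method name `build_error_queue_alert` and its return hash are hypothetical, not part of graylog2-web-interface; the thresholds and wording follow the task above.

```ruby
# Hypothetical, framework-free version of the alert logic in the rake task:
# given the message texts found in the last window, decide whether to also
# send an SMS and build the email subject and body.
SMS_THRESHOLD = 5   # the task sends an SMS when there are MORE than 5 errors
WINDOW_MINUTES = 6  # length of the lookback window used by the task

def build_error_queue_alert(messages, stream_title)
  return nil if messages.empty?  # nothing found: no email, no SMS

  count = messages.size
  body = "There are #{count} errored job(s) in the last #{WINDOW_MINUTES} minutes. " \
         "(Stream #{stream_title})\n"
  messages.each { |m| body << "\t#{m}\n" }

  {
    send_sms: count > SMS_THRESHOLD,
    subject: "There are #{count} errored job(s)",
    body: body
  }
end

alert = build_error_queue_alert(["job 1 failed", "job 2 failed"], "Errors")
puts alert[:subject]   # prints: There are 2 errored job(s)
puts alert[:send_sms]  # prints: false
```

Returning a plain hash keeps the sending (email, SMS) in the rake task, so the counting and formatting can be checked without a mail server or SMS gateway.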

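This can now be called via a cron job, e.g. every five minutes at minutes 3, 8, 13, ..., 58, so each run's 6-minute window overlaps the previous run's by one minute: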

3-59/5 * * * * sudo rake -f /opt/graylog2-web-interface/Rakefile gl2rake:error_queue RAILS_ENV=production
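The question asks for a daily digest rather than a near-real-time alert. The same task structure should adapt to that: schedule it once a day (for example `0 7 * * *` for 07:00), use a 24-hour window instead of 6 minutes, and group the returned messages by text. A sketch of the grouping step in plain Ruby, assuming the message texts have already been fetched (the method name `daily_error_summary` is hypothetical):

```ruby
# Hypothetical daily digest: total error count plus the most frequent
# message texts, most common first. The caller is assumed to have fetched
# only the last 24 hours of messages.
def daily_error_summary(messages, top_n = 10)
  # Count occurrences of each distinct message text (works on older Rubies too).
  counts = messages.each_with_object(Hash.new(0)) { |text, acc| acc[text] += 1 }
  # Most frequent first; ties broken alphabetically so the order is stable.
  top = counts.sort_by { |text, n| [-n, text] }.first(top_n)

  lines = ["System received #{messages.size} errors in the last 24 hours."]
  top.each_with_index do |(text, n), i|
    lines << format("%2d. (%d) %s", i + 1, n, text)
  end
  lines.join("\n")
end

sample = ["Timeout", "Disk full", "Timeout", "Auth failed", "Timeout", "Disk full"]
puts daily_error_summary(sample, 2)
# System received 6 errors in the last 24 hours.
#  1. (3) Timeout
#  2. (2) Disk full
```

Grouping by exact text means messages that embed job IDs or timestamps will not collapse together; normalizing them first (for example stripping digits) would be a further step not covered here.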
aj.esler