
So I have quite a few workers that execute frequently, ranging from daily to hourly. There have been incidents where a few of them just did not execute, without any sign of failure. I need to come up with a solution to track these. I thought about having a listener that logs every time a worker starts, but there are just too many workers to keep track of. A better approach would be for me to know when a worker *did not* run. That is more important.

I've thought about creating a table where I log each time a worker starts execution; if the last log entry for a worker is too old (older than the interval it is supposed to run on), then it notifies me.
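
Roughly what I have in mind as a sketch (the model, column, and method names below are only placeholders):

class WorkerHeartbeat < ApplicationRecord
  # assumed columns: worker_name (string), expected_interval_seconds (integer), last_run_at (datetime)

  # Called at the start of each worker's perform method
  def self.record!(worker_name)
    find_or_initialize_by(worker_name: worker_name).update!(last_run_at: Time.current)
  end

  # Workers whose last heartbeat is older than the interval they should run on
  def self.overdue
    all.select { |hb| hb.last_run_at < hb.expected_interval_seconds.seconds.ago }
  end
end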

  • Assuming the job is still queued up in Redis, you should be able to check your queues and flag jobs that have not been started for too long. – fylooi Jul 19 '19 at 12:27

2 Answers


Use Dead Man's Snitch or similar. https://deadmanssnitch.com
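
The idea behind a dead man's switch service is that each monitored job checks in (a simple HTTP request to its own check-in URL) when it completes; if a check-in does not arrive within the expected interval, the service alerts you. A minimal sketch of that, assuming a hypothetical worker and env var; the actual check-in URL comes from your Dead Man's Snitch dashboard:

require 'net/http'
require 'uri'

class DailyReportWorker
  include Sidekiq::Worker

  def perform
    # ... the actual work ...

    # Check in only after the work succeeded; a missed check-in is
    # what triggers the alert on the monitoring side.
    Net::HTTP.get(URI.parse(ENV.fetch('DAILY_REPORT_SNITCH_URL')))
  end
end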

Mike Perham

This approach should give you some idea of how you might use the Sidekiq API to notify yourself, perhaps using a Slack notifier class. You might put this check in a worker of its own and run it on some other schedule. Of course, if that worker were to fail because of resources, that's a compounding problem, but hopefully you have priorities set in your queues.

class SlackNotifier
  require 'net/http'
  require 'uri'
  require 'json'
  require 'openssl'

  attr_reader :params

  def initialize(params)
    @params = params
  end

  def notify
    # No-op unless a webhook URL is configured
    return if ENV['SLACK_WEBHOOK'].nil?

    channel = 'dev'
    uri = URI.parse(ENV['SLACK_WEBHOOK'])
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    # Skip certificate verification outside production (e.g. behind a local proxy)
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE unless defined?(Rails) && Rails.env.production?

    request = Net::HTTP::Post.new(uri.request_uri)
    # Slack incoming webhooks accept a form-encoded `payload` parameter containing JSON
    request.set_form_data(
      'payload' => { channel: channel, username: 'webhookbot', text: params[:text] }.to_json
    )
    http.request(request)
  end
end


# Inspect the 'long_running' queue and notify about jobs that have been
# sitting there for more than 8 hours (8.hours.ago requires ActiveSupport)
long_queue = Sidekiq::Queue.new('long_running')
whats_taking_so_long = long_queue.select { |job| job.enqueued_at < 8.hours.ago }

whats_taking_so_long.each do |job|
  SlackNotifier.new(text: job.item.to_s).notify
end
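
To run that check on a schedule as suggested above, one option is to wrap it in its own worker and trigger it from whatever scheduler you already use (cron, sidekiq-cron, sidekiq-scheduler, ...). The queue name, threshold, and worker name here are just examples:

class QueueWatchdogWorker
  include Sidekiq::Worker

  # Run this periodically via your scheduler of choice
  def perform
    queue = Sidekiq::Queue.new('long_running')
    stale = queue.select { |job| job.enqueued_at < 8.hours.ago }
    stale.each { |job| SlackNotifier.new(text: job.item.to_s).notify }
  end
end
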
lacostenycoder