13

I have some update triggers which push jobs onto the Sidekiq queue. So in some cases, there can be multiple jobs to process the same object.

There are a couple of uniqueness plugins ("Middleware", "Unique Jobs"), but they're not documented much, and they seem to be more like throttlers to prevent repeat processing; what I want is something that prevents repeated creation of the same job. That way, an object will always be processed in its freshest state. Is there a plugin or technique for this?


Update: I didn't have time to make a middleware, but I ended up with a related cleanup function to ensure queues are unique: https://gist.github.com/mahemoff/bf419c568c525f0af903
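
For reference, a minimal sketch of that kind of cleanup, assuming two jobs count as duplicates when their class and arguments match (the queue name is just an example):

require 'set'
require 'sidekiq/api'

# Walk a queue and delete any job whose (class, args) pair has already been seen,
# keeping the earliest copy
def dedupe_queue(queue_name = 'default')
  seen = Set.new
  Sidekiq::Queue.new(queue_name).each do |job|
    key = [job.klass, job.args]
    if seen.include?(key)
      job.delete
    else
      seen << key
    end
  end
end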

mahemoff
  • Not to troll, but one of the assumptions of Sidekiq is that the job is idempotent, which is exactly the problem you're complaining about. – engineerDave Feb 02 '14 at 05:22
  • I'm not worried about a repeat job causing some unwanted consequence; I'm worried about performance. Identical jobs means wasted cycles. e.g. If an object is changed and a job added to the queue, then the object changes again while the job is still on the queue, there's no point of executing both identical jobs. – mahemoff Feb 03 '14 at 10:58
  • Is that intuition telling you it's an optimization problem, or benchmarks proving a performance bottleneck? As Sidekiq runs its jobs concurrently, and in a non-blocking fashion, the jobs are executed in parallel with little overhead. Doing an operation to find the unique jobs may chew up more cycles or cause a blocking operation that would slow you down more than a few duplicate operations executing in threads. Again, you never know until you have benchmarks. Either way, I wish you luck! – engineerDave Feb 03 '14 at 15:06
  • Thanks Dave! When you say "little overhead", you're referring to Sidekiq's effort, but if the job itself requires substantial network activity and grunt work, the savings can be huge. I mean there's a reason why these jobs are being deferred after all, some of them can be heavy. – mahemoff Feb 03 '14 at 15:31
  • Sorry for any confusion, by little overhead I meant low memory profile and non blocking in the context that it's a background operation. – engineerDave Feb 03 '14 at 15:39

4 Answers

8

What about a simple client middleware?

module Sidekiq
  class UniqueMiddleware

    def call(worker_class, msg, queue_name, redis_pool)
      if msg['unique']
        # Skip the push if an identical job (same class and args) is already queued
        queue = Sidekiq::Queue.new(queue_name)
        queue.each do |job|
          if job.klass == msg['class'] && job.args == msg['args']
            return false
          end
        end
      end

      yield
    end
  end
end

Just register it:

  Sidekiq.configure_client do |config|
    config.client_middleware do |chain|
      chain.add Sidekiq::UniqueMiddleware
    end
  end

Then in your worker, just set `unique: true` in `sidekiq_options` when needed.
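
A worker opting in might look like this (the worker name and argument are placeholders):

class HeavyObjectWorker
  include Sidekiq::Worker
  # the extra `unique` key ends up in the job payload, where the middleware above reads it
  sidekiq_options queue: 'default', unique: true

  def perform(object_id)
    # ... expensive processing ...
  end
end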

Paté
  • This is not a good solution IMO. The time complexity of this is O(n) and the whole point of having a background job processor is to not delay the execution of your main thread. This middleware however could become a performance bottleneck depending on how big your queue is. – Hamed Jun 23 '16 at 16:49
  • Completely agree. This is only an example for a small queue. Any big queue would need to use a hash lookup based on job args instead of this iterative approach. – Paté Jun 24 '16 at 18:42
3

Take a look at this: https://github.com/mhenrixon/sidekiq-unique-jobs

It's Sidekiq with unique jobs added.

Freddy Wetson
  • I mentioned that in the question. – mahemoff Jan 31 '14 at 13:35
  • It's very well documented, detailing pretty much exactly how to do what you are after – Freddy Wetson Jan 31 '14 at 13:36
  • It looks to me like it checks for uniqueness at processing-time, not enqueueing-time. But the docs don't make it clear. – mahemoff Jan 31 '14 at 13:39
  • We use sidekiq-unique-jobs 2.7.0 and it works at enqueuing time. Just set `unique_job_expiration` in your worker's `sidekiq_options` and set it to a value that is multiple of the average execution time of the job. Example: your job is scheduled every minute and it takes 20 seconds to complete, use `sidekiq_options queue: unique, unique_job_expiration: 40` This way if Sidekiq tries to re-enqueue the job in those 40 seconds it will have no effect. – aledalgrande Feb 06 '14 at 00:05
  • @aledalgrande Thanks, but time-based expiry is not quite what I need. I want a way to check if the job exists when Sidekiq tries to enqueue it, and if so, do nothing; but if it doesn't exist, enqueue it. I'm guessing a plugin might need to maintain a hash of all jobs in order to accomplish it efficiently. – mahemoff Feb 06 '14 at 18:21
  • @mahemoff I basically need the same thing, what was your final solution? – Jacka Jan 15 '17 at 21:07
  • @jacka I didn't find one, other than a function to loop through and wipe out duplicates, which I put in the answer. It could be run periodically, and since this is Redis, it will probably not be a big performance hit. The ideal, though, would be middleware maintaining a separate Set structure of jobs, so that every time the client tries to queue a job, it is intercepted and discarded if it matches what's in the set. – mahemoff Mar 17 '18 at 17:24
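
That set-based idea might look roughly like this (class names and the Redis key are hypothetical, and this is only a sketch, not a hardened implementation): the client half records a digest of each job in a Redis set and drops the push when the digest is already present, while the server half clears the digest once the job has run.

require 'sidekiq'
require 'json'
require 'digest'

class UniqueEnqueueMiddleware
  SET_KEY = 'unique_jobs'.freeze  # hypothetical Redis set of pending job digests

  def call(worker_class, msg, queue_name, redis_pool)
    digest = Digest::SHA1.hexdigest([msg['class'], msg['args']].to_json)
    added = Sidekiq.redis { |conn| conn.sadd(SET_KEY, digest) }
    # redis-rb returns true/false or 1/0 for SADD depending on version
    return false if added == false || added == 0  # identical job already pending
    yield
  end
end

class UniqueDequeueMiddleware
  def call(worker_instance, msg, queue_name)
    yield
  ensure
    digest = Digest::SHA1.hexdigest([msg['class'], msg['args']].to_json)
    Sidekiq.redis { |conn| conn.srem(UniqueEnqueueMiddleware::SET_KEY, digest) }
  end
end

Registration would mirror Paté's answer, with the dequeue half added to config.server_middleware inside Sidekiq.configure_server.
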
3

My suggestion is to search for prior scheduled jobs based on some select criteria and delete them before scheduling a new one. This has been useful for me when I want a single scheduled job for a particular object and/or one of its methods.

Some example methods in this context:

##
# find job(s) scheduled for a particular class and method
#
def self.find_jobs_for_object_by_method(klass, method)

  jobs = Sidekiq::ScheduledSet.new

  jobs.select do |job|
    job.klass == 'Sidekiq::Extensions::DelayedClass' &&
        ((job_klass, job_method, args) = YAML.load(job.args[0])) &&
        job_klass == klass &&
        job_method == method
  end

end

##
# delete job(s) specific to a particular class,method,particular record
# will only remove djs on an object for that method
#
def self.delete_jobs_for_object_by_method(klass, method, id)

  jobs = Sidekiq::ScheduledSet.new
  jobs.select do |job|
    job.klass == 'Sidekiq::Extensions::DelayedClass' &&
        ((job_klass, job_method, args) = YAML.load(job.args[0])) &&
        job_klass == klass &&
        job_method == method  &&
        args[0] == id
  end.map(&:delete)

end

##
# delete job(s) specific to a particular class and particular record
# will remove any djs on that Object
#
def self.delete_jobs_for_object(klass, id)

  jobs = Sidekiq::ScheduledSet.new
  jobs.select do |job|
    job.klass == 'Sidekiq::Extensions::DelayedClass' &&
        ((job_klass, job_method, args) = YAML.load(job.args[0])) &&
        job_klass == klass &&
        args[0] == id
  end.map(&:delete)

end
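
For example, to keep at most one pending delayed job per record, you could clear any existing ones before scheduling again (Article, recalculate_stats and article are hypothetical, and this relies on Sidekiq's delayed extensions being enabled):

# Wipe any previously scheduled duplicates, then schedule a fresh job
delete_jobs_for_object_by_method(Article, :recalculate_stats, article.id)
Article.delay_for(5.minutes).recalculate_stats(article.id)
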
blotto
  • Thanks, it's not quite a complete answer, but I think it's the closest towards a strategy for this. – mahemoff Feb 06 '14 at 21:31
0

Maybe you could use Queue Classic, which enqueues jobs in a Postgres database (in a really open way), so it could be extended (it's open source) to check for uniqueness before enqueueing.
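
A speculative sketch of such a check, assuming Queue Classic's default queue_classic_jobs table and a raw PG connection (the table, columns and args serialization may differ across QC versions):

require 'pg'
require 'json'
require 'queue_classic'

# Hypothetical wrapper: only enqueue if no identical pending job already exists.
# conn is a PG::Connection; queue_classic_jobs is assumed to be QC's default table.
def enqueue_unique(conn, method, *args)
  existing = conn.exec_params(
    "SELECT 1 FROM queue_classic_jobs WHERE method = $1 AND args = $2 LIMIT 1",
    [method, JSON.generate(args)]
  )
  QC.enqueue(method, *args) if existing.ntuples.zero?
end

enqueue_unique(conn, "Article.recalculate_stats", 42)  # Article is a made-up class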

rlecaro2