
I have a background job that does a map/reduce job on MongoDB. When the user sends in more data to the document, it kicks off the background job that runs on the document. If the user sends in multiple requests, it will kick off multiple background jobs for the same document, but only one really needs to run. Is there a way I can prevent multiple duplicate instances? I was thinking of creating a queue for each document and making sure it is empty before I submit a new job. Or perhaps I can set a job id somehow that is the same as my document id, and check that none exists before submitting it?

Also, I just found a sidekiq-unique-jobs gem. But the documentation is non-existent. Does this do what I want?
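
For example, this is roughly what I have in mind (a pure-Ruby sketch with made-up names; the real pending set would have to live somewhere shared across processes, such as Redis via SETNX/SADD, for this to actually work with multiple servers):

```ruby
require 'set'

# Pure-Ruby stand-in for the idea: remember which document ids already
# have a job pending, and only enqueue when the id is absent. The class
# and method names here are hypothetical.
class DocumentJobGate
  def initialize
    @pending = Set.new
  end

  # Record the document id and return true only for the first caller;
  # later calls for the same document are refused until the job finishes.
  def try_enqueue(document_id)
    return false if @pending.include?(document_id)
    @pending.add(document_id)
    true
  end

  # The background job would call this when the map/reduce completes.
  def finished(document_id)
    @pending.delete(document_id)
  end
end
```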

Jim Dagg
Eric Seifert
  • Just thought of another way, I could set a field on the document that it is scheduled to be updated and have the background job clear it on completion. Then only schedule a background task if the field is clear. – Eric Seifert Feb 05 '13 at 17:46
  • How many servers are running these sidekiq jobs? – crftr Feb 05 '13 at 17:53

5 Answers


My initial suggestion would be a mutex for this specific job. But since there's a chance that you have multiple application servers working the sidekiq jobs, I would suggest something at the redis level.

For instance, use redis-semaphore within your sidekiq worker definition. An untested example:

require 'redis-semaphore'

def perform
  s = Redis::Semaphore.new(:map_reduce_semaphore, host: "localhost")

  # verify that this sidekiq worker is the first to reach this semaphore.
  unless s.locked?
    # acquire the lock; pick a timeout / stale-lock setting that is
    # reasonable for your worker (see the redis-semaphore docs).
    s.lock(90)
    begin
      your_map_reduce
    ensure
      # release the lock even if the map/reduce raises.
      s.unlock
    end
  end
end

def your_map_reduce
  # ...
end
crftr
  • By the way, I used redis-mutex for this, but basically the same idea. – Eric Seifert Feb 09 '13 at 05:20
  • Excellent! I wasn't aware that a redis-mutex package was available. But I agree that a mutex would be the proper tool in this instance. – crftr Feb 09 '13 at 20:50
  • @crftr how has this solution worked out for you? Any suggestions for others having the same issue? – aks Jan 19 '18 at 18:45

https://github.com/krasnoukhov/sidekiq-middleware

UniqueJobs: provides uniqueness for jobs.

Usage

Example worker:

class UniqueWorker
  include Sidekiq::Worker

  sidekiq_options({
    # Should be set to true (enables uniqueness for async jobs)
    # or :all (enables uniqueness for both async and scheduled jobs)
    unique: :all,

    # Unique expiration (optional, default is 30 minutes)
    # For scheduled jobs calculates automatically based on schedule time and expiration period
    expiration: 24 * 60 * 60
  })

  def perform
    # Your code goes here
  end
end
Todd

There also is https://github.com/mhenrixon/sidekiq-unique-jobs (SidekiqUniqueJobs).
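
A worker using that gem looks roughly like this. Note that the option names have changed between releases (current versions use `lock: :until_executed`; older releases used `unique: true`), so check the README for the version you install; the worker class and argument below are made up for illustration:

```ruby
class MapReduceWorker
  include Sidekiq::Worker

  # Only one job with this class and the same arguments may exist until
  # the running job finishes; duplicates are simply not enqueued.
  sidekiq_options lock: :until_executed

  def perform(document_id)
    # run the map/reduce for this document
  end
end
```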

joost

You can do this, assuming all of your jobs are added to the enqueued bucket:

class SidekiqUniqChecker
  def self.perform_unique_async(action, model_name, id)
    key = "#{action}:#{model_name}:#{id}"
    queue = Sidekiq::Queue.new('elasticsearch')
    # Skip enqueueing if a job with the same arguments is already queued.
    queue.each { |job| return if job.args.join(':') == key }
    Indexer.perform_async(action, model_name, id)
  end
end

The above code is just a sample, but you may tweak it to your needs.

Source

bragboy

Create this class and run it as a scheduled job (every minute) that scans the queues and removes duplicates. This only works with Sidekiq.

rake task

namespace :dev do
  task remove_duplicated_jobs: :environment do
    JobDuplicated.new.jobs.each(&:delete)
  end
end

/lib/job_duplicated.rb

require 'sidekiq/api'

class JobDuplicated
  def jobs
    results = []

    queues.each do |queue|
      jobs = {}

      # Scan every job in the queue
      queue.each do |job|
        payload   = JSON.parse(job.value)
        job_name  = payload['wrapped']
        arguments = payload['args'][0]['arguments']

        key = [job_name, arguments]

        # If a job with this name and these arguments was already seen,
        # collect this one as a duplicate; otherwise remember it.
        if jobs[key]
          results << job
        else
          jobs[key] = job.jid
        end
      end
    end

    results
  end

  private

  def queues
    Sidekiq::Queue.all
  end
end
sparkle