
How could I de-dupe all Sidekiq queues, i.e. ensure each job in the queue has a unique worker class and arguments?

(This arises because, for example, an object is saved twice, triggering a new job each time, but we only want it to be processed once. So I'm looking to periodically de-dupe the queues.)

mahemoff

1 Answer

You can use the sidekiq-unique-jobs gem; it looks like it does exactly what you need.
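
For example, a worker could declare its uniqueness constraint with sidekiq_options. This is only a minimal sketch, assuming the gem is in your Gemfile; the option name differs across gem versions (older releases use unique:, 6.x+ uses lock:), and the worker name here is made up:

require 'sidekiq'

class ProcessObjectWorker
  include Sidekiq::Worker
  # Reject a new job while an identical [class, args] job is already
  # queued; in sidekiq-unique-jobs 6.x+ the option is
  # `lock: :until_executed` instead of `unique:`
  sidekiq_options unique: :until_executed

  def perform(object_id)
    # the object is processed once even if it was saved twice
  end
end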

Added later:

Here is a basic implementation of what you are asking for. It will not be fast, but should be OK for small queues. I also ran into a JSON repacking problem while testing: in my environment it was necessary to re-encode the JSON exactly the same way Sidekiq stored it.

# For proper JSON re-packing: by default ActiveSupport encodes a finite
# BigDecimal as a string, so the re-encoded job would not match the string
# Sidekiq originally stored; returning self keeps it a number
require 'bigdecimal'

class BigDecimal
  def as_json(options = nil) #:nodoc:
    if finite?
      self
    else
      NilClass::AS_JSON
    end
  end
end

Sidekiq.redis do |connection|
  # fetch every job currently in the queue (fine for small queues;
  # the jobs stay in place - duplicates are removed one by one below)
  items_count = connection.llen('queue:background')
  items = connection.lrange('queue:background', 0, items_count - 1)

  # jobs are in json - decode them
  items_decoded = items.map{|item| ActiveSupport::JSON.decode(item)}

  # group them by class and arguments
  grouped = items_decoded.group_by{|item| [item['class'], item['args']]}

  # keep the last job of each group and mark the rest for deletion
  duplicated = grouped.values.delete_if{|mini_list| mini_list.length < 2}
  for_deletion = duplicated.map{|a| a[0...-1]}.flatten
  for_deletion_packed = for_deletion.map{|item| JSON.generate(item)}

  # each job's JSON is unique (it contains a jid), so LREM with count 0
  # removes exactly that job - provided the re-encoded JSON matches the
  # stored string byte for byte (hence the BigDecimal patch above)
  for_deletion_packed.each do |packed_item|
    connection.lrem('queue:background', 0, packed_item)
  end
end
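
If matching the stored JSON byte for byte proves fragile, roughly the same clean-up can be done through Sidekiq's public queue API instead of raw Redis. A sketch (not from the original answer), assuming the Sidekiq::Queue interface from sidekiq/api; note that this keeps the first occurrence of each [class, args] pair rather than the last:

require 'sidekiq/api'

seen = {}
Sidekiq::Queue.new('background').each do |job|
  key = [job.klass, job.args]
  if seen[key]
    job.delete  # removes exactly this job entry, no JSON re-encoding needed
  else
    seen[key] = true
  end
end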
brain-geek
  • That only avoids running the same job twice, and it is time-based; I'm trying to clean up existing queues by removing dupes. – mahemoff Feb 07 '14 at 18:37
  • But why do you want to remove duplicates instead of preventing them in the first place? In the provided example - "an object is saved twice, triggering some new job each time; but we only want it to be processed (once)". Maybe you have some other cases? – brain-geek Feb 07 '14 at 18:58
  • If you have callback triggers set up, it can be difficult to prevent the same job from being added multiple times. Another example would be getting a notification that some external URL changed - you don't want it to be fetched twice. – mahemoff Feb 07 '14 at 19:02
  • I've added code that removes all duplicates in a straightforward way. The other option might be to consider the time the task was enqueued and the time a task of that kind last completed - if it was enqueued before the last completion, we don't need to run it. – brain-geek Feb 07 '14 at 19:35
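
The enqueue-time idea from the last comment could look roughly like the following Sidekiq server middleware. This is only a sketch: the Redis key name is invented, and it tracks completion per job class rather than per class-and-arguments:

class SkipStaleJobs
  # Server middleware: skip a job if it was enqueued before the last
  # time a job of the same class finished.
  def call(worker, job, queue)
    key = "last_completed:#{job['class']}"  # hypothetical key name
    last_done = Sidekiq.redis { |c| c.get(key) }.to_f
    return if job['enqueued_at'].to_f < last_done  # silently skip stale job
    yield
    Sidekiq.redis { |c| c.set(key, Time.now.to_f) }
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add SkipStaleJobs
  end
end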