My Rails application runs with Sidekiq. The app has_many accounts. Each account can run an ImportResourceJob, which receives the account_id as an argument to identify the account to work on. I want to prevent launching many concurrent ImportResourceJobs for the same account. Basically, before launching a new ImportResourceJob, I want to check that there is no currently enqueued/running ImportResourceJob for that specific account_id.

I am a bit unsure how to do that. I have seen answers suggesting the scan method from the Sidekiq API (https://github.com/mperham/sidekiq/wiki/API#scan) or the Workers API (https://github.com/mperham/sidekiq/wiki/API#workers):

workers = Sidekiq::Workers.new
workers.size # => 2
workers.each do |process_id, thread_id, work|
  # process_id is a unique identifier per Sidekiq process
  # thread_id is a unique identifier per thread
  # work is a Hash which looks like:
  # { 'queue' => name, 'run_at' => timestamp, 'payload' => msg }
  # run_at is an epoch Integer.
  # payload is a Hash which looks like:
  # { 'retry' => true,
  #   'queue' => 'default',
  #   'class' => 'Redacted',
  #   'args' => [1, 2, 'foo'],
  #   'jid' => '80b1e7e46381a20c0c567285',
  #   'enqueued_at' => 1427811033.2067106 }
end

This doesn't seem to be very precise or reliable (the workers set only updates every 5 seconds). It also seems unscalable to me if you have a lot of workers.
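For what it's worth, the enqueued half of such a check can be done with Sidekiq::Queue, which reads Redis directly rather than the 5-second workers snapshot. Here is a minimal sketch assuming the plain payload shape quoted above; note that with ActiveJob the payload `class` is a wrapper class and the real arguments are nested, so the matcher would need adjusting:

```ruby
# Pure matcher over a Sidekiq payload Hash shaped like the example above.
# Kept free of Redis so the logic is easy to test in isolation.
def import_job_for_account?(payload, account_id)
  payload["class"] == "ImportResourceJob" &&
    payload["args"]&.first == account_id
end

# Against the live API (requires Redis):
#   require "sidekiq/api"
#   Sidekiq::Queue.new("default").any? { |j| import_job_for_account?(j.item, account_id) }
```

Scanning the whole queue is still O(queue size) per check, which is part of why this approach doesn't scale well.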

Is it common/good practice to have a Jobs table with:

  • column account_id = Account has_many Jobs
  • column type = the job's class (e.g. ImportResourceJob)
  • column status = enqueued, running, finished, failed

to handle this kind of thing? The idea would be to create an entry in the Jobs table before launching the job and pass the job_id to the job, something like this:

def launches_import_resource_job
  existing_running_job = Job.find_by(
    type: "ImportResourceJob",
    account_id: account_id,
    status: ["enqueued", "running"]
  )
  return if existing_running_job

  job = Job.create(
    type: "ImportResourceJob",
    account_id: account_id,
    status: "enqueued"
  )

  ImportResourcesJob.perform_later(
    account_id,
    job.id
  )
end

then in ImportResourcesJob itself:

class ImportResourcesJob < ApplicationJob
  queue_as :default
  sidekiq_options retry: false

  def perform(account_id, job_id)
    job = Job.find(job_id)
    job.update(status: "running")
    Sync360Service.call(account_id)
    job.update(status: "finished")
  rescue Exception => e
    job.update(status: "failed")
    raise e
  end
end
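(One caveat with the check-then-create pattern in launches_import_resource_job: two simultaneous calls can both pass the find_by before either row is committed. If you go the table route, a partial unique index closes that race at the database level. A sketch, assuming PostgreSQL and the columns above; the index name is made up:)

```ruby
# Sketch: a partial unique index (PostgreSQL) so two "active" jobs for the
# same account/type can never coexist, even under concurrent launches.
# Job.create then raises ActiveRecord::RecordNotUnique for the loser.
class AddActiveJobUniquenessToJobs < ActiveRecord::Migration[7.0]
  def change
    add_index :jobs, [:account_id, :type],
              unique: true,
              where: "status IN ('enqueued', 'running')",
              name: "index_jobs_on_account_id_and_type_active"
  end
end
```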

What is the accepted/good solution to solve this problem ?

David Geismar
  • Having a table to keep track of enqueued job is an acceptable solution, depending on your architecture and if the slight increase in DB load and latency is acceptable (which in most cases it is). – Ankit Aug 01 '22 at 20:04

1 Answer

@Ankit is correct that this strategy will work, but a separate table isn't really necessary.

1. Use a custom queue

I see you are using the :default queue. I'd suggest a custom queue, especially if you are thinking about scaling with other jobs.

class ImportResourcesJob < ApplicationJob
  queue_as :import_resources_job
  ...
end

2. Use the Sidekiq Job ID

If you don't want to use scan, just add a column to your Account table and save just the Sidekiq job ID. There's no need to save the status: Sidekiq changes the job's status itself, so the value in your DB would quickly go stale.

Save it to your Account record when the job is created, remove it from the record when the job is complete.
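Something like this migration would do it; the column name import_resources_job_id is just what the code below assumes, and it's a :string because job IDs are hex strings, not integers:

```ruby
# Sketch of the migration this approach needs.
class AddImportResourcesJobIdToAccounts < ActiveRecord::Migration[7.0]
  def change
    add_column :accounts, :import_resources_job_id, :string
  end
end
```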

(since it looks like you're using ActiveJob)

class ImportResourcesJob < ApplicationJob
  queue_as :import_resources_job
  sidekiq_options retry: false

  def perform(account_id)
    account = Account.find(account_id)

    # job_id is the ActiveJob ID; use provider_job_id if you want the Sidekiq JID
    account.update_column(:import_resources_job_id, job_id)

    Sync360Service.call(account_id)
  ensure
    # Clear the lock whether the sync succeeded or raised; otherwise a
    # failed run would block all future imports for this account.
    account&.update_column(:import_resources_job_id, nil)
  end
end

and to prevent the job from being created:

# e.g. as an instance method on Account
def launches_import_resource_job
  return unless import_resources_job_id.nil?

  ImportResourcesJob.perform_later(id)
end

If you need to replicate this with multiple different jobs, I'd use a JSONB column on the table and store the Sidekiq jobs as a hash of { job_name => job_id }.
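A sketch of that multi-job variant, assuming a jsonb column named active_jobs (a hypothetical name) defaulting to {}; the hash manipulation is kept as pure helpers so it's easy to test:

```ruby
# Pure helpers for a { job_name => job_id } tracking hash stored in a
# jsonb column (e.g. accounts.active_jobs, a hypothetical name).
def track_job(jobs, job_name, job_id)
  jobs.merge(job_name => job_id)
end

def untrack_job(jobs, job_name)
  jobs.reject { |name, _| name == job_name }
end

def job_active?(jobs, job_name)
  jobs.key?(job_name)
end

# In the model these would wrap update_column, e.g.:
#   account.update_column(:active_jobs, track_job(account.active_jobs, "ImportResourcesJob", job_id))
```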

A NOTE ABOUT UPDATING JOB DETAILS

In your job, you do things like job.update(status: "running"). Keep in mind that this updates your own Jobs record in the database; it changes nothing about the actual Sidekiq job in Redis. Just be warned.

Also, Sidekiq does all the job status updates for you, so no need to do this anyway.

Chiperific