
Each task I have works in short bursts, sleeps for about an hour, then works again, and so on until the job is done. Some jobs may take about 10 hours to complete, and there is nothing I can do about that.

What bothers me is that the Resque worker stays busy while its job is sleeping. If I have 4 workers and 5 jobs, the last job has to wait 10 hours before it can be processed, which is grossly suboptimal, since it could run while any of the other workers is sleeping. Is there any way to make a Resque worker process another job while the current job is sleeping?

Currently I have a worker similar to this:

class ImportSongs
  def self.perform(api_token, songs)
    api = API.new api_token

    songs.each_with_index do |song, i|
      # make current worker proceed with another job while it's sleeping
      sleep 60*60  if i != 0 && i % 100 == 0

      api.import_song song
    end
  end
end
Andrew
  • As an alternative to sleep statements, why not schedule the import job with cron, cron via [whenever](https://github.com/javan/whenever), or [resque-scheduler](https://github.com/bvandenbos/resque-scheduler)? That way, your workers wouldn't block when not processing. – rossta Feb 03 '13 at 14:12
  • @rossta Thank you, using resque-scheduler seems to be a good idea: I can schedule a new job with the rest of the songs every hour. I will accept this as the answer if you post it. – Andrew Feb 03 '13 at 16:17

2 Answers


It looks like the problem you're trying to solve is API rate limiting with batch processing of the import process.

You should have one job that runs as soon as it's enqueued and enumerates all the songs to be imported. You can then break those into groups of 100 (or whatever size the limit requires) and use resque-scheduler to schedule a deferred job for each group at one-hour intervals.
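The splitting step described above could look roughly like the sketch below. It assumes the app already loads the resque and resque-scheduler gems (the latter provides `Resque.enqueue_in`) and that a scheduler process is running. The class names `ScheduleSongImport` and `ImportSongBatch`, the `plan` helper, the `:imports` queue, and the optional `scheduler` argument are all made up for illustration; `API` and `import_song` are taken from the question.

```ruby
# Assumption: the app has already required 'resque' and 'resque-scheduler'.
# resque-scheduler adds Resque.enqueue_in(seconds_from_now, JobClass, *args).

BATCH_SIZE = 100      # songs per burst, as in the question
INTERVAL   = 60 * 60  # seconds between bursts, as in the question

# Hypothetical job: imports a single batch and never sleeps,
# so the worker is released as soon as the batch is done.
class ImportSongBatch
  @queue = :imports

  def self.perform(api_token, songs)
    api = API.new(api_token) # API is the class from the question
    songs.each { |song| api.import_song(song) }
  end
end

# Hypothetical job: runs once, splits the songs and schedules the batches.
class ScheduleSongImport
  @queue = :imports

  # Returns [[delay_in_seconds, batch], ...]; the first batch has delay 0.
  def self.plan(songs, batch_size = BATCH_SIZE, interval = INTERVAL)
    songs.each_slice(batch_size).each_with_index.map do |batch, i|
      [i * interval, batch]
    end
  end

  # `scheduler` defaults to Resque; it is a parameter only so the
  # scheduling can be exercised without Redis.
  def self.perform(api_token, songs, scheduler = Resque)
    plan(songs).each do |delay, batch|
      scheduler.enqueue_in(delay, ImportSongBatch, api_token, batch)
    end
  end
end
```

Enqueuing would then be `Resque.enqueue(ScheduleSongImport, api_token, songs)`. Each batch occupies a worker only for as long as its own API calls take.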

However, if you have a hard API rate limit and run several of these distributed imports concurrently, you may not be able to control how much API traffic is in flight at once. With a rate limit that strict, you may want to build a specialized process, with its own work queue, that acts as a single point of control for enforcing the limit.

Winfield
  • Yes, this is precisely what I'm trying to do. The API I'm trying to access has a number of very strict policies: you can make at most 3 requests per second, but no more than 10 per minute and no more than 50 per hour. For the first limit I use the `slowweb` gem, and for the second `sleep` is enough, but blocking a thread for a full hour is just too much. The only problem I see is that it would be hard to keep track of all those batches, whereas it's easy to use `resque-status` to track the progress of one job. – Andrew Feb 03 '13 at 20:13

With resque-scheduler, you'll be able to repeat discrete jobs at scheduled or delayed times as an alternative to a single, long-running job that loops with sleep statements.
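One way to read this, in line with the "schedule a new job with the rest of the songs every hour" idea from the comments, is a job that imports one burst and then re-enqueues itself with whatever is left. This is only a sketch: the class name `ImportSongsInBursts`, the `:imports` queue, and the trailing `api` and `scheduler` parameters are assumptions added for illustration, and it relies on `Resque.enqueue_in` from resque-scheduler being available in the app.

```ruby
# Assumption: the app has already required 'resque' and 'resque-scheduler'.
# Hypothetical replacement for the ImportSongs job from the question.
class ImportSongsInBursts
  @queue = :imports

  BURST = 100      # songs imported per run
  PAUSE = 60 * 60  # seconds to wait before the next run

  # `api` and `scheduler` have defaults and exist only so the job can be
  # exercised without the real API or Redis; Resque passes just the
  # first two arguments.
  def self.perform(api_token, songs, api = API.new(api_token), scheduler = Resque)
    now   = songs.first(BURST)
    later = songs.drop(BURST)

    now.each { |song| api.import_song(song) }

    # Instead of sleeping, hand the remainder to a delayed job and return,
    # which frees this worker for other jobs in the meantime.
    scheduler.enqueue_in(PAUSE, self, api_token, later) unless later.empty?
  end
end
```

Because each run carries the remaining songs in its arguments, the job stays a single logical chain, though the list is re-serialized to Redis on every hop.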

rossta