
I built a small web crawler implemented as two Sidekiq workers: Crawler and Parser. The Crawler worker looks for links, while the Parser worker reads the page body.

I want to trigger an alert when the crawling and parsing of all pages is complete. Monitoring only the Crawler job is not enough, since it may have finished while several Parser jobs are still running.

Looking at the sidekiq-status gem, it seems that I cannot dynamically add new jobs to the container for monitoring. For example, it would be nice to have an "add" method in the following context:

@container = SidekiqStatus::Container.new

# ... for each page url found:
jid = ParserWorker.perform_async(page_url)
@container.add(jid)

The closest to this is "SidekiqStatus::Container.load" or "SidekiqStatus::Container.load_multi"; however, it is not possible to add new jobs to the container afterwards.

One solution would be to create as many SidekiqStatus::Container instances as there are Parser jobs and check whether all of them have status == "finished", but I wonder if a more elegant solution exists using these tools.
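
The tracking idea described above can be sketched in plain Ruby. `JobTracker` and its `status_lookup` callable are hypothetical names, not part of sidekiq-status; in a real setup the lookup would presumably wrap `SidekiqStatus::Container.load_multi`, and the "finished" status string is taken from the question as an assumption.

```ruby
require 'set'

# Hypothetical helper (not part of any gem): collects job IDs as they are
# enqueued and asks an injected callable for their current statuses.
class JobTracker
  # Assumed terminal status name; adjust to what the status backend reports.
  DONE_STATUSES = Set['finished'].freeze

  # status_lookup: callable taking an array of jids and returning a
  # { jid => status_string } hash.
  def initialize(status_lookup)
    @status_lookup = status_lookup
    @jids = []
  end

  # Register a job after it has been enqueued.
  def add(jid)
    @jids << jid
    self
  end

  # True once every tracked job reports a terminal status.
  # Note: also true when no jobs have been added yet.
  def all_finished?
    statuses = @status_lookup.call(@jids)
    @jids.all? { |jid| DONE_STATUSES.include?(statuses[jid]) }
  end
end
```

Injecting the lookup keeps the tracker independent of any particular status gem, so it can be exercised without a running Sidekiq or Redis.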

Any help is appreciated.

ksiomelo

2 Answers


You are describing Sidekiq Pro's Batches feature exactly. You can spend a lot of time or some money to solve your problem.

https://github.com/mperham/sidekiq/wiki/Batches

Mike Perham
  • Thank you Mike, I am sure this could be easily done with Sidekiq Pro. Unfortunately $750/y is over my budget this time... – ksiomelo Aug 22 '14 at 14:13

OK, here's a simple solution. Using the sidekiq-status gem, the Crawler worker keeps track of the job IDs of the Parser jobs and waits as long as any Parser job is still busy (using SidekiqStatus::Container instances to check job status).

def perform
  @jids = [] # job IDs of the Parser jobs enqueued by this crawl

  # for each page....
    @jids << ParserWorker.perform_async(page_url)
  # end

  # crawler finished, parsers may still be running
  while parsers_busy?
    sleep 5 # wait 5 seconds between checks
  end

  # all parsers complete, trigger notification...
end

# Returns true while any tracked Parser job is still queued or running.
def parsers_busy?
  SidekiqStatus::Container.load_multi(@jids).any? do |container|
    %w[waiting working].include?(container.status)
  end
end
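
The loop above waits forever if a Parser job gets stuck or its status never changes. A bounded variant could look like the sketch below, in plain Ruby; `wait_until_idle` is a hypothetical helper, and the block passed to it stands in for `parsers_busy?`.

```ruby
# Hypothetical helper: polls the given block until it returns false or the
# timeout (in seconds) elapses. Returns true if the work finished, false
# if the deadline passed first.
def wait_until_idle(timeout:, interval: 5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  while yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
  true
end
```

Inside `perform`, this might be called as `wait_until_idle(timeout: 3600) { parsers_busy? }`, sending the notification only when it returns true. The one-hour timeout is an arbitrary example value.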
ksiomelo