
I'm new to Rails and multithreading and am curious about how to achieve the following in the most elegant way. I couldn't find any good tutorials that explain in detail the best design decision for the following task:

I have a number of HTTP requests that will be run for a user in the background, for example, parsing a couple of websites and getting some information like the HTTP response code and response time, then returning the results. For performance reasons, I decided to split the total number of URLs to parse into batches of 25 each, execute each batch in a thread, join the threads, and write the results to a database.

I decided to use the thread gem (http://rubygems.org/gems/thread) to ensure that there's a maximum number of threads running simultaneously. So far so good.
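
To illustrate, here's a minimal sketch of that batching approach, assuming the gem's Thread.pool helper; the pool size, batch size, and urls list are all placeholders:

    require 'net/http'
    require 'thread/pool' # the "thread" gem

    urls = %w[http://example.com http://example.org] # placeholder URLs

    pool = Thread.pool(4) # cap on simultaneously running threads

    urls.each_slice(25) do |batch|
      pool.process do
        batch.each do |url|
          started  = Time.now
          response = Net::HTTP.get_response(URI(url))
          elapsed  = Time.now - started
          # e.g. persist response.code and elapsed to the database here
        end
      end
    end

    pool.shutdown # waits for all queued work to finish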

The problem is, if two users start their analyses in parallel, the total number of threads is twice the maximum size of my thread pool.

My solution (imho) is to create a worker daemon that runs on its own and waits for jobs from the clients.

My question is, what's the best way to achieve this in Rails?

Maybe create a Rake task and use it as a daemon (see "Daemonising a rake task"), and (how?) add jobs to it?

Thank you very much in advance!

madhippie

2 Answers


I'd build a queue in a database table, plus a bit of code, started periodically by cron, that walks the table and passes requests to Typhoeus and Hydra.

Here's how the author summarizes the gem:

Like a modern code version of the mythical beast with 100 serpent heads, Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic.

As users add requests, append them to the table. You'll want fields like the following (a migration sketch follows the list):

  • A "processed" field so you can tell which were handled in case the system goes down.
  • A "success" field so you can tell which requests were processed successfully, so you can retry if they failed.
  • A "retry_count" field so you can retry up to "n" times, then flag that URL as unreachable.
  • A "next_scan_time" field that says when the URL should be scanned again so you don't DOS a site by hitting it continuously.

Typhoeus and Hydra are easy to use and make handling multiple parallel requests straightforward.
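
A rough sketch of the cron-driven pass over that table, assuming the ScanRequest model from the migration above:

    require 'typhoeus'

    hydra = Typhoeus::Hydra.new(max_concurrency: 10)

    ScanRequest.where(processed: false).find_each do |scan|
      request = Typhoeus::Request.new(scan.url, timeout: 10)
      request.on_complete do |response|
        # record the outcome so failed requests can be retried later
        scan.update(processed:     true,
                    success:       response.success?,
                    response_code: response.code,
                    response_time: response.total_time)
      end
      hydra.queue(request)
    end

    hydra.run # blocks until all queued requests have completed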

the Tin Man

  • Hi Tin Man, this looks very promising. I'll read into it and give you feedback tomorrow! P.S.: Thanks for correcting/clarifying my poor English. :) – madhippie Jan 09 '14 at 18:18
  • I fiddled around with this gem and am pretty satisfied. Typhoeus and Hydra match my requirements exactly! Thank you very much. – madhippie Jan 13 '14 at 11:22

There are a bunch of libraries for Rails that can manage queues of long-running background jobs for you. Here are a few (a short Sidekiq sketch follows the list):

  • Sidekiq uses Redis for job storage and supports multiple worker threads.
  • Resque also uses Redis; each worker processes one job at a time.
  • delayed_job manages a job queue through ActiveRecord (or Mongoid).
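
As a taste of the API, here's a minimal Sidekiq sketch; the worker class name and job body are placeholders:

    require 'sidekiq'
    require 'net/http'

    class ScanWorker
      include Sidekiq::Worker

      def perform(url)
        response = Net::HTTP.get_response(URI(url))
        # persist response.code, timing, etc. here
      end
    end

    # Enqueue from a controller or model; the job runs later in a worker process.
    ScanWorker.perform_async('http://example.com')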

Once you've chosen one, I'd recommend using Foreman to simplify launching multiple daemons at once.
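
For example, a Procfile along these lines lets "foreman start" bring up the web server and a Sidekiq worker together (assuming Sidekiq from the list above):

    web: bundle exec rails server
    worker: bundle exec sidekiq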

Ash Wilson