Below is some code that should be optimized:

def statistics
  blogs = Blog.where(id: params[:ids])
  results = blogs.map do |blog|
    {
      id: blog.id,
      comment_count: blog.blog_comments.select("DISTINCT user_id").count
    }
  end
  render json: results.to_json
end

Each SQL query costs around 200ms. With 10 blog posts, this function takes about 2s because the queries run one after another. I could use GROUP BY to optimize the query, but I am setting that aside for now because the task could just as well be a third-party request, and I am interested in how Ruby deals with async.

In JavaScript, when I want to dispatch multiple asynchronous tasks and wait for all of them to resolve, I can use Promise.all(). I wonder what the alternatives are in Ruby.

Do I need a thread for this case? And is it safe to do that in Ruby?

sawa
chaintng

3 Answers

There are multiple ways to solve this in Ruby, including promises (enabled by gems).

JavaScript accomplishes asynchronous execution using an event loop and event-driven I/O. There are event libraries that accomplish the same thing in Ruby. One of the most popular is EventMachine.

As you mentioned, threads can also solve this problem. Thread safety is a big topic and is further complicated by the different thread models in different flavors of Ruby (MRI, JRuby, etc.). In summary, I'll just say that threads can of course be used safely; there are just times when that is difficult. However, when used with blocking I/O (like a request to an API or a database), threads can be very useful and fairly straightforward. A solution with threads might look something like this:

# run blocking IO requests simultaneously
thread_pool = [
  Thread.new { execute_sql_1 },
  Thread.new { execute_sql_2 },
  Thread.new { execute_sql_3 },
  # ...
]

# wait for the slowest one to finish
thread_pool.each(&:join)
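
As a runnable variant of the snippet above, here is a minimal sketch in which a hypothetical `fake_query` helper sleeps to simulate blocking I/O. The helper, the `run_all` name, and the 0.2s delay are assumptions made for illustration, not part of the original code.

```ruby
# Hypothetical stand-in for a blocking call (SQL query, HTTP request, ...).
def fake_query(id, delay = 0.2)
  sleep(delay) # simulated blocking I/O; other threads can run meanwhile
  "result #{id}"
end

# Start one thread per id, wait for all of them, return results in input order.
def run_all(ids)
  results = Array.new(ids.size)
  threads = ids.each_with_index.map do |id, slot|
    # Each thread writes only to its own slot, so no locking is needed here.
    Thread.new(id, slot) { |i, s| results[s] = fake_query(i) }
  end
  threads.each(&:join) # returns once the slowest thread has finished
  results
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = run_all([1, 2, 3])
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts results.inspect
puts format("elapsed: %.2fs", elapsed)
```

The elapsed time should be close to one delay rather than the sum of all three, because the threads wait on their simulated I/O at the same time.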

You also have access to other concurrency models, like the actor model, async classes, promises, and others enabled by gems like concurrent-ruby.

Finally, Ruby concurrency can take the form of multiple processes communicating through built-in mechanisms (drb, sockets, etc.) or through distributed message brokers (redis, rabbitmq, etc.).

Carl Zulauf

Okay, generalizing a bit:

You have a list of data and want to operate on it asynchronously. Assuming the operation is the same for all entries in your list, you can do this:

data = [1, 2, 3, 4] # Example data
operation = -> (data_entry) { data_entry * 2 } # Our operation: multiply by two
results = data.map{ |e| Thread.new(e, &operation) }.map{ |t| t.value }

Taking it apart:

data = [1, 2, 3, 4]

This could be anything from database IDs to URIs. Using numbers for simplicity here.

operation = -> (data_entry) { data_entry * 2 }

Definition of a lambda that takes one argument and does some calculation on it. This could be an API call, an SQL query or any other operation that takes some time to complete. Again, for simplicity, I'm just multiplying the numbers by 2.

results =

This array will contain the results of all the asynchronous operations.

data.map{ |e| Thread.new(e, &operation) }...

For every entry in the data set, spawn a thread that runs operation and pass the entry as an argument. This becomes the data_entry argument of the lambda.

...map{ |t| t.value }

Extract the value from each thread. This will wait for the thread to finish first, so by the end of this line all your data will be there.
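
Applied to the shape of the original question, the same pattern could look like the sketch below. `count_commenters` and `map_concurrently` are hypothetical names, and the returned counts are made up. One detail worth knowing: Thread#value re-raises any exception that terminated the thread, so a failed lookup surfaces where the results are collected. Inside a Rails app each thread would also need its own database connection (for example via `ActiveRecord::Base.connection_pool.with_connection`), which this sketch leaves out.

```ruby
# Hypothetical stand-in for the slow per-blog lookup (SQL query or API call).
def count_commenters(blog_id)
  raise ArgumentError, "unknown blog #{blog_id}" if blog_id.negative?
  sleep(0.05)  # simulate blocking I/O
  blog_id * 10 # made-up number, for illustration only
end

# Call the block once per item, each in its own thread, and return the
# results in input order.
def map_concurrently(items, &block)
  threads = items.map do |item|
    Thread.new(item) do |i|
      # Errors are reported to the caller through #value, so silence the
      # default "thread terminated with exception" message on stderr.
      Thread.current.report_on_exception = false
      block.call(i)
    end
  end
  threads.map(&:value) # waits for each thread; re-raises its exception, if any
end

stats = map_concurrently([1, 2, 3]) do |id|
  { id: id, comment_count: count_commenters(id) }
end
puts stats.inspect

begin
  map_concurrently([1, -1]) { |id| count_commenters(id) }
rescue ArgumentError => e
  puts "a lookup failed: #{e.message}"
end
```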

Lambdas

Lambdas are Proc objects that behave more like methods: they raise an ArgumentError if you pass in the wrong number of arguments. The syntax -> (arguments) { code } is just syntactic sugar for lambda { |arguments| code }.

When a method accepts a block, like Thread.new { do_async_stuff_here }, you can also pass a lambda or Proc object prefixed with & and it will be treated the same way.
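
A short sketch of both points, using made-up `double_lambda` and `double_proc` objects: the lambda checks its argument count, the plain proc does not, and either one can be handed to a method with &.

```ruby
double_lambda = ->(n) { n * 2 }            # same as lambda { |n| n * 2 }
double_proc   = proc { |n| n * 2 }

puts double_lambda.lambda? # true
puts double_proc.lambda?   # false

# A lambda checks its argument count strictly ...
begin
  double_lambda.call(1, 2)
rescue ArgumentError => e
  puts "lambda: #{e.message}"
end

# ... while a plain proc silently drops the extra argument.
puts double_proc.call(1, 2) # 2

# & turns either object into the block of a method call.
puts [1, 2, 3].map(&double_lambda).inspect # [2, 4, 6]
puts Thread.new(21, &double_lambda).value  # 42
```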

DarkWiiPlayer
  • Yeah, I think this is what I am looking for. But I am just wondering: do we need a thread pool in this case? Is there anything to worry about when using threads? Some of my colleagues really don't embrace threads, and they complain that it might be dangerous if we don't understand them fully. What's your take? – chaintng Apr 25 '18 at 15:50
  • You will always need some sort of thread pool if you want to wait for all threads to finish, BUT there's no need to explicitly name it. In my example it's just an intermediary state between two `.map` calls. – DarkWiiPlayer Apr 26 '18 at 06:08
  • Threads are dangerous, yes, but so is any form of parallelism if you do it incorrectly. Two threads can mess with each other's data and break your program, but if each thread is responsible for its own state, then you don't need to worry about how they interact. For more complex tasks that may not always be possible, but for running database queries in parallel it certainly is. – DarkWiiPlayer Apr 26 '18 at 06:10
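
On the thread pool question raised in these comments: if the list of entries can get long, one thread per entry may be more than you want. A minimal sketch of a bounded pool built only on Ruby's built-in Queue might look like this. `map_with_pool` and its `size:` parameter are hypothetical names, and the block is assumed to be safe to call from several threads at once.

```ruby
# Process items with at most `size` worker threads.
# Results come back in input order.
def map_with_pool(items, size: 4, &block)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  jobs.close # once closed and drained, pop returns nil instead of blocking

  results = Array.new(items.size)
  workers = Array.new([size, items.size].min) do
    Thread.new do
      while (job = jobs.pop)
        item, index = job
        results[index] = block.call(item) # each index is written exactly once
      end
    end
  end
  workers.each(&:join)
  results
end

squares = map_with_pool((1..10).to_a, size: 3) do |n|
  sleep(0.01) # simulate blocking I/O
  n * n
end
puts squares.inspect
```

Each worker keeps its own local state and only writes to result slots that no other worker touches, which is the "each thread responsible for its own state" rule from the comment above.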

Sure, just do the count in one database call:

blogs = Blog
  .select('blogs.id, COUNT(DISTINCT blog_comments.user_id) AS comment_count')
  .joins('LEFT JOIN blog_comments ON blog_comments.blog_id = blogs.id')
  .where(blogs: { id: params[:ids] })
  .group('blogs.id')

results = blogs.map do |blog|
  { id: blog.id, comment_count: blog.comment_count }
end

render json: results.to_json

You might need to change the statements depending on how your tables are named in the database, because I just guessed from the names of your associations.
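
To make explicit what the grouped query is meant to compute, here is a plain-Ruby sketch over in-memory rows. The sample rows and the `distinct_commenter_counts` name are made up for illustration; the real work would stay in the database.

```ruby
# Count distinct commenters per blog, mirroring
# COUNT(DISTINCT blog_comments.user_id) ... GROUP BY blogs.id.
def distinct_commenter_counts(blog_ids, comments)
  by_blog = comments.group_by { |c| c[:blog_id] }
  blog_ids.map do |id|
    # compact: COUNT(DISTINCT ...) ignores NULL user ids as well.
    users = by_blog.fetch(id, []).map { |c| c[:user_id] }.compact.uniq
    { id: id, comment_count: users.size } # 0 for a blog without comments (LEFT JOIN)
  end
end

# Made-up rows standing in for the blog_comments table.
comments = [
  { blog_id: 1, user_id: 10 },
  { blog_id: 1, user_id: 10 },
  { blog_id: 1, user_id: 11 },
  { blog_id: 2, user_id: 10 }
]

puts distinct_commenter_counts([1, 2, 3], comments).inspect
```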

spickermann
  • As I mentioned in the post, I know that I can merge this into a single SQL query. But I am interested in dealing with async. Let's say it is not SQL, but a call to some API endpoint... – chaintng Apr 25 '18 at 05:47