6

I've been trying to chain Arel queries using scopes, instead of just using some long-winded logic I wrote in the controller. But the scopes are slower than just getting all the records and then sifting through them with some logic. I'm wondering, then, why scopes are better.

Here's what I'm doing:

  • a question has many answers
  • an answer belongs to one question
  • a question has a "question_type" column that I use to sort it

First, the scopes way...

in question.rb:

scope :answered, joins(:answers).order('answers.created_at desc')
scope :dogs, where(:question_type => "dogs")
scope :cats, where(:question_type => "cats")
scope :mermaids, where(:question_type => "mermaids")

in questions_controller.rb:

@dogs_recently_answered = Question.answered.dogs.uniq[0..9]
@cats_recently_answered = Question.answered.cats.uniq[0..9]
@mermaids_recently_answered = Question.answered.mermaids.uniq[0..9]

Then in the view, I cycle through those instance variables (which are now arrays containing at most 10 elements) and display the results.

Here are the times it takes to load the page (five different times):

Completed 200 OK in 535ms (Views: 189.6ms | ActiveRecord: 46.2ms)

Completed 200 OK in 573ms (Views: 186.0ms | ActiveRecord: 46.3ms)

Completed 200 OK in 577ms (Views: 189.0ms | ActiveRecord: 45.6ms)

Completed 200 OK in 532ms (Views: 182.9ms | ActiveRecord: 46.1ms)

Completed 200 OK in 577ms (Views: 186.7ms | ActiveRecord: 46.9ms)

Now, the messy controller way...

@answers = Answer.order("created_at desc")
@all_answered = []
@answers.each {|answer| @all_answered << answer.question}
@recently_answered = @all_answered.uniq
@dogs_all_answered = []
@cats_all_answered = []
@mermaids_all_answered = []
@recently_answered.each do |q|
  if q.question_type == "dogs"
    @dogs_all_answered << q
    @dogs_recently_answered = @dogs_all_answered[0..9]
  elsif q.question_type == "cats"
    @cats_all_answered << q
    @cats_recently_answered = @cats_all_answered[0..9]
  elsif q.question_type == "mermaids"
    @mermaids_all_answered << q
    @mermaids_recently_answered = @mermaids_all_answered[0..9]
  end
end

And here are the times it takes to load the page now (five different times):

Completed 200 OK in 475ms (Views: 196.5ms | ActiveRecord: 34.5ms)

Completed 200 OK in 480ms (Views: 200.4ms | ActiveRecord: 36.4ms)

Completed 200 OK in 434ms (Views: 198.2ms | ActiveRecord: 35.8ms)

Completed 200 OK in 475ms (Views: 194.2ms | ActiveRecord: 36.4ms)

Completed 200 OK in 475ms (Views: 195.0ms | ActiveRecord: 35.4ms)

So...

Aside from readability, what's to be won by honing the query with a scope? Does it eventually become quicker when there are more records?

Steve Cotner
  • 505
  • 4
  • 14

2 Answers2

5

First, I'm not sure I understand how a question can be other than unique, so I'd look at trying to remove that. I don't know the logic of your data, so that might not be applicable, but it's an extra step that you might be able to avoid.

Here's how I would approach it:

scope :answered, joins(:answers).order('answers.created_at desc')
scope :recent, take(10)
scope :dogs, where(:question_type => "dogs")
scope :cats, where(:question_type => "cats")
scope :mermaids, where(:question_type => "mermaids")

@dogs_recently_answered = Question.answered.dogs.recent
@cats_recently_answered = Question.answered.dogs.recent
@mermaids_recently_answered = Question.answered.dogs.recent

This shifts the TOP portion of the query to the database where it belongs rather than fetching all of the rows and then discarding all but 10. Depending on your uniquing criteria, you might also use a scope like

scope :unique, select('DISTINCT column_name')

and then you can use Question.cats.unique.recent and get it all in one fast query that leverages the relational algebra that database systems are designed for.

jxpx777
  • 3,632
  • 4
  • 27
  • 43
  • If you have 3 answers on a question, it lists the question three times. That's why it needs uniq on it. I've avoided doing select('DISTINCT ...') because it's been too tricky to select distinct questions but then order the questions by the latest *answer* on each. That's proved much easier by just calling "uniq" and turning it into an array of unique questions. – Steve Cotner Nov 30 '10 at 15:13
  • 1
    In that case, I'd say you should be joining the answers to the questions, sorting by answer descending, and grouping by question to get the most recent answer for the question. My general rule is to shift as much of the burden for this kind of thing to the database even if it means refactoring my own perspective on the problem because databases are smarter than I am and I get future optimizations for free. – jxpx777 Nov 30 '10 at 15:18
  • On second look, maybe that won't matter now that I'm doing the ordering in the first scope. But I'm not familiar with take(...), and neither is Rails: undefined method `take' for # Did you mean that to be a separately defined method? Or is it database specific? – Steve Cotner Nov 30 '10 at 15:19
  • Not sure why it's reporting that on you. See docs here: https://github.com/rails/arel Looks like that method was added [August 13, 2010](https://github.com/rails/arel/commit/b1e7bd7d026c888d4c406adc595577b7ebc84ed5) so maybe you have an outdated version? – jxpx777 Nov 30 '10 at 15:21
  • My SQL and arel skills are still very green. I've been trying to figure out "group" all day, with no luck. Could you write out the joining-sorting-grouping arel query you're thinking of? – Steve Cotner Nov 30 '10 at 15:22
  • also, why is take() preferable to limit()? Is there a difference? (And thanks so far for your helping me understand this stuff) – Steve Cotner Nov 30 '10 at 15:25
  • Even without the recent scope, PostgreSQL still won't let me chain those scopes, because it can't order by answers.created_at if I'm selecting a distinct column in the questions table (like the id or title): PGError: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list ... I jumped the gun in thinking those scopes would be all right. I guess that seems like it's going off topic from the original question, but it would establish why a scope could actually be better. – Steve Cotner Nov 30 '10 at 15:30
  • I'm on Rails 3.0.1, which was released in October if I remember right. I would have thought everything would be in there... – Steve Cotner Nov 30 '10 at 15:38
  • Yah, I think this is getting off the original topic, for which I think my answer that you're fetching too much is still the right one. Working out the SQL/Arel to get what you actually want is a little tougher, but that's why you get paid the big bucks right? :P It might take some tweaking and playing around with some raw SQL but once you do, I think the Arel translation will be pretty obvious. – jxpx777 Nov 30 '10 at 15:40
  • Ha, I wouldn't inflict my code on a friend, let alone a paying client. Or maybe that's backwards. At any rate, thanks for thinking about the question. – Steve Cotner Nov 30 '10 at 15:46
  • Question.where(:question_type => 'dogs').join('answers').on(questions[:id].eq(answers[:question_id]).select('DISTINCT answers.question_id').order('answers.created_at DESC').project('questions.title').take(10) should get you close I think. Untested code, but it should be a good jumping off point. – jxpx777 Nov 30 '10 at 15:49
  • Thanks, I'm messing around with it. On Rails 3.0.3, eq doesn't seem to be an Arel method: undefined method `eq' for # – Steve Cotner Nov 30 '10 at 17:33
  • Well, I'm too lost to make any sense out of this, but thanks again for the help. – Steve Cotner Nov 30 '10 at 17:38
  • I realize this is very late, but have you seen http://guides.rubyonrails.org/active_record_querying.html ? When I am trying to get some ActiveRecord querying done, that's my first stop if I don't remember exactly how it goes. If I don't find it there, then I go to the more comprehensive Rails docs: http://api.rubyonrails.org/classes/ActiveRecord/Base.html – jxpx777 Mar 24 '12 at 16:04
1

I think the reason the scopes are slower in this case is because they result in 3 separate database queries, whereas the other approach uses the knowledge that all three results can be satisfied by the single query you use.

Assuming that is the case, it is not surprising the scopes are doing 3 separate queries since the system doesn't know when you call the first one that you're going to call the others right afterwards. Maybe there is an optimisation strategy that would be sensible for this scenario, but I don't know that ActiveRecord implements it.

Anyway, this is one disadvantage of the scope in this particular case. I like scopes because they are clean/clear, flexible and encapsulated a named abstraction for a query. AFAICT, in many scenarios they are not appreciably slower than the equivalent direct query.

Mike Tunnicliffe
  • 10,674
  • 3
  • 31
  • 46
  • Are you sure there are 3 db hits? Often when looking at Question.answered.dogs.to_sql in the console you'll find that ActiveRecord create one SQL query before firing it off. – David Sulc Nov 30 '10 at 14:23
  • Correct, however, the scenario in the question invokes 3 separate scope chains (Question.answered.dogs.uniq[0..9], Question.answered.cats.uniq[0..9] and Question.answered.mermaids.uniq[0..9]) which are one SQL query each. – Mike Tunnicliffe Nov 30 '10 at 14:26
  • He means three database hits to instantiate the three ivars. – jxpx777 Nov 30 '10 at 14:27
  • Yes, in fact, when considering one scope chain the "system" does know all the scopes invoked and uses this information to form the SQL query. I was referring in my answer that the system doesn't know when you call Question.answered.dogs.uniq[0..9], that you are going to call Question.answered.cats.uniq[0..9] on the very next line, and so doesn't bundle the two together into a more efficient query. The "messy" solution, however, is written with the knowledge of how to satisfy everything in one go. – Mike Tunnicliffe Nov 30 '10 at 14:33
  • Ha, I didn't parse that you were also discarding a whole bunch of results with the array slice too, which as jxpx777 correctly identifies, also makes each of the queries return more data than required (data that is easily excluded at the SQL level, thus speeding up each of the queries). – Mike Tunnicliffe Nov 30 '10 at 16:58