3

I am using Ruby on Rails 4 and I would like to understand why during the eager loading process are run further SQL queries even if data is eager loaded. That is, I have the following code that eager loads :comments the right way:

@articles = @current_user.articles.includes(:comments)

When the above code runs and I "track" what is happening in the logger with the following code:

@articles.each do |article|
  logger.debug article.comments
end

Then the logger says:

Article Load (0.4ms) SELECT ...
Comment Load (0.5ms) SELECT ... WHERE `articles`.`id` IN (...)

#<ActiveRecord::Associations::CollectionProxy [#<Comment id: 1, title: "Hello A">, #<Comment id: 2, title: "Hello B">]>

#<ActiveRecord::Associations::CollectionProxy [#<Comment id: 3, title: "Hello A">, #<Comment id: 4, title: "Hello C">]>

#<ActiveRecord::Associations::CollectionProxy [#<Comment id: 5, title: "Hello D">, #<Comment id: 6, title: "Hello E">]>

...

The above output indicates that the eager loading is working as expected: no N+1 problem since ActiveRecord::Associations::CollectionProxy objects are loaded when running article.comments.

However when I try to run code as like the following (note the find_by clause):

@articles.each do |article|
  logger.debug article.comments.find_by(:title => "Hello A")
end

Then the logger says:

Article Load (0.4ms) SELECT ...
Comment Load (0.5ms) SELECT ... WHERE `articles`.`id` IN (...)

Comment Load (0.4ms) SELECT ... AND `comments`.`title` = 'HELLO A'
#<Comment id: 1, title: "Hello A">

Comment Load (0.4ms) SELECT ... AND `comments`.`title` = 'HELLO A'
#<Comment id: 3, title: "Hello A">

Comment Load (0.4ms) SELECT ... AND `comments`.`title` = 'HELLO A'
nil

...

The above output indicates that the eager loading is not working as expected: a SQL query runs for each comment.

So, my questions/doubts are:

  1. Why in the last case the find_by clause makes the eager loading to do not work (note: it happens even in cases when I "filter" article.comments by using a clause other than find_by)?
  2. Should Ruby on Rails handle data already loaded in ActiveRecord::Associations::CollectionProxy objects as array so that it avoids to hit the database?!
  3. How can I solve the problem in order to avoid the N+1 problem in the last case?
user502052
  • 14,803
  • 30
  • 109
  • 188

2 Answers2

2

I suspect that find_by is hard-wired to make a database call.

The objects listed in the first example are of type CollectionProxy, meaning you can still make SQL queries on them. Since find_by is part of ActiveRecord, calling it on a Proxy class should go to the DB.

I suspect that if you alter your code to use an Enumerable method such as find_all on the comments collection then you should be OK, but this isn't very efficient (find_all runs in linear time)

Alternately, roll everything up into a single join query by doing something like this:

Article.joins(:comments).where(comments: {title: "My Title")

Or if you need all Articles whether they have matching comments or not, you can simply add a condition to your original includes:

Article.includes(:comments).where(comments: {title: "My Title")
David Underwood
  • 4,908
  • 1
  • 19
  • 25
  • The SQL query resulting from `@current_user.articles.includes(:comments)` is "harder" than `Article.joins(:comments).where(comments: {title: "My Title"})`. However, by using your tip `@current_user.articles.joins(:comments).where(comments: {title: "My Title"})` will produce this problem: *when a comment is not found for an article then that article is excluded by the returning array at all*. This, if I remember correctly, is a known problem in the Rails community that happens when using `joins` and `where` clauses as in my case. – user502052 Mar 04 '14 at 18:22
  • You can specify conditions on includes, but it's not recommended. It'll give you the LEFT OUTER JOIN you need (i.e. load articles regardless of the presence of matching comments: http://guides.rubyonrails.org/active_record_querying.html#specifying-conditions-on-eager-loaded-associations – David Underwood Mar 04 '14 at 18:34
  • Note that the 'missing' Articles are by design: `.join` uses an INNER JOIN while `.includes` uses a LEFT OUTER JOIN. It's more of a quirk of the implementation than a bug – David Underwood Mar 04 '14 at 18:35
  • I tried `@current_user.articles.includes(:comments).where(comments: {title: "My Title"})` but it still causes the mentioned problem: articles that no matches comments are not retrieved. – user502052 Mar 04 '14 at 18:45
1

Just to confirm: David Underwood is correct that find_by will make a database call. In fact, find_by is basically just a wrapper for where and take, which will indeed make a DB call.

An alternative way to accomplish what you're looking for is to simply treat the collection proxy as an array, by using the find method, like so:

@articles.each do |article|
    logger.debug article.comments.find {|comment| comment.title == "Hello A"}
end

Update:

I have to admit, this one was a bit of a doozy.

The way to accomplish this in the manner you're looking for is to add another has_many relationship that specifically contains the filtering condition you want, like so:

class Article < ActiveRecord::Base

    has_many :hello_A_comments, -> { where(title: "Hello A") }, class_name: "Comment"

    # rest of class
end

And then, you eager load using this new association, like so:

@articles = @current_user.articles.includes(:hello_A_comments)

This part is very important: You now access the association not through the original :comments association method, but through the new hello_A_comments method instead, like so:

@articles.first.hello_a_comments

Unfortunately, as you can see, this method is not very dynamic, and sadly I do not know how to allow for variable conditions in the association in the case of eager loading. This answer might be a good resource to look at, but in the case of eager loading, honestly I don't believe its possible. If that is an issue, you might be stuck with the array method I mentioned earlier.

Community
  • 1
  • 1
Paul Richter
  • 10,908
  • 10
  • 52
  • 85
  • By treating the collection as an array it works as expected, but there is a way to "filter" `title` directly when retrieving `articles` (with the `include` clause) and avoiding the problem I mentioned in the http://stackoverflow.com/questions/22179432/trouble-when-filtering-eager-loaded-data/22179951?noredirect=1#comment33665335_22179692 comment? – user502052 Mar 04 '14 at 18:32
  • 1
    `.find` will only return the first matching element. `.find_all` will return all matching elements. – David Underwood Mar 04 '14 at 18:37
  • @DavidUnderwood Yup, that's correct. I figured the OP wanted to find only one since they were using `find_by`, which is the same as `limit 1`, but yes if you need to find multiple, use the [`find_all`](http://ruby-doc.org/core-2.1.0/Enumerable.html#method-i-find_all) method instead. – Paul Richter Mar 04 '14 at 18:44
  • 1
    @user502052 Ah ok, I see what your issue is. I think I can figure out something, give me a moment... – Paul Richter Mar 04 '14 at 18:46
  • 1
    @Paul Richter - You are right. I would like to find only one. – user502052 Mar 04 '14 at 18:46
  • 1
    Even me had a similar issue but I could not solve it. – Backo Mar 04 '14 at 18:58
  • @user502052 Answer updated. Its not 100% perfect, but its about the best I think of presently. – Paul Richter Mar 04 '14 at 19:21
  • 1
    I will give it a try. Thanks, anyway. I appreciate your effort, really. – user502052 Mar 04 '14 at 19:30