
What's the best way to handle a large result set with Rails and Postgres? I didn't have a problem until today, but now I'm trying to return @network_hosts, a collection of roughly 124,000 records, and it has effectively DoS'd my development server.

My ActiveRecord query isn't the prettiest, but I'm pretty sure cleaning it up alone won't fix the performance problem.

@network_hosts = []
@host_count = 0
@company.locations.each do |l|
  if l.grace_enabled == nil || l.grace_enabled == false
    l.network_hosts.each do |h|
      @host_count += 1
      @network_hosts.push(h)
      @network_hosts.sort! { |x, y| x.ip_address <=> y.ip_address }
      @network_hosts = @network_hosts.first(5)
    end
  end
end

In the end, I need to be able to return @network_hosts to the controller for processing into the view.

Is this something that Sidekiq would be able to help with, or is it going to take just as long? If Sidekiq is the path to take, how do I handle not having the @network_hosts object on page load, since the job runs asynchronously?

Godzilla74

2 Answers


I believe you want to (1) get rid of all that looping (you've got a lot of queries going on) and (2) do your sorting with your AR query instead of in the array.

Perhaps something like:

NetworkHost.
  where(location: Location.where.not(grace_enabled: true).where(company: @company)).
  order(ip_address: :asc).
  tap do |network_hosts|
    @network_hosts = network_hosts.limit(5)
    @host_count = network_hosts.count
  end

Something like that ought to do it in a single DB query.

I had to make some assumptions about how your associations are set up and that you're looking for locations where grace_enabled isn't true (nil or false).

I haven't tested this, so it may well be buggy. But, I think the direction is correct.
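For reference, the associations I'm assuming look roughly like this (a sketch only, not necessarily how your models are actually defined; adjust to your schema):

# Assumed model setup for the query above
class Company < ApplicationRecord
  has_many :locations
end

class Location < ApplicationRecord
  belongs_to :company
  has_many :network_hosts
end

class NetworkHost < ApplicationRecord
  belongs_to :location
end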

jvillian
  • I should have mentioned that the issue appears as soon as I took out the limit from `@network_hosts = network_hosts.limit(5)` – Godzilla74 Jul 13 '17 at 00:27
  • According to [this answer](https://stackoverflow.com/questions/16555206/arrays-in-ruby-take-vs-limit-vs-first), limit is not an `array` method. I still think you want to get rid of all that looping (n+1?) and sorting of the `array`. – jvillian Jul 13 '17 at 00:32
  • You're correct. I was able to speed up the query and the server doesn't hang with this query `@network_hosts = NetworkHost.where(location_id: @company.locations.where.not(grace_enabled: true)).order(ip_address: :asc)` ... Now I just need to figure out how to cut down the query time more since it takes `380511ms` as of now. Thanks for putting me on the right track. My AR query skills need work! – Godzilla74 Jul 13 '17 at 00:37
  • Out of curiosity, in your console, how many queries are being done as a result of this code block? Also, you should be able to do `location:` instead of `location_id:` as long as `NetworkHost belongs_to :location` . – jvillian Jul 13 '17 at 00:40
  • It was tons... way too many to count by hand. Is there a way that I can get an easy count? – Godzilla74 Jul 13 '17 at 00:41
  • Is it still tons? That would be surprising. – jvillian Jul 13 '17 at 00:42
  • It's probably because I need to clean up some other helper methods that also call the method to get the hosts, so it's running it over a few times. – Godzilla74 Jul 13 '17 at 00:43
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/149061/discussion-between-jvillian-and-godzilla74). – jvillian Jul 13 '17 at 00:44

Something to remember: Rails won't execute any SQL queries until the result of the query is actually needed. (I'll be using User instead of NetworkHost so I can show you the console output as I go.)

@users = User.where(first_name: 'Random');nil # No query run
=> nil
@users # query is now run because the results are needed (they are being output to the IRB window)
#  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."first_name" = $1 LIMIT $2  [["first_name", "Random"], ["LIMIT", 11]]
# => #<ActiveRecord::Relation [...]>
@users = User.where(first_name: 'Random') # query will be run because the results are needed for the output into the IRB window   
#  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."first_name" = $1 LIMIT $2  [["first_name", "Random"], ["LIMIT", 11]]
# => #<ActiveRecord::Relation [...]>

Why is this important? It allows you to store the query you want to run in the instance variable and not execute it until you get to a view, where you can use some of the nice methods of ActiveRecord::Batches. In particular, if you have a view (or export function, etc.) where you iterate over @network_hosts, you can use find_each.

# Controller
@users = User.where(first_name: 'Random') # No query run

# view
@users.find_each(batch_size: 1) do |user|
  puts "User's ID is #{user.id}"         
end
#  User Load (0.5ms)  SELECT  "users".* FROM "users" WHERE "users"."first_name" = $1 ORDER BY "users"."id" ASC LIMIT $2  [["first_name", "Random"], ["LIMIT", 1]]
#  User's ID is 1
#  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."first_name" = $1 AND ("users"."id" > 1) ORDER BY "users"."id" ASC LIMIT $2  [["first_name", "Random"], ["LIMIT", 1]]
#  User's ID is 2
#  User Load (0.3ms)  SELECT  "users".* FROM "users" WHERE "users"."first_name" = $1 AND ("users"."id" > 2) ORDER BY "users"."id" ASC LIMIT $2  [["first_name", "Random"], ["LIMIT", 1]]
# => nil

Your query is not executed until the view, where it will now load only 1,000 records (configurable) into memory at a time. Once it reaches the end of those 1,000 records, it automatically runs another query to fetch the next 1,000. So your memory usage is much more sane, at the cost of extra database queries (which are usually pretty quick).
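Applied to the hosts in your question, it might look something like this (the batch size is just an illustrative number, and note that find_each batches by primary key, so a custom ORDER BY is ignored):

# Controller - builds the relation, no query runs yet
@network_hosts = NetworkHost.where(
  location_id: @company.locations.where.not(grace_enabled: true)
)

# View (or export) - pulls 5,000 hosts into memory per query (illustrative size);
# find_each walks the table by primary key, so any custom ordering is ignored
@network_hosts.find_each(batch_size: 5_000) do |host|
  # render/act on the host
end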

Simple Lime
  • Does doing this keep the page from loading until all records have been batched in? – Godzilla74 Jul 13 '17 at 01:46
  • As far as I'm aware, Rails won't start sending any HTML until the view is finished rendering, but I admit I've never looked into how/when it starts sending stuff to the client. In my experience, though, when you have queries returning a ton of records, it's usually not the queries themselves that take a while; it's loading and storing that many records in memory, and batching can usually give you better performance – Simple Lime Jul 13 '17 at 01:57
  • and with the configurable batch size you can reduce the number of queries that need to be run if you know your server can handle, say, 10,000 records at a time without slowing down – Simple Lime Jul 13 '17 at 01:59
  • instead of `@users` you would have `@network_hosts` and then do like `@network_hosts.find_each { |host| # render/act on the host }` – Simple Lime Jul 13 '17 at 03:02