0

Can default_scope when used to not order records by ID significantly slow down a Rails application?

For example, I have a Rails (currently 3.1) app using PostgreSQL where nearly every Model has a default_scope ordering records by their name:

default_scope order('users.name')

Right now because the default_scope's order records by name rather by ID, I am worried I might be incurring a significant performance penalty when normal queries are run. For example with:

User.find(5563401)

or

@User.where('created_at = ?', 2.weeks.ago)

or

User.some_scope_sorted_best_by_id.all

In the above examples, what performance penalty might I incur by having a default_scope by name on my Model? Should I be concerned about this default_scope affecting application performance?

Kelsey Hannan
  • 2,857
  • 2
  • 30
  • 46
  • 1
    It will certainly slow things down if you do not have indexes defined on the `name` columns. – infused Apr 09 '15 at 02:56
  • Never use default scopes. Period. They are a terrible feature. You should be aware of how your data is ordered (or unordered) each time you query it. – user229044 Apr 09 '15 at 03:33

1 Answers1

4

Your question is missing the point. The default scope itself is just a few microseconds of Ruby execution to cause an order by clause to be added to every SQL statement sent to PostgreSQL.

So your question is really asking about the performance difference between unordered queries and ordered ones.

Postgresql documentation is pretty explicit. Ordered queries on unindexed fields are much slower than unordered because (no surprise), PostgreSQL must sort the results before returning them, first creating temporary table or index to contain the result. This could easily be a factor of 4 in query time, possibly much more.

If you introduce an index just to achieve quick ordering, you are still paying to maintain the index on every insert and update. And unless it's the primary index, sorted access still involves random seeks, which may actually be slower than creating a temporary table. This also is discussed in the Postgres docs.

In a nutshell, NEVER add an order clause to an SQL query that doesn't need it (unless you enjoy waiting for your database).

NB: I doubt a simple find() will have order by attached because it must return exactly one result. You can verify this very quickly by starting rails console, issuing a find, and watching the generated SQL scroll by. However, the where and all definitely will be ordered and consequently definitely be slower than needed.

Gene
  • 46,253
  • 4
  • 58
  • 96
  • Thanks for your answer. Doing what you suggested (Rails 3.1) for `find()` I can confirm that the the order query is still sent to Postgres: User.find(5) --> ```SELECT "users".* FROM "users" WHERE "users"."id" = $1 ORDER BY users.name LIMIT 1 [["id", 5]]``` Do you think having the order query still here for a single record would also affect Postgres performance? – Kelsey Hannan Apr 09 '15 at 04:40
  • To answer my follow up: even a single `find()` with `default_scope` ordered by name appears to be 2/3rds slower than unscoped: `10_000.times{User.find(5)}` took 5.25 seconds while `10_000.times{User.unscoped.find(5)}` took 3.84 seconds. – Kelsey Hannan Apr 09 '15 at 05:07
  • 1
    Ordering on a field without an index can be slower, but there is no hard rule that says it's always (much) slower. There is not even a guarantee that the query planner does use the index when you have an index and order on this field: It all depends on the data and the query you execute. – Frank Heikens Apr 09 '15 at 05:46
  • 1
    @Kelseydh You have the right idea. Benchmark it as realistically as possible. I think you'd really see the difference when querying with a cold cache for a result and that is too big to fit cache. In that case you'll need a disk-to-disk copy and external sort. After that the cache is cold again wrt the sorted result, so you'll need another read. How big the result would have to be to cause all this depends entirely on your PG server config. Disk-to-disk copy speed can very tremendously depending on whether the tables are on different spindles and available buffer space. Yada yada... – Gene Apr 10 '15 at 02:23