7

I have a sequential scan occurring in my UsersController#create action.

SELECT ? AS one FROM "users" WHERE (LOWER("users"."username") = LOWER(?) AND "users"."id" != ?) LIMIT ?

Explain plan
1 Query planLimit (cost=0.00..3.35 rows=1 width=0)
2 Query plan-> Seq Scan on users (cost=0.00..3.35 rows=1 width=0)
3 Query planFilter: ?

I am fairly positive this arises from the following model validation:

validates :username, uniqueness: { case_sensitive: false }

Should I be creating an index against this express? And if so, what's the proper way to do this in Rails 4?

mu is too short
  • 426,620
  • 70
  • 833
  • 800
Josh Smith
  • 14,674
  • 18
  • 72
  • 118

1 Answers1

13

Yes, that validation would do that sort of query and that sort of query is going to do a table scan.

You actually have a couple problems here:

  • The validation is subject to race conditions because the logic isn't in the database where it belongs. The database should be responsible for all data integrity issues regardless of the usual Rails ideology.
  • Your validation triggers table scans and no one likes table scans.

You can solve both of those problems with one index. The first problem is solved by using a unique index inside the database. The second problem is solved by indexing the result of lower(username) rather than username.

AFAIK Rails still doesn't understand indexes on expressions so you'll have to do two things:

  1. Switch from schema.rb to structure.sql to keep Rails from forgetting about your index. In your config/application.rb you'll want to set:

    config.active_record.schema_format = :sql
    

    You'll also have to start using the db:structure:* rake tasks instead of the db:schema:* tasks. Once you've switched to structure.sql, you can delete db/schema.rb since it won't be updated or used anymore; you'll also want to start tracking db/structure.sql in revision control.

  2. Create the index by hand by writing a bit of SQL in a migration. This is easy:

    def up
      connection.execute(%q{
        create index idx_users_lower_username on users(lower(username))
      })
    end
    
    def down
      connection.execute(%q{
        drop index idx_users_lower_username
      })
    end
    

Of course this will leave you with PostgreSQL-specific things but that's nothing to worry about since ActiveRecord doesn't give you any useful portability anyway.

mu is too short
  • 426,620
  • 70
  • 833
  • 800
  • I think this is implied, but just to verify, does this mean that I should delete `schema.rb`? – Josh Smith Dec 15 '14 at 21:37
  • 1
    `schema.rb` won't be used for anything once the `schema_format` is changed so yeah, delete it and start tracking `db/structure.sql` in your revision control instead. – mu is too short Dec 15 '14 at 21:40
  • Okay, thanks. I'm also seeing in my CI server: `schema.rb doesn't exist yet. Run rake db:migrate to create it, then try again. If you do not intend to use a database, you should instead alter config/application.rb to limit the frameworks that will be loaded.` This is because it's still trying to use `rake db:schema:load`, so I have to ensure I modify that everywhere I need to. – Josh Smith Dec 15 '14 at 21:46
  • Yeah, I really don't understand why they have to have two different sets of tasks for this, the tasks really should be smart enough to check the configuration setting but... – mu is too short Dec 15 '14 at 21:54
  • Should `User.where("lower(username) = ?", username.downcase).explain` still not show a sequential scan after this migration? I'm trying to figure out how I can verify that it worked. – Josh Smith Dec 15 '14 at 22:29
  • 2
    How big is your `users` table? Just because you have an index doesn't mean that it will get used. The query optimizer will guess if an index is worthwhile or not and the guessing depends on what the data in the table looks like, how big it is, ... If you only have 100 entries, using an index might be more expensive than checking them all. Query optimization is a bit of a black art. – mu is too short Dec 15 '14 at 22:38
  • 1
    Got it. As you were answering I came across [why postgres won't always use an index](http://robots.thoughtbot.com/why-postgres-wont-always-use-an-index) which says precisely the same thing. In development, yes, not many users. So it's going to do a full table scan. But I imagine that this is still better to have in production for a larger dataset. – Josh Smith Dec 15 '14 at 22:41
  • I think that index is a reasonable thing to have. The usual tradeoff is between the cost of maintaining the index versus the savings of using it. The index probably will get used when there are enough users; presumably the username doesn't change much so the index maintenance should be nearly free over time. – mu is too short Dec 15 '14 at 23:01
  • 1
    @JoshSmith make sure you run a `VACUUM ANALYZE users;` after you create the index. [explanation](https://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT#Using_ANALYZE_to_optimize_PostgreSQL_queries) – mraaroncruz Sep 15 '15 at 07:50