Uniqueness case-sensitive false causes slow query

Question

I have the following validation:

validates :username, uniqueness: { case_sensitive: false }

This causes the following query to run painfully slowly:

5,510 ms
SELECT ? AS one FROM "users" WHERE (LOWER("users"."username") = LOWER(?) AND "users"."id" != ?) LIMIT ?
Explain plan
Limit (cost=0.03..4.03 rows=1 width=0)
  -> Index Scan using idx_users_lower_username on users (cost=0.03..4.03 rows=1 width=0)
       Index Cond: ?
       Filter: ?

I created the index in my structure.sql with CREATE INDEX idx_users_lower_username ON users USING btree (lower((username)::text)); for more on this, see my question "How to create index on LOWER("users"."username") in Rails (using postgres)".

The query is using the index I created and still takes over 5 seconds. What's wrong here?

Can you clarify how you created the `idx_users_lower_username` index? — fivedigit, Dec 18 '14 at 08:35
`LOWER("users"."username")` in the query means it is downcasing every single record in the DB. An index helps you look things up, but it doesn't help you transform existing records as far as I know. — Henrik N, Dec 18 '14 at 09:00
@blelump Hm, yes, you're right: http://www.postgresql.org/docs/9.1/static/indexes-expressional.html " the index expressions are not recomputed during an indexed search, since they are already stored in the index." Thanks for that. — Henrik N, Dec 18 '14 at 09:41
@fivedigit I used `CREATE INDEX idx_users_lower_username ON users USING btree (lower((username)::text));` — Josh Smith, Dec 18 '14 at 16:37
@kyledecot maybe 80? But it's not doing a table scan (which it was previously), and appears to be using the index. — Josh Smith, Dec 18 '14 at 16:38

score 1 · Answer 1 · answered Dec 18 '14 at 22:06

There are several different, interrelated things going on here. Exactly how you carry out the changes depends on how you manage changes to your database structure. The most common way is to use Rails migrations, but your linked question suggests you're not doing that. So I'll speak mostly in SQL, and you can adapt that to your method.

Use a sargable WHERE clause

Your WHERE clause isn't sargable against a plain index on "username". That means it's written in a way that prevents the dbms from using such an index, because the column is wrapped in lower(). To create an index PostgreSQL can use here:

create index on "users" (lower("username") varchar_pattern_ops);

Now queries on lowercased usernames can use that index.

explain analyze
select * 
from users
where lower(username) = lower('9LCDgRHk7kIXehk6LESDqHBJCt9wmA');

It might appear as if PostgreSQL must lowercase every username in the table, but its query planner is smart enough to see that the expression lower(username) is itself indexed. PostgreSQL uses an index scan.

"Index Scan using users_lower_idx on users  (cost=0.43..8.45 rows=1 width=35) (actual time=0.034..0.035 rows=1 loops=1)"
"  Index Cond: (lower((username)::text) = 'b0sa9malg7yt1shssajrynqhiddm5d'::text)"
"Total runtime: 0.058 ms"

This table has a million rows of random-ish data; the query returns very, very quickly. It's just about equally fast with the additional condition on "id", but the LIMIT clause slows it down a lot. "Slows it down a lot" doesn't mean it's slow; it still returns in less than 0.1 ms.
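
To check this against the shape of the query the validation actually generates, something like the following could be run in psql. It is only a sketch: the id value 42 is a made-up placeholder, and the username literal is the one from the example above.

```sql
-- Mirrors the shape of the query Rails issues for the uniqueness validation.
-- Assumption: 42 stands in for the id of the record being validated.
explain analyze
select 1 as one
from users
where lower(username) = lower('9LCDgRHk7kIXehk6LESDqHBJCt9wmA')
  and id != 42
limit 1;
```

If the plan shows an index scan on the lower(username) index and a runtime of a few milliseconds or less, the time is probably being lost somewhere other than in this statement.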

Also, the varchar_pattern_ops operator class lets queries that use the LIKE operator with a left-anchored pattern (such as 'b%') use the index.

explain analyze
select *
from users
where lower(username) like 'b%';

"Bitmap Heap Scan on users  (cost=1075.12..9875.78 rows=30303 width=35) (actual time=10.217..91.030 rows=31785 loops=1)"
"  Filter: (lower((username)::text) ~~ 'b%'::text)"
"  ->  Bitmap Index Scan on users_lower_idx  (cost=0.00..1067.54 rows=31111 width=0) (actual time=8.648..8.648 rows=31785 loops=1)"
"        Index Cond: ((lower((username)::text) ~>=~ 'b'::text) AND (lower((username)::text) ~<~ 'c'::text))"
"Total runtime: 93.541 ms"

Only 94 ms to select and return 30k rows from a million.

Queries on very small tables might use a sequential scan even though there's a usable index. I wouldn't worry about that if I were you.
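
If you want to confirm that an index is usable on a small table anyway, one option is to discourage sequential scans for the current session and look at the plan again. A minimal sketch; the username literal is a placeholder, and the setting should only be changed while testing.

```sql
-- enable_seqscan is a planner setting. Turning it off only makes
-- sequential scans look very expensive; it does not forbid them.
set enable_seqscan = off;

explain
select *
from users
where lower(username) = lower('some_username');  -- placeholder value

-- Restore the default for the rest of the session.
reset enable_seqscan;
```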

Enforce uniqueness in the database

If you're expecting any bursts of traffic, you should enforce uniqueness in the database. I do this all the time, regardless of any expectations (guesses) about traffic.

The Rails Guide "Active Record Validations" includes this slightly misleading or confusing paragraph about the "uniqueness" helper.

This helper validates that the attribute's value is unique right before the object gets saved. It does not create a uniqueness constraint in the database, so it may happen that two different database connections create two records with the same value for a column that you intend to be unique. To avoid that, you must create a unique index on both columns in your database. See the MySQL manual for more details about multiple column indexes.

It clearly says that, in fact, it doesn't guarantee uniqueness. The misleading part is about creating a unique index on "both columns". If you want "username" to be unique, you need to declare a unique constraint on the column "username".

alter table "users"
add constraint constraint_name unique (username);
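
Note that the constraint above is case-sensitive: it allows both 'Alice' and 'alice'. To match a validation with case_sensitive: false, a unique index on the lowercased expression is one option. This is a sketch rather than something taken from the original answer; the index name is made up, and it assumes no case-insensitive duplicates already exist (the first query checks for them).

```sql
-- Find usernames that already collide when case is ignored.
-- Creating the unique index fails while any such rows exist.
select lower(username) as lowered, count(*)
from users
group by lower(username)
having count(*) > 1;

-- Hypothetical index name. PostgreSQL does not accept expressions in a
-- UNIQUE constraint, so a unique index is used instead.
create unique index users_lower_username_key
  on users (lower(username));
```

A unique index like this can also serve the lookup lower(username) = lower(?), so the validation query and the database-level guarantee can share one index. If two connections race, the losing insert fails with a unique violation, which Rails raises as ActiveRecord::RecordNotUnique.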

Case-sensitivity

In SQL databases, case-sensitivity is determined by collation. Collation is part of the SQL standards.

In PostgreSQL, you can set collation at the database level, at the column level, at the index level, and at the query level. Values come from the locales the operating system exposes at the time you create a new database cluster using initdb.

On Linux systems, you probably have no case-insensitive collations. That's one reason we have to jump through rather more hoops than people who target SQL Server and Oracle.
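
To see which collations your database actually offers, and to apply one at the query or index level, something along these lines should work on PostgreSQL 9.1 or later. The index name is made up, and "C" is used only because it is always available.

```sql
-- List the collations known to this database.
select collname
from pg_collation
order by collname;

-- Query level: sort by byte order instead of the database default.
select username
from users
order by username collate "C";

-- Index level (hypothetical index name).
create index users_username_c_idx
  on users (username collate "C");
```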

score 0 · Answer 2 · answered Dec 18 '14 at 20:16

Try running the query in psql using explain analyze to make sure PostgreSQL itself is behaving, because the index and the query both appear to be right.

If it is fast in psql, then the problem is in your Rails code.

This query against a table of 3k records gave this result (on my local dev machine):

app=# explain analyze SELECT id AS one FROM "users" WHERE (LOWER(email) = LOWER('marcus@marcus.marcus') AND "users"."id" != 2000);
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on users  (cost=4.43..58.06 rows=19 width=4) (actual time=0.101..0.101 rows=0 loops=1)
   Recheck Cond: (lower((email)::text) = 'marcus@marcus.marcus'::text)
   Filter: (id <> 2000)
   ->  Bitmap Index Scan on users_lower_idx  (cost=0.00..4.43 rows=19 width=0) (actual time=0.097..0.097 rows=0 loops=1)
         Index Cond: (lower((email)::text) = 'marcus@marcus.marcus'::text)
 Total runtime: 0.144 ms
(6 rows)
