There are several different, interrelated things going on here. Exactly how you carry out the changes depends on how you manage changes to your database structure. The most common way is to use Rails migrations, but your linked question suggests you're not doing that. So I'll speak mostly in SQL, and you can adapt that to your method.
Use a sargable WHERE clause
Your WHERE clause isn't sargable. That means it's written in a way that prevents the dbms from using an index. To create an index PostgreSQL can use here . . .
create index on "users" (lower("username") varchar_pattern_ops);
Now queries on lowercased usernames can use that index.
explain analyze
select *
from users
where lower(username) = lower('9LCDgRHk7kIXehk6LESDqHBJCt9wmA');
It might appear as if PostgreSQL must lowercase every username in the table, but its query planner is smart enough to see that the expression lower(username)
is itself indexed. PostgreSQL uses an index scan.
"Index Scan using users_lower_idx on users (cost=0.43..8.45 rows=1 width=35) (actual time=0.034..0.035 rows=1 loops=1)"
" Index Cond: (lower((username)::text) = 'b0sa9malg7yt1shssajrynqhiddm5d'::text)"
"Total runtime: 0.058 ms"
This table has a million rows of random-ish data; the query returns very, very quickly. It's just about equally fast with the additional condition on "id", but the LIMIT clause slows it down a lot. "Slows it down a lot" doesn't mean it's slow; it still returns in less than 0.1 ms.
Also, here the varchar_pattern_ops
lets queries that use the LIKE operator use the index.
explain analyze
select *
from users
where lower(username) like 'b%'
"Bitmap Heap Scan on users (cost=1075.12..9875.78 rows=30303 width=35) (actual time=10.217..91.030 rows=31785 loops=1)"
" Filter: (lower((username)::text) ~~ 'b%'::text)"
" -> Bitmap Index Scan on users_lower_idx (cost=0.00..1067.54 rows=31111 width=0) (actual time=8.648..8.648 rows=31785 loops=1)"
" Index Cond: ((lower((username)::text) ~>=~ 'b'::text) AND (lower((username)::text) ~<~ 'c'::text))"
"Total runtime: 93.541 ms"
Only 94 ms to select and return 30k rows from a million.
Queries on very small tables might use a sequential scan even though there's a usable index. I wouldn't worry about that if I were you.
Enforce uniqueness in the database
If you're expecting any bursts of traffic, you should enforce uniqueness in the database. I do this all the time, regardless of any expectations (guesses) about traffic.
The RailsGuides Active Record Validations includes this slightly misleading or confusing paragraph about the "uniqueness" helper.
This helper validates that the attribute's value is unique right
before the object gets saved. It does not create a uniqueness
constraint in the database, so it may happen that two different
database connections create two records with the same value for a
column that you intend to be unique. To avoid that, you must create a
unique index on both columns in your database. See the MySQL manual
for more details about multiple column indexes.
It clearly says that, in fact, it doesn't guarantee uniqueness. The misleading part is about creating a unique index on "both columns". If you want "username" to be unique, you need to declare a unique constraint on the column "username".
alter table "users"
add constraint constraint_name unique (username);
Case-sensitivity
In SQL databases, case-sensitivity is determined by collation. Collation is part of the SQL standards.
In PostgreSQL, you can set collation at the database level, at the column level, at the index level, and at the query level. Values come from the locales the operating system exposes at the time you create a new database cluster using initdb
.
On Linux systems, you probably have no case-insensitive collations. That's one reason we have to jump through rather more hoops than people who target SQL Server and Oracle.