I have a table with approximately 7 million records. The table has a first_name and last_name column which I want to search on using the levenshtein() distance function.
select levenshtein('JOHN', first_name) as fn_distance,
levenshtein('DOE', last_name) as ln_distance,
id,
first_name as "firstName",
last_name as "lastName"
from person
where first_name is not null
and last_name is not null
and levenshtein('JOHN', first_name) <= 2
and levenshtein('DOE', last_name) <= 2
order by 1, 2
limit 50;
The above search is slow (4 - 5 secs), what can I do to improve performance? Should a create indexes on the two columns, or something else?
After I added indexes below:
create index first_name_idx on person using gin (first_name gin_trgm_ops);
create index last_name_idx on person using gin(last_name gin_trgm_ops);
The query now takes ~11 secs. :(
New query:
select similarity('JOHN', first_name) as fnsimilarity,
similarity('DOW', last_name) as lnsimilarity,
first_name as "firstName",
last_name as "lastName",
npi
from person
where first_name is not null
and last_name is not null
and similarity('JOHN', first_name) >= 0.2
and similarity('DOW', last_name) >= 0.2
order by 1 desc, 2 desc, npi
limit 50;