How can I use ~ to fuzzy match two fields of a table?

Question

I'm trying to perform a join over two tables that contain info about the same companies, but sometimes the companies are stored with slightly different names (e.g. table 1: Company X -> Table 2: Company X and Friends). My idea was to full join each row of table 1 with each row of table 2 and then using ~ to filter (e.g. where name1 ~ name2 or name2 ~ name1), but that's not possible.

Do someone has a solution to this matching problem? Thanks!

score 4 · Answer 1 · answered Sep 03 '19 at 14:54

You can use the "<->" operator or the '%' operator provided by pg_trgm to do fuzzy string matching.

You have to make decisions like what similarity cutoff you are willing to use for '%' (return no matches if the best match is below that) and if you want to return only the top match (lateral joins are good for this) or possible more than one.

It would likely be best to use this to clean your data, rather than incorporating it into a production app directly.

You might also need to strip out nuisance words. "Company X and friends" will be more similar to "Company Y and friends" than it will be to "Company X limited".

SELECT t1.*, t2.*, 1- (t1.name <-> t2.name) as similarity
FROM table1 t1
INNER JOIN table2 t2
    ON t1.name % t2.name
order by t1.name <-> t2.name;

score 1 · Accepted Answer · answered Sep 03 '19 at 13:12

You could join on the condition that either name field be a substring of the other's name field:

SELECT t1.*, t2.*
FROM table1 t1
INNER JOIN table2 t2
    ON t1.name LIKE '%' || t2.name || '%' OR
       t2.name LIKE '%' || t1.name || '%';

This approach does not even require regex. We could use regex here, if we wanted to ensure that one table's name only appears as a substring of the other's name and is also a word. But, maybe you don't even need to do this.

score 1 · Answer 3 · answered Sep 03 '19 at 14:48

1

You could

CREATE EXTENSION pg_trgm;

and use the similarity operator it provides in the join condition:

t1 JOIN t2 ON t1.name % t2.name

answered Sep 03 '19 at 14:48

Laurenz Albe

209,280
17
206
263

How can I use ~ to fuzzy match two fields of a table?

3 Answers3