How to optimize fuzzy string matching pairs of phrases (intersection names) in PostgreSQL

Question

We have a table of intersection names like 'Main St / Broadway Ave' and we are trying to match potentially messy user input (of the form (street1, street2)) to these names. There's no guarantee the input would be in the same order as the street names.

We split the intersection names to a long format table in order to optimize doing two fuzzy distance comparisons, e.g.

+--------+----------------+
| int_id | street         |
+--------+----------------+
| 1      | 'Broadway Ave' |
+--------+----------------+
| 1      | 'Main St'      |
+--------+----------------+

And put a gist trigram index on the street column.

So then the query finds all int_ids that are close to one or the other street input and then does a GROUP BY to find the one with the closest combined distances (I'll insert the query later). This works pretty well but we still need it to work faster. Is there something in PostgreSQL's full text search library that could do the trick?

Query example used in a function, with related explain https://explain.depesz.com/s/J9lj

SELECT intersections.int_id, 
SUM(LEAST(
    intersections.street <-> street1,
    intersections.street <-> street2))
, intersections.int_id
FROM intersections

WHERE (street1 <% intersections.street
    OR 
    street2 <% intersections.street 
)
GROUP BY intersections.int_id
HAVING COUNT(DISTINCT TRIM(intersections.street)) > 1
ORDER BY AVG(
    LEAST(
    intersections.street <-> street1,
    intersections.street <-> street2))

Please show the queries, and the `EXPLAIN (ANALYZE, BUFFERS)` output for them. Turn track_io_timing on first if it is not already on. Edit your question to include this info as formatted text, not as images. If the plans are too large, you can put them on https://explain.depesz.com/ and give the links. — jjanes, Aug 08 '22 at 14:36
Updated. Actually the performance on a sample of 1000 records isn't as bad as I had thought (7s). — raphael, Aug 08 '22 at 15:55

score 0 · Answer 1 · answered Aug 08 '22 at 13:53

0

I know you are looking for a PostgreSQL solution, but this would be much easier (and faster) if you clone your data into Elastic Search and do the searches there. Elastic Search also gives you way more flexibility then a relational database could.

answered Aug 08 '22 at 13:53

Jojan

33
5

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Aug 14 '22 at 19:05

How to optimize fuzzy string matching pairs of phrases (intersection names) in PostgreSQL

1 Answers1