We have a table of intersection names like 'Main St / Broadway Ave'
and we are trying to match potentially messy user input (of the form (street1, street2)
) to these names. There's no guarantee the input would be in the same order as the street names.
We split the intersection names to a long format table in order to optimize doing two fuzzy distance comparisons, e.g.
+--------+----------------+
| int_id | street |
+--------+----------------+
| 1 | 'Broadway Ave' |
+--------+----------------+
| 1 | 'Main St' |
+--------+----------------+
And put a gist trigram index on the street column.
So then the query finds all int_ids that are close to one or the other street input and then does a GROUP BY
to find the one with the closest combined distances (I'll insert the query later). This works pretty well but we still need it to work faster. Is there something in PostgreSQL's full text search library that could do the trick?
Query example used in a function, with related explain https://explain.depesz.com/s/J9lj
SELECT intersections.int_id,
SUM(LEAST(
intersections.street <-> street1,
intersections.street <-> street2))
, intersections.int_id
FROM intersections
WHERE (street1 <% intersections.street
OR
street2 <% intersections.street
)
GROUP BY intersections.int_id
HAVING COUNT(DISTINCT TRIM(intersections.street)) > 1
ORDER BY AVG(
LEAST(
intersections.street <-> street1,
intersections.street <-> street2))