2

I am trying to join data from disparate sources. The only common field to join is address. In table 1 , address has extra data (representing neighborhood) between street and state. Is there a way to join these tables using most similar address? I have 85,000 addresses, so a manual search using LIKE and wildcards will not work.

Table 1:
"239 Dudley St Dudley Square Roxbury MA 02119"
"539 Dudley St Dudley Square Roxbury MA 02119"

Table 2:
"239 Dudley St Roxbury MA 02119"
"539 Dudley St Roxbury MA 02119"

James Steele
  • 645
  • 1
  • 6
  • 22

1 Answers1

1

I have two suggestions:

1) "All words in the table 2 address are present in the table 1 address":

select *
from t1 join
  t2 on (string_to_array(t2.address,' ') <@ string_to_array(t1.address,' '));

2) "For each table 1 address find the most similar address from the table 2":

select distinct on(t1.address) *
from t1 cross join t2
order by t1.address, similarity(t1.address, t2.address) desc;
Abelisto
  • 14,826
  • 2
  • 33
  • 41