0

What I want to do;

I have two tables with two address columns , both stored as text I want to create a view returning the matching rows.

What I've tried;

I've created and index on both columns and tables as below;

CREATE INDEX idx_table1_fulladdress ON table1 (LOWER(fulladdress_ppd));

Then run the following;

CREATE OR REPLACE VIEW view_adresscheck AS
SELECT 
--from table1
    table1.postcode,
    table1.fulladdress_ppd,
--from table2
    table2.epc_postcode,
    table2.fulladdress_epc 
FROM
    table1,
    table2
WHERE 
    table1.postcode = table2.epc_postcode
AND 
    table2.fulladdress_epc = table1.fulladdress_ppd ::text;

What hasn't worked The above returned fewer records than I know to be there. On inspection this is because the address format is not consistent between the two tables ie.

table1.fulladdress_ppd = Flat 2d The building the street
table2.fulladdress_epc = Flat 2/d The building the street, the town

The address isn't consistently formatted within the table either ie in table not all addresses include town so I can't use regex or trim to bulk clean.

I've then seen the fuzzystrmatch module in postgres and this sounds like it might resolve my problem.

Question Which of Soundex, Levenshtein, Metaphone is most appropriate. Most records are in English by some place names are Gaelic running on 9.6.

Community
  • 1
  • 1
mapping dom
  • 1,737
  • 4
  • 27
  • 50
  • Please try not to spread a similar issue over multiple questions. referencing your previous question. https://stackoverflow.com/questions/44587680/postgres-db-performance-for-split-vs-concatenate-when-matching – VynlJunkie Jun 17 '17 at 21:09

1 Answers1

0

talking from experience of matching address from different sources. What you could do is index each address. Regardless of formatting the above address would return the same number. You then match on these indexes.

eg in the UK you have what are called UDPRN numbers for each postal address in the country.

VynlJunkie
  • 1,953
  • 22
  • 26
  • That is very interesting, these are both UK address sets but don't include UPRN. One is land registry data the other energy performance data. I'm a research student in the built environment so not a developer or programmer by back ground. So what i should try is turning the string to a number using and index then match on that number? I was reading that an issue with this is that ASCII gives `a` and `A` different values, would that not cause me problems? – mapping dom Jun 18 '17 at 10:05
  • look into udprn and PAF. These will index your addresses for you. They will take into account difference in format and layout etc and return a matching key. Matching addresses is a pain and this is a pretty good way. Costs are under £100 for some options depending on who you buy from. – VynlJunkie Jun 18 '17 at 19:25
  • Thanks for the tip, I need the whole of England & Wales . I have OS AddressBase Plus set but problem i run into is that Land registry reocrd addresses in a unique way which differs from PAF and they don't attach the UPRN – mapping dom Jun 18 '17 at 21:11
  • @mappingdom I am stuck in the same problem than you, did you manage to find a solution? – gcj Jul 13 '21 at 06:26
  • @jr.gcj not a simple one, we wrote a bespoke bit of code which works very well – mapping dom Jul 24 '21 at 19:53