-1

Does anyone know how to split my address columns. I would like to divide it into three new columns, seperated by it's commas. For example 1808 FOX CHASE DR, GOODLETTSVILLE, TN is divided into

  • 1808 FOX CHASE DR
  • GOODLETTSVILLE
  • TN

So far I've tried

SELECT SUBSTRING(OwnerAddress, 1, INSTR(OwnerAddress, ',')-1) as col1,
SUBSTRING(OwnerAddress, INSTR(OwnerAddress, ',')+1) as col2  
FROM housing_data;

Although I cannot create the third column with the state abbreviation.

Thanks for any input

  • Address data is notoriously difficult to algorithmically parse if you haven’t done so upfront. There are way too many edge cases to do this in SQL - consider using a trained ML algorithm like `libpostal` to perform this for you - it’ll perform at a much higher accuracy than any other method you may be considering. Was the data controlled on the way in? Can you guarantee 100% this data is always structured in the same manner? Please read [Falsehoods programmers believe about addresses](https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/) to scratch the surface… – esqew Nov 22 '21 at 23:47
  • For starters, you should look to rectify your data model to capture these distinct data points up front to make your life a hell of a lot easier. – esqew Nov 22 '21 at 23:49
  • @esqew thank you for you suggestions – Alastair Thomson Nov 23 '21 at 11:28

1 Answers1

2

It's generally much cleaner to do any sort of string processing in the target library. That said, if you must, you can make liberal use of SUBSTRING and INSTR to find each comma and split the strings:

SELECT 
    SUBSTRING(OwnerAddress, 1, INSTR(OwnerAddress, ',') - 1) as col1,
    SUBSTRING(SUBSTRING(OwnerAddress, INSTR(OwnerAddress, ',') + 1), 1, INSTR(SUBSTRING(OwnerAddress, INSTR(OwnerAddress, ',') + 1), ',') - 1) as col2,
    SUBSTRING(SUBSTRING(OwnerAddress, INSTR(OwnerAddress, ',') + 1), INSTR(SUBSTRING(OwnerAddress, INSTR(OwnerAddress, ',') + 1), ',') + 1) as col3
FROM housing_data;
Anon Coward
  • 9,784
  • 3
  • 26
  • 37
  • This is a rather shortsighted suggestion for parsing unstructured address data - what if the address contains a second street address line? For example, something like `1808 FOX CHASE DR, APARTMENT 101, GOODLETTSVILLE, TN`? What if the address contains an addressee, i.e. `BUSINESS ABC, 1808 FOX CHASE DR, GOODLETTSVILLE, TN`? Your proposal fails to account for many of the classic edge cases documented on [Falsehoods programmers believe about addresses](https://www.mjt.me.uk/posts/falsehoods-programmers-believe-about-addresses/) – esqew Nov 22 '21 at 23:47