0

We have a varchar2 column called House_Number, which has all different kind of data entries as shown below. And all I need is the numbers from left to the first occurrence of a non-numeric like space, special characters and ascii characters.

Below are some sample house_numbers:

217 3RDFL, 2173RDFL, 500D, 527 # 2ND, 527# 2ND, 5422NDFL, 5422, 30# D2,
1250 2ND, 12502ND, 217 3RDFL, 2173RDFL, 5241R, 3 2R, 32R, 5092R, 24 # 2R,
24# 2R, 129 B-16, 129# B17, 129B-16, 16 # 2B, 4229B, 539# APT 3, 563 # A5,
2162A, 934-A, 109-A, 511-A, 339-REA, 339REAR

And for the above house_numbers, the expected corresponding outputs is as follows:

217, 217, 500, 527, 527, 542, 542, 30,1250,1250, 217, 217, 5241, 
3, 32, 5092, 24, 24, 129, 129, 129, 16, 4229, 539, 563, 2162, 
934, 109, 511, 339, 339

I tried using regexp_substr, but i could not get a logic which will work for all the different data inputs as shown above.

Would really appreciate if someone can help.

Ferdinand Gaspar
  • 2,043
  • 1
  • 8
  • 17
Philip
  • 1
  • 1
  • 1
    Post (add to your question using [Edit]) your best attempt so far. – PM 77-1 Sep 19 '17 at 20:34
  • I'd also suggest trying `patindex`. It's a pretty neat function that also uses RegEx. Edit: This is assuming SQL Server. – phroureo Sep 19 '17 at 20:35
  • Is it always the first 3 numbers? – Ted at ORCL.Pro Sep 19 '17 at 20:38
  • 1
    Partial duplicate of: https://stackoverflow.com/questions/11333078/sql-take-just-the-numeric-values-from-a-varchar (note the first answer with patindex, not the terrifying answer with all those replace()s) – RToyo Sep 19 '17 at 20:40
  • 1
    sorry.....i did not mention in the question.... – Philip Sep 19 '17 at 20:42
  • 1
    this is oracle 12c and it is not the first 3 numbers.... – Philip Sep 19 '17 at 20:42
  • 1
    we want the numbers starting from left until the first occurrence of a non-numeric which is white space, null, special characters and ascii characters – Philip Sep 19 '17 at 20:43
  • 1
    Address parsing is a hard problem. The USPS maintains a maintains a master list of all mailing addresses in the US and guards that data better than equifax. You might want to consider using a CASS service such as https://smartystreets.com/ instead of implementing an incomplete process. – WombatPM Sep 19 '17 at 20:43
  • 1
    It is not "until the first occurrence of a non-numeric" - in your sample data, you have 2173RDFL and you extract 217, not 2173. So it is not as simple as "until the first occurrence of a non-numeric." Since you have this exception you must state ALL the exceptions to that rule, or you will get a solution just as flawed as your requirement. –  Sep 19 '17 at 21:19
  • Regular expresions me thinks @Philip – Ted at ORCL.Pro Sep 19 '17 at 21:37
  • 1
    To explain further: If the input is `3739THFL` How do you know if you must extract 373, and `9THFL` means ninth floor? How do you know the number isn't 37, leaving `39THFL` to mean the 39th floor? SQL, software, computers in general can't answer such questions; they must be answered by you IN THE REQUIREMENT. –  Sep 19 '17 at 22:43
  • Thanks everyone for the response.....yes the challenge is when 3739THFL, how do we know if the correct number is 37 or 373 assuming this is either 9TH of 39TH floor. – Philip Sep 21 '17 at 13:41

2 Answers2

2

Using regexp_substr():

regexp_substr(house_number, '^[0-9]*')

That matches any number ([0-9]) occuring at the beginning of the word (^) repeated any number of times (*)

JNevill
  • 46,980
  • 4
  • 38
  • 63
  • i dont have the perfect solution but below got me the closest for now.... – Philip Sep 21 '17 at 13:44
  • select house_number, REGEXP_SUBSTR(replace(replace(replace(replace(replace(replace(house_number,'1/2',''),'-',''), '+', ''), '.', ''), '#', ''), '?', ''),'^\d+') AS "House_Number" from order by 1; – Philip Sep 21 '17 at 13:45
0

Here is one solution that I just built you can adjust/tweak: SQL:

 DECLARE @HOUSE_NUMBER_DATA TABLE
    (
        HOUSE_NUMBER VARCHAR(25)
    )

INSERT INTO @HOUSE_NUMBER_DATA
SELECT '217 3RDFL '  UNION ALL
SELECT '2173RDFL  '  UNION ALL
SELECT '500D      '  UNION ALL
SELECT '527 # 2ND '  UNION ALL
SELECT '527# 2ND  '  UNION ALL
SELECT '5422NDFL  '  UNION ALL
SELECT '30# D2    '  UNION ALL
SELECT '1250 2ND  '  UNION ALL
SELECT '12502ND   '  UNION ALL
SELECT '217 3RDFL '  UNION ALL
SELECT '2173RDFL  '  UNION ALL
SELECT '5241R     '  UNION ALL
SELECT '3 2R      '  UNION ALL
SELECT '32R       '  UNION ALL
SELECT '5092R     '  UNION ALL
SELECT '24 # 2R   '  UNION ALL
SELECT '24# 2R    '  UNION ALL
SELECT '129 B-16  '  UNION ALL
SELECT '129# B17  '  UNION ALL
SELECT '129B-16   '  UNION ALL
SELECT '16 # 2B   '  UNION ALL
SELECT '4229B     '  UNION ALL
SELECT '539# APT 3'  UNION ALL
SELECT '563 # A5  '  UNION ALL
SELECT '2162A     '  UNION ALL
SELECT '934-A     '  UNION ALL
SELECT '109-A     '  UNION ALL
SELECT '511-A     '  UNION ALL
SELECT '339-REA   '  UNION ALL
SELECT '339REAR   '  

SELECT * FROM @HOUSE_NUMBER_DATA

SELECT SUBSTRING(HOUSE_NUMBER,0, PATINDEX('%[^0-9]%', HOUSE_NUMBER)) ONLY_FIRST_NUMBERS_NO_SPACES
    ,CASE --It will tough to account for 22nd floor vs 2nd floor with no spaces
        WHEN SUBSTRING(HOUSE_NUMBER, PATINDEX('%[^0-9]%',HOUSE_NUMBER),2) = 'RD' THEN SUBSTRING(HOUSE_NUMBER,0, PATINDEX('%[^0-9]%', HOUSE_NUMBER) -1)
        WHEN SUBSTRING(HOUSE_NUMBER, PATINDEX('%[^0-9]%',HOUSE_NUMBER),2) = 'R' THEN SUBSTRING(HOUSE_NUMBER,0, PATINDEX('%[^0-9]%', HOUSE_NUMBER) -1)
        WHEN SUBSTRING(HOUSE_NUMBER, PATINDEX('%[^0-9]%',HOUSE_NUMBER),2) = 'ND' THEN SUBSTRING(HOUSE_NUMBER,0, PATINDEX('%[^0-9]%', HOUSE_NUMBER) -1)
        ELSE SUBSTRING(HOUSE_NUMBER,0, PATINDEX('%[^0-9]%', HOUSE_NUMBER))
    END TRYING_TO_IGNORE_RD
    ,HOUSE_NUMBER
FROM @HOUSE_NUMBER_DATA 

Not sure how to convert that to ORACLE though. (took out previous ORACLE section as the code doesn't seem to be correct, don't work with ORACLE)

Michael Edmison
  • 663
  • 6
  • 14
  • Thanks for the sample code and the test table you created etc. But this is for Oracle 12C and i believe we need to replace SUBSTRING with substr and replace PATINDEX with regexp_like. – Philip Sep 21 '17 at 13:47
  • I didn't see the Oracle tag on there, only the SQL. Sorry about that. Added a converted section under it from converter, so may or may not work. – Michael Edmison Sep 21 '17 at 16:30
  • @Philip can you confirm the Oracle section is correct snytex, it should get you the same result as the SQL part which I confirmed worked as you needed. Let me know when you can... – Michael Edmison Sep 22 '17 at 12:24
  • ....sorry i did not mention that this is for Oracle ...but only later added as a comment.... Thanks for the providing the Oracle syntax....below are some of the sample data...527 # 2ND 527# 2ND 5422NDFL 1250 2ND 1250 2ND 12502ND 12502ND 16682NDFL 4226 2NDFL 42262NDFL ...when i run your oracle script for the above sample data...i am getting NULL value as result....instead of capturing the numbers..... – Philip Sep 22 '17 at 17:28