-2

I have table in sql like below:

PHONE_NUMBER
--------
ppa:+77845678021@abc.gbc443.gbc356.7kffooopppp.deu  
tel:+77695002395    
....

As you can see I have in COL1 phone numbers with a lot of "rubbish" and I need to convert values in that column to have something like below (only phone number):

PHONE_NUMBER
--------
77845678021 
77695002395 
....

How can I do that in SQL ?

General Grievance
  • 4,555
  • 31
  • 31
  • 45
dingaro
  • 2,156
  • 9
  • 29

1 Answers1

0

You can use regexp_substr. Something like this:

select regexp_substr(PHONE_NUMBER, '[0-9]{11,14}') PHONE_NUMBER
  from my_table

Demo for MySQL can be seen here.

This will extract first block of 11 to 14 digits. Since you haven't provided any specification on phone format, I'm guessing here.

EDIT: For Hive regexp_extract should be used instead:

select regexp_extract(PHONE_NUMBER, '[0-9]{11,14}') PHONE_NUMBER
  from my_table
markalex
  • 8,623
  • 2
  • 7
  • 32
  • markalex - I have ERROR like that using your code: KeyError: '11,14' – dingaro Apr 07 '23 at 10:00
  • It's strange, because as I can see Hive doesn't even have `regexp_substr`, it has `regexp_extract` instead. @dingaro – markalex Apr 07 '23 at 10:04
  • i changed to regexp_extract and I still have ERROR: KeyError: '11,14' – dingaro Apr 07 '23 at 10:12
  • @dingaro i am reading online that the regex looks very different between different versions of HiveQL, so I think that is impossible to answer at the moment. Unless people make a code for each possible version. – Johan Abdullah Holm Apr 07 '23 at 10:21
  • Johan, but do you have some example idea what can work ? – dingaro Apr 07 '23 at 10:22
  • @dingaro could make some examples, but since i dont know the version it could potentially take hours instead of minutes and a lot of back and forth. I think there is better use of my time. look at the documentation on regex it changes depending on the version of hive – Johan Abdullah Holm Apr 07 '23 at 10:25