14

I've been trying to figure out how to remove multiple non-alphanumeric or non-numeric characters, or return only the numeric characters from a string. I've tried:

SELECT
regexp_extract('X789', '[0-9]', 0)
FROM
table_name

But it returns '7', not '789'.

I've also tried to remove non-numeric characters using NOT MATCH syntax ^((?!regexp).)*$:

SELECT
REGEXP_REPLACE('X789', '^((?![0-9]).)*$', '')
FROM
jav_test_ii

Can regexp_extract return multiple matches? What I'm really trying to do is clean my data to only contain numbers, or alphanumeric characters. This seems to help remove bad characters, but its not a range of characters like [0-9] is. regexp_replace(string, '�','')

EDIT: The query below was able to return '7789', which is exactly what I was looking for.

SELECT
regexp_replace("7X789", "[^0-9]+", "")
FROM
table_name
user1152532
  • 697
  • 3
  • 7
  • 15

1 Answers1

27

See also this hive regexp_extract weirdness

I think regex_extract will only return the group number stated in the 3rd parameter.

regex_extract seems to only work on a line and then quit.

I don't know about the replace counterpart.

It might work on non-alphanum data though if you fed it something like this

REGEXP_REPLACE(error_code, '[^a-zA-Z0-9]+', '')


Also, for extract, see the link above and you can change it to

regexp_extract('X789', '[0-9]+', 0) for multiple numbers.

or

regexp_extract('XYZ789', '[a-zA-Z]+', 0) for multiple alpha's.

Community
  • 1
  • 1
  • Thanks for the tip, which works for my example. It doesn't seem to work if you have '7X789', as it only returns '7'. Do you know if your solution can be extended to return 7789 in that case? – user1152532 Jan 28 '14 at 23:34
  • @user1152532 - Try the replace method. For example, `regexp_replace("7(-X78T9)RMM", "[^0-9]+", "")` should return `7789`. The regex contains a quantified (+) negative ([^]) class, means not a digit between 0-9. It will globally replace all non-digit characters. This is according to the docs. I can't test it. –  Jan 29 '14 at 00:30
  • That worked perfectly... Thank you @sln. It looks like I could use a refresher on REGEX - can you recommend any good sites/resources for getting up to speed? The google results for REGEXP Basics all seem to point to sites that look like their from the 90s. – user1152532 Jan 29 '14 at 01:25
  • Great, it worked! I'd try wikipedia first. Search for 'regular expressions', read some basics about nuance and flavors. Good Luck! –  Jan 29 '14 at 01:51