1

I have the string '1_A_B_C_D_E_1_2_3_4_5' and I am trying to extract the data 'A_B_C_D_E'. I am trying to remove the _1_2_3_4_5 & the 1_ portion from the string. Which is essentially the numeric portion in the string. any special characters after the last alphabet must also be removed. In this example the _ after the character E must also not be present.

and the Query I am trying is as below

SELECT 
REGEXP_SUBSTR('1_A_B_C_D_E_1_2_3_4_5','[^0-9]+',1,1) 
from dual

The Data I get from the above query is as below: -

_A_B_C_D_E_

I am trying to figure a way to remove the underscore towards the end. Any other way to approach this?

Lalit Kumar B
  • 47,486
  • 13
  • 97
  • 124
NottyHead
  • 181
  • 4
  • 18
  • Please state more specifically what you must keep and what you must remove from the string. Is the part you must keep always at the beginning, or can it also be in the middle? Is the pattern letter-underscore-letter-underscore (etc) and then same with digits, and you must just keep the letters part? Or what? Don't ask us to guess! –  Feb 28 '18 at 03:41
  • So, are the "letters" always first and the "digits" always last? –  Feb 28 '18 at 03:51
  • You must clarify your question with some sample strings with varying combinations of string available in you data and the expected output for each. It is hard to assume things while answering. – Kaushik Nayak Feb 28 '18 at 04:28

2 Answers2

2

Assuming the "letters" come first and then the "digits", you could do something like this:

select regexp_substr('A_B_C_D_E_1_2_3_4_5','.*[A-Z]') from dual;

This will pull all the characters from the beginning of the string, up to the last upper-case letter in the string (.* is greedy, it will extend as far as possible while still allowing for one more upper-case letter to complete the match).

  • `select regexp_substr('1_A_B_C_D_E_1_2_3_4_5','.*[A-Z]') from dual;` returns `1_A_B_C_D_E` which is not good. I voted up your answer as it works perfectly on the string used in the question. – cdaiga Feb 28 '18 at 04:09
  • 1
    @cdaiga - see my comments under the original question. I asked the OP for clarification, he amended the question but it is still not clear. In my answer above, read the first sentence - I stated my assumption explicitly. –  Feb 28 '18 at 04:16
1

I have the string '1_A_B_C_D_E_1_2_3_4_5' and I am trying to extract the data 'A_B_C_D_E'

Use REGEXP_REPLACE:

SQL> SELECT trim(BOTH '_' FROM
  2         (REGEXP_SUBSTR('1_A_B_C_D_E_1_2_3_4_5','[0-9]+', ''))) str
  3  FROM dual;

STR
---------
A_B_C_D_E

How it works:

  1. REGEXP_REPLACE will replace all numeric occurrences '[0-9]+' from the string. Alternatively, you could also use POSIX character class '[^[:digit:]]+'
  2. TRIM BOTH '_' will remove any leading and lagging _ from the string.

Also using REGEXP_SUBSTR:

SELECT trim(BOTH '_' FROM 
       (REGEXP_SUBSTR('1_A_B_C_D_E_1_2_3_4_5','[^0-9]+'))) str 
FROM dual;
STR
---------
A_B_C_D_E
Lalit Kumar B
  • 47,486
  • 13
  • 97
  • 124