how to use regexp_substr

Question

i have call number :

TX353 G35 1992
Ref QP141 B151 R4 1956
RM216 M285 K5 1996
T385 C22 1960
Ths LB500 M200 A5 1998

i want to result:

TX353 G35 1992          =>TX
Ref QP141 B151 R4 1956  =>QP
RM216 M285 K5 1996      =>RM
T385 C22 1960           =>T
Ths LB500 M200 A5 1998  =>LB

i use:

SELECT REGEXP_SUBSTR(callnumber,'[A-Z]+')

result not correct TX R RM T T

I'm not sure what the required algorithm is. You want always only the first uppercase letters from the code, and that is usually 2 characters but sometimes one? — Non Plus Ultra, Mar 15 '17 at 09:43
Please specify (in words rather than through examples) what your requirement is for matching the sub-string. There are 3 answers which have all interpreted your requirements slightly differently and it is unclear what it is you are trying to achieve. — MT0, Mar 15 '17 at 10:16
**True or false?** (This will answer the questions asked in the other two comments.) In each input string, find the first substring of the form: space (or beginning of the string), followed by one or more upper-case letters, followed by one or more digits, followed by space (or the end of the string). From this substring, return just the upper-case letters. Return `NULL` if no such substring exists in the input string. — , Mar 15 '17 at 12:53

MT0 · Answer 1 · 2017-03-15T10:03:11.703

2

You appear to want the first upper-case letters that are followed by some digits:

Oracle Setup:

CREATE TABLE your_table ( your_column ) AS
  SELECT 'TX353 G35 1992'         FROM DUAL UNION ALL
  SELECT 'Ref QP141 B151 R4 1956' FROM DUAL UNION ALL
  SELECT 'RM216 M285 K5 1996'     FROM DUAL UNION ALL
  SELECT 'T385 C22 1960'          FROM DUAL UNION ALL
  SELECT 'Ths LB500 M200 A5 1998' FROM DUAL UNION ALL
  SELECT 'Ref A123 B456 C7 2000'  FROM DUAL;

Query:

SELECT REGEXP_SUBSTR(
         your_column,
         '([A-Z]+)\d',
         1,     -- Start at the first character
         1,     -- Get the first match
         NULL,  -- Case sensitive
         1      -- Return the first capture group
       ) As match
FROM   your_table

Output:

MATCH
-----
TX
QP
RM
T
LB
A

edited Mar 15 '17 at 10:03

answered Mar 15 '17 at 09:56

MT0

143,790
11
59
117

Cool one! if it is just the digits, it works.. If a space of some other special character? – Maheswaran Ravisankar Mar 15 '17 at 10:00
@MaheswaranRavisankar Then the OP can add those to the character that needs to be matched after the initial upper-case letters. However, the OP's data does not indicate this case and it would be up to them to identify what the actual requirements are. – MT0 Mar 15 '17 at 10:06

Maheswaran Ravisankar · Accepted Answer · 2017-03-15T10:07:42.823

0

Try this. To look for two consecutive Caps. I used NVL() to attempt for taking the single occurrence of Capital letter.

Credits to MT0 for that

[A-Z]{2}

Example:

with my_data(str) as
(
  select 'TX353 G35 1992' from dual
  union all
  select 'Ref QP141 B151 R4 1956' from dual
  union all
  select 'RM216 M285 K5 1996' from dual
  union all
  select 'T385 C22 1960' from dual
  union all
  select 'Ths LB500 M200 A5 1998' from dual

)
  select str,NVL(regexp_substr(str,'[A-Z]{2,}'),regexp_substr(str,'([A-Z]+)\d',1,1,NULL,1)) from my_data;

Output :

TX353 G35 1992          TX
Ref QP141 B151 R4 1956  QP
RM216 M285 K5 1996      RM
T385 C22 1960           T
Ths LB500 M200 A5 1998  LB

EDIT:

If you need to extract the the full sequence of CAPS.

You need this, [A-Z]{2,}

edited Mar 15 '17 at 10:07

answered Mar 15 '17 at 09:45

Maheswaran Ravisankar

17,652
6
47
69

If the input is `Ref A123 B456 C7 2000` then the output will be `R` not `A` which appears to be incorrect. – MT0 Mar 15 '17 at 10:01
@MT0, you're right.!!! Also How about `Ths L500 MB200 A5 1998`.. It is supposed to be MB I guess, not `L` , So it has to be conditional.?? Anyways, without the requirement, it is just a guessing game. – Maheswaran Ravisankar Mar 15 '17 at 10:08
I would disagree (since the OP expects `T` for `T385 C22 1960`) and say that the output should be `L` for `Ths L500 MB200 A5 1998` (since it is the first string with alphabetic characters followed by numeric characters) but this just highlights the issue with the OP's question that the requirements are not specified well enough. – MT0 Mar 15 '17 at 10:12

score 0 · Answer 3 · answered Mar 15 '17 at 09:46

0

remove "Ref" and "Ths" from the call number and your code works

answered Mar 15 '17 at 09:46

M Zippka

93
13

Boneist · Answer 4 · 2017-03-15T10:08:09.890

Here's one way of doing it:

WITH sample_data AS (select 'TX353 G35 1992' str from dual union all
                     select 'Ref QP141 B151 R4 1956' str from dual union all
                     select 'RM216 M285 K5 1996' str from dual union all
                     select 'T385 C22 1960' str from dual union all
                     select 'Ths LB500 M200 A5 1998' str from dual union all
                     select 'X12345' str from dual union all
                     select 'Y F123' str from dual)
SELECT str,
       regexp_substr(str, '([A-Z]{1,2})[[:digit:]]*( |$)', 1, 1, NULL, 1) sub_str
FROM   sample_data;

STR                    SUB_STR
---------------------- ----------------------
TX353 G35 1992         TX
Ref QP141 B151 R4 1956 QP
RM216 M285 K5 1996     RM
T385 C22 1960          T
Ths LB500 M200 A5 1998 LB
X12345                 X
Y F123                 Y

This looks for the pattern of one or two upper case letters followed by 0 or more digits followed by a space or the end of the line. (If you want to restrict the number of digits to 1 or more, change the * to a +.)

We put brackets around the "one or two upper case letters" to label it as a subexpression, which we can then request to be output in the regexp_substr (that's the final parameter, which in our case is 1 because our desired subexpression is the first one encountered).

If you have other special characters that could follow your desired pattern, then you can simply expand the OR section (currently ( |$)), e.g. if you wanted to include a question mark, the OR section would become ( |?|$)

how to use regexp_substr

4 Answers4