0

I need to find a pattern which will give me a substring which follows logic to substring from the right side, after the third _.

What I want to achieve is to retrieve everything up until 3rd _ .

I tried this SQL, but it did not work

select 
    SUBSTRING('22D_XYZ_xy_100_xyz_123', 1, position('_' in REVERSE('22D_XYZ_xy_100_xyz_123')) - 1) as result;

It returns the 22D whereas I expect 22D_XYZ_xy

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Przemek
  • 41
  • 8

2 Answers2

1

The phrase, "substring which follows logic to substring from the right side, after the third _," is difficult to understand. I infer that the intended behavior is to extract the subtring that includes all text before the third _ from the right and not all text before the third _ from the left. When searching from the left, there is also ambiguity as to expected behavior if there are less than three _ characters: should the entire string be returned or should the result be NULL? Unfortunately, the example input, 22D_XYZ_xy_100_xyz_123, and expected output, 22D_XYZ_xy, are consistent with all of these cases. The following query demonstrates all three interpretations:

SELECT
  string,
  regexp_substr(string, '^.*(?=(_[^_]*){3}$)') as from_right,
  regexp_substr(string, '^([^_]*_){0,2}[^_]*') as from_left,
  regexp_substr(string, '^([^_]*_){2}[^_]*(?=_)') as from_left_requires_3
FROM (
  VALUES ('22D_XYZ_xy_100_xyz_123'),
         ('1_2_3_4_5_6_7'),
         ('1_2_3_4_5_6'),
         ('1_2_3_4_5_'),
         ('_1_2_3_4_5'),
         ('1_2_3_4_5'),
         ('1_2_3_4'),
         ('1_2_3'),
         ('1_2_'),
         ('1_2'),
         ('1'),
         ('____'),
         ('___'),
         ('__'),
         ('_'),
         (''),
         (NULL)) t(string);

Executing the query gives the following:

string from_right from_left from_left_requires_3
'22D_XYZ_xy_100_xyz_123' '22D_XYZ_xy' '22D_XYZ_xy' '22D_XYZ_xy'
'1_2_3_4_5_6_7' '1_2_3_4' '1_2_3' '1_2_3'
'1_2_3_4_5_6' '1_2_3' '1_2_3' '1_2_3'
'1_2_3_4_5_' '1_2_3' '1_2_3' '1_2_3'
'_1_2_3_4_5' '_1_2' '_1_2' '_1_2'
'1_2_3_4_5' '1_2' '1_2_3' '1_2_3'
'1_2_3_4' '1' '1_2_3' '1_2_3'
'1_2_3' NULL '1_2_3' NULL
'1_2_' NULL '1_2_' NULL
'1_2' NULL '1_2' NULL
'1' NULL '1' NULL
'____' '_' '__' '__'
'___' '' '__' '__'
'__' NULL '__' NULL
'_' NULL '_' NULL
'' NULL '' NULL
NULL NULL NULL NULL

To search from the right, a positive lookahead zero-length assertion, (?=(_[^_]*){3}$), is used to check that the unmatched portion of the string begins with _ and includes exactly three _ characters. If the assertion is false, then NULL is returned.

To find the left substring with no more than two _ characters, the regular expression matches up to two substings of zero or more non-_ characters followed by a single _ along with any non-_ characters prior to the next _ or the end of the string.

Finding the left substring preceding a required third _ is similar to finding the left substring with no more than two _ characters, except that the occurrence specifier for the group is {2} instead of {0,2} and a positive lookahead zero-lengh assertion, (?=_), ensures that _ is the first unmatched character. As with the search from the right, if the assertion is false, then NULL is returned.

JohnH
  • 2,001
  • 1
  • 2
  • 13
0

You could try this query

SELECT 
    REGEXP_SUBSTR('22D_XYZ_xy_100_xyz_123', '^[^_]+_[^_]+_[^_]+') as result;

See demo here

Trung Duong
  • 3,475
  • 2
  • 8
  • 9