1

I am trying to extract the following from the text field using Regrex in Oracle.

  • For example

"This is example,

and this really a example :h,j,j,j,j,

l //Updated question , as this letter is on the next line

now this is a disease:yes"

I am expecting a result as h,j,j,j,j,l, but if I use

REGEXP_SUBSTR(text_field,'example :[^:]+,') AS Result

I am getting example:h,j,j,j,j But I am not getting the last letter 'l' like above and I am guessing that's because it's on the next line.Also, if I want the string "disease:yes" only, that will be so helpful as well. Thank you much!

Rahul Rao
  • 13
  • 3

1 Answers1

1

The result you are getting is because your pattern includes the word 'example' and ends with a comma, leaving out the ending 'l'. Try this form instead. Note the example is shown using a Common table Expression (CTE). The WITH statement creates the table called tbl which just sets up test data, kind of like a temp table. This is also a great way to set up data when asking a question. This form of the REGEXP_SUBSTR() function uses a captured group, which is the set of characters after the string 'example:' until the end of that line in the multi-line field. From this you should be able to get the other string you are after. Give it a go.

WITH tbl(text_field) AS (
  SELECT 'This is example,

and this really a example :h,j,j,j,j,l

now this is a disease:yes' FROM dual
)
SELECT REGEXP_SUBSTR(text_field,'example :(.*)', 1, 1, NULL, 1) AS Result
FROM tbl;

RESULT     
-----------
h,j,j,j,j,l

1 row selected.

Edit based on new info. Since that last letter could be on it's own line, you'll need to allow for the newline. Use the 'n' flag to REGEXP_REPLACE() which allows the newline to match in the usage of the dot (match any character) symbol in regex. We switch to REGEXP_REPLACE as we'll need to return multiple capture groups. Here the WITH sets up 2 rows, one with an embedded newline in the data and one without. The capture groups are (going left to right) 1-the data after "example :" and ending in a comma, 2-the optional newline and 3-the next single character. Then replace the entire data with captured groups 1 and 3 (leaving out the newline). NOTE this is very specific to the case of only 1 character on the following line.

WITH tbl(ID, text_field) AS ( 
  SELECT 1, 'This is example,
and this really a example :h,j,j,j,j,
l
now this is a disease:yes'  FROM dual UNION ALL
  SELECT 2, 'This is example,
and this really a example :h,j,j,j,j,l
now this is a disease:yes'  FROM dual
) 
SELECT ID,
       REGEXP_REPLACE(text_field, '.*example :(.*,)('||CHR(10)||')?(.).*', '\1\3', 1, 1, 'n') AS Result 
FROM tbl; 


        ID RESULT      
---------- ------------
         1 h,j,j,j,j,l 
         2 h,j,j,j,j,l 

2 rows selected.
Gary_W
  • 9,933
  • 1
  • 22
  • 40
  • Thank you so much Gary, the reason I beleive is due to the last character being on the next line , For example : WITH tbl(text_field) AS ( SELECT 'This is example, and this really a example :h,j,j,j,j, l //if the character is in next line, I missing this letter now this is a disease:yes' FROM dual ) SELECT REGEXP_SUBSTR(text_field,'example :(.*)', 1, 1, NULL, 1) AS Result FROM tbl; can we use '||CHR(10)||' ? – Rahul Rao Aug 23 '22 at 16:39
  • @RahulRao That criteria was not mentioned in your original post. Please edit your original post to add that. – Gary_W Aug 23 '22 at 18:48
  • Is that last letter ALWAYS on the next line? Could there be a variable number of letters on the next line? Could you edit the OP to show some real data? – Gary_W Aug 23 '22 at 21:01
  • Yes, the last letter is always on the next line , followed by some text on next line. For example : WITH tbl(ID, text_field) AS ( SELECT 1, 'This is example, and this really a example :h,j,j,j,j, l now this is a disease:yes' this is some random string upto 300 characters more which might have text not in the same format.' FROM dual UNION ALL SELECT 2, 'This is example, and this really a example :h,j,j,j,j,l now this is a disease:yes' FROM dual ) SELECT ID, REGEXP_REPLACE(text_field, '.*example :(.*,)('||CHR(10)||')?(.).*', '\1\3', 1, 1, 'n') AS Result FROM tbl; – Rahul Rao Aug 24 '22 at 13:49
  • @RahulRao Glad to help but the comment section has no formatting so your examples are better off in the original post. – Gary_W Aug 24 '22 at 14:05