1

I want to search a string using a delimiter which is a combination of 2 or more characters.

Please find the query i tried below:

select REGEXP_SUBSTR('123$@45$6$@789','[^$@]+',1,2) from dual

Required Output:

45$6

Output:

45

I understand it is easily possible using user defined functions [with INSTR+SUBSTR] however I am looking for an answer & explanation with REGEXP_SUBSTR.

Thanks in advance.

pOrinG
  • 896
  • 3
  • 13
  • 27

1 Answers1

4

Not like that... Rather:

select REGEXP_SUBSTR('123$@45$6$@789','(.*?)(\$@|$)', 1, 2, null, 1) from dual;

Notice \$ in the code; $ means end of string, so if you mean a literal dollar sign symbol, you must escape it. This solution uses the "capturing group" concept - see the Oracle documentation for REGEXP_SUBSTR() if you are not familiar. The first capturing group is (.*?) - whatever comes before the delimiter; and it is referenced in the sixth (last) argument to REGEXP_SUBSTR.

Notice also that after the first capturing group I check for either the two-character delimiter or the end of the string... which is marked by $. Two options in parentheses and separated by | (another "special character") is the regular expression syntax for "either... or...".

  • Thank you so much. Advise me if my understanding is correct. `(.*?)`: dot is for any character, * is for zero or more occurrence of any character which sounds sufficient however if we leave it to there then it becomes greedy search and fetches all till the end of string. So the `?` is to make the search non-greedy and to stop at the first point where it finds the recurrence of delimiter. Thanks for the tip on escaping special characters & the `or` end of string to get the last delimited value. – pOrinG Oct 03 '17 at 14:32
  • Your understanding of "greedy" vs. "non-greedy" (or "lazy") matching is exactly right. Since we can't use the trick you wanted to use (something like `[^;]+`), searching only for uninterrupted substrings of NON-semicolon (or whatever single-character delimiter), and instead we really must use `.*` or `.+` until we hit a delimiter, we MUST use non-greedy search. Note also a weakness of `+` rather than `*` - if you have two delimiters immediately next to each other, that should represent a NULL; but with the + operator you will miss those completely. The * operator catches them too. –  Oct 03 '17 at 14:38
  • That weakness of `+` vs `*` really surprised me. Wouldn't have thought about that unless experienced an issue because of it. Appreciate it. – pOrinG Oct 03 '17 at 14:44