1

I have a query to treat comma as delimiter in a string. I want to substr the last occurrence.

example: string: a, b, c, d string: e, f, g, h, i, j

select regexp_substr(string, '[^,]+', -1, 1)
from dual;

should return: d and j

but the error message says that the -1 position is out of range.

Oracle Doc: https://docs.oracle.com/cd/E11882_01/olap.112/e23381/row_functions063.htm#OLAXS456

dozel
  • 127
  • 1
  • 3
  • 9
  • What's with the spaces in the first string? If the comma is a true delimiter, then the last "token" from the first sample string is `' d'` (a string of length two: a space and the letter d). –  Feb 17 '17 at 15:48
  • I can accept a string with space. – dozel Feb 17 '17 at 15:50
  • 1
    The documentation references you posted are to Oracle OLAP, a separate product, different from Oracle Database. If you have Oracle OLAP installed (whatever that means, I have no idea) you may have extended functionality to `regexp_substr` and other functions, but that is irrelevant to Oracle Database. The documentation of `regexp_substr` for standard Oracle SQL does not allow a negative position. –  Feb 17 '17 at 15:56
  • Acknowledged. Thanks – dozel Feb 17 '17 at 16:22

4 Answers4

3
SELECT regexp_substr(string, '[^,]*$') FROM t

Test

Mihai
  • 26,325
  • 7
  • 66
  • 81
  • That works, but what if I want to get the last 2nd occurrence. The problem is that the Oracle documentation specifically mentioned using negative int, the function will search backwards. I wonder why it's not working. quote from oracle: 'A nonzero integer indicating the character of source_char where the function begins the search. When position is negative, then the function counts and searches backward from the end of string. The default value of position is 1, which means that the function begins searching at the first character of source_char.' – dozel Feb 17 '17 at 15:19
  • @dozel - the Oracle docs say that it should be a positive integer. 'position is a positive integer indicating the character of source_char where Oracle should begin the search. The default is 1, meaning that Oracle begins the search at the first character of source_char.' – BriteSponge Feb 17 '17 at 15:22
  • @dozel - This is the page on the 11g documentation - https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions138.htm#SQLRF06303 – BriteSponge Feb 17 '17 at 15:23
  • @Mihai https://docs.oracle.com/cd/E18283_01/olap.112/e17122/dml_functions_2069.htm – dozel Feb 17 '17 at 15:29
  • @dozel I`ve never seen negative position,either the docs are wrong or somebody knows more than I do. – Mihai Feb 17 '17 at 15:42
  • 1
    @Mihai - see my comment under the original post. Oracle OLAP is not the same thing as Oracle Database. –  Feb 17 '17 at 15:58
  • Acknowledged. Thanks – dozel Feb 17 '17 at 16:22
2

Here are a couple of ways you could do it:

WITH sample_data AS (SELECT 'a, b, c, d' str FROM dual UNION ALL
                     SELECT 'e, f, g, h, i, j' str FROM dual UNION ALL
                     SELECT 'e, f, g, h, i, jk' str FROM dual UNION ALL
                     SELECT 'e,f,g,h,i,jk' str FROM dual UNION ALL
                     SELECT 'e,f,g,h,i,' str FROM dual UNION ALL
                     SELECT 'e, f, g, h, i,' str FROM dual)
SELECT str,
       ltrim(SUBSTR(str, INSTR(str, ',', -1, 1) + 1)) last_item1,
       regexp_substr(str, '.*, ?([^,]*$)', 1, 1, NULL, 1) last_item3
FROM   sample_data;

STR               LAST_ITEM1        LAST_ITEM3
----------------- ----------------- -----------------
a, b, c, d        d                 d
e, f, g, h, i, j  j                 j
e, f, g, h, i, jk jk                jk
e,f,g,h,i,jk      jk                jk
e,f,g,h,i,                          
e, f, g, h, i,                      

It's a toss-up between both options as to which one will be most performant and/or maintainable in your system - you'd need to test this.

The regexp_substr solution above checks for a patter of any character (except a newline), followed by a comma, followed by a space (or not) and then finally any character that's not a comma up to the end of the string. Then we output the 1st subexpression (which is defined by bracketing part of the pattern).

I included the ltrim in the substr/instr item since you said your delimiter was a comma, but it looked like maybe it was a comma+space.

Boneist
  • 22,910
  • 1
  • 25
  • 40
  • I think my question is more about why using -1 as position is not working while the oracle doc specially says that we can use negative int if we want to search backwards. I can play with substr and instr, but why should I when there's a much simpler way. – dozel Feb 17 '17 at 15:36
  • That's because the documentation you were looking at (which is in the old format, and therefore likely to be out-of-date anyway) looks to have a bug in it. You can't have negative numbers in either the position or occurrence in REGEXP_SUBSTR. – Boneist Feb 17 '17 at 15:37
  • a newer doc says the same:(https://docs.oracle.com/cd/E11882_01/olap.112/e23381/row_functions063.htm#OLAXS456). But maybe you're right, the doc has flaws. – dozel Feb 17 '17 at 15:48
  • 1
    @dozel - Don't confuse Oracle OLAP (a separate product) with standard Oracle SQL (Oracle Database)! –  Feb 17 '17 at 15:58
  • @Boneist - I asked the OP, and he/she responded that the comma is a true delimiter; the spaces, if present, must be preserved. (Or, "it is OK" if they are preserved.) If so, it is almost certain that solutions using standard string functions, not regular expressions, will be more efficient. –  Feb 17 '17 at 16:00
1

A solution without regexp could be:

select substr(string, instr( string, ',', -1)+1)
from yourTable

Here you use the instr to find the last occurrence of a ',', if any, and then a substr to only return the needed part of the input string.

Aleksej
  • 22,443
  • 5
  • 33
  • 38
1

You asked in a comment what if you wanted to get the 2nd occurrence? The 4th argument is the element you want to return. This format handles NULL elements:

with tbl(str) as (
  select 'a, b, c, d' from dual union
  select 'e,,g,h,i,j' from dual
)
select regexp_substr(str, '(.*?)(,|$)', 1, 2, NULL, 1) element
from tbl; 
Gary_W
  • 9,933
  • 1
  • 22
  • 40