1

I am using REGEXP_SUBSTR in Oracle 11g and I am having difficulty trying to extract the following strings.

My query is:

SELECT regexp_substr('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', '[^CN=]*\,', 1, rownum) line
 FROM dual
 CONNECT BY LEVEL <= length('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,') - 
                     length(REPLACE('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', ',', ''))

From this query, I am having issues trying to match on exact string 'CN=' as from this query, I need the output to appear as follows:

CN=aTIGERAdmin-Admin,
CN=D0902498,
CN=ea90045052,
CN=aTIGERCall-Admin,

And in this format, with the comma at the end.

The way I am doing it at the moment is chopping off the "CN=" but I actually require this part.

halfer
  • 19,824
  • 17
  • 99
  • 186
tonyf
  • 34,479
  • 49
  • 157
  • 246

2 Answers2

5

I think this will return the resultset you are looking for:

SELECT REGEXP_SUBSTR(d.s,'CN=.*?,', 1, ROWNUM) line
  FROM (SELECT 'CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,'
        AS s FROM dual) d
CONNECT BY LEVEL <= LENGTH(d.s) - LENGTH(REPLACE(d.s,',',''))

The regular expression trick used here is to specify the ? modifier (following the .*) to make the match "non-greedy". The default match (without the ? modifier) is "greedy" in that it will match as much of the string as possible. In your case, you want the match to end at the first comma found. The intent here is to match literal string 'CN=' followed by any number of characters (zero, one or more) up to the first comma encountered.

This will work in Oracle 10g as well as 11g.

In 11g, the REGEXP_COUNT function can replace your "count of comman" calculation of occurrences.

CONNECT BY LEVEL <= REGEXP_COUNT(d.s,'CN=.*?,')

(BTW... by using a subquery to return the literal string, the literal string only has to be specified once. That makes it much easier to change the string for testing, rather than having to change it in multiple places.)


Addendum:

I can confirm that the comma is included in the returned value. Sample output:

LINE
-----------------------
CN=aTIGERAdmin-Admin,                                                 
CN=D0902498,                                                          
CN=ea90045052,                                                        
CN=aTIGERCall-Admin, 
spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • thanks @spencer7593 - just want to confirm that your solution also has the comma at the end of each "CN=" string as in my required output above? – tonyf Jun 25 '12 at 21:43
  • @tonsils: yes, the regular expression does include the trailing comma in the returned values. On Oracle 11g, I would use the REGEXP_COUNT function in place of the "count commas" expression. – spencer7593 Jun 25 '12 at 21:50
0

I'm not an LDAP master, but will the regular expression CN=[^,]+ (C, then N, then equals sign, greedily followed by more than one non-comma) work for you?

Also, do you know about REGEXP_COUNT, new in 11g?

SQL> SELECT REGEXP_SUBSTR('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', 'CN=[^,]+', 1, ROWNUM) line
  2   FROM dual
  3   CONNECT BY LEVEL <= REGEXP_COUNT('CN=aTIGERAdmin-Admin, CN=D0902498, CN=ea90045052, CN=aTIGERCall-Admin,', 'CN=[^,]+')
  4  /

LINE
----------------------------------------------------------------------------------------------------
CN=aTIGERAdmin-Admin
CN=D0902498
CN=ea90045052
CN=aTIGERCall-Admin

SQL>
Tebbe
  • 1,372
  • 9
  • 12
  • This doesn't meet the specified requirements, it does not include the trailing comma character, which was requested. This will also return a CN= which is not followed by a comma, which is probably desirable. – spencer7593 Jun 25 '12 at 19:35
  • OK. Does `'CN=[^,]+,?'` as the second argument to `REGEXP_SUBSTR`/`REGEXP_COUNT` meet your requirements? – Tebbe Jun 25 '12 at 20:49
  • I think `'CN=[^,]*,'` as a regular expression will work just fine to return the specified resultset. As an oddball corner case, consider the output when the string being searched contains e.g. `'CN=, CN=foo,'`. Do we want that `'CN=,'` to be returned? This is an example of where the use of the REGEXP_COUNT function (as you recommend) is preferred over getting a count of commas. – spencer7593 Jun 25 '12 at 21:12