0

As you know Oracle POSIX implementation of regexes does not support word boundaries. One workaround is suggested here: Oracle REGEXP_LIKE and word boundaries

However it does not work if I want to, for instance select all 4 character strings. Consider this, for example:

myvar:=regexp_substr('test test','(^|\s|\W)[\S]{4}($|\s|\W)') 

This obviously selects only the first occurrence. I do not know how to do this in the Oracle world, although normally it is simply (\b)[\S]{4}(\b). The problem is that most woraround rely on some nonexistent feature, like lookaround etc.

Community
  • 1
  • 1
Dimiter
  • 37
  • 4

2 Answers2

0
select xmlcast(xmlquery('for $token in ora:tokenize(concat(" ",$in)," ")
                where string-length($token) = $size
                return $token' passing 'test test' as "in", 4 as "size" returning content) as varchar2(2000)) word from dual;

Xquery and FLWOR expresion.

concat(" ",$in) - workaround if input string is null or it has only 1 matchting word.

ora:tokenize - tokenize string by "space"

string-length($token) = $size check if token has appropriate length.

xmlcast - convert xmltype to varchar2

Easy ? Any question:)

Arkadiusz Łukasiewicz
  • 6,241
  • 1
  • 11
  • 17
  • I have seen so many wonderful solutions using Oracle XML, but I am totally ignorant about it. Can someone recommend a book or web site to get me started? – Brian Leach Dec 17 '16 at 01:13
0
DECLARE
  str     VARCHAR2(200) := 'test test';
  pattern VARCHAR2(200) := '(\w+)($|\s+|\W+)';
  match   VARCHAR2(200);
BEGIN
  FOR i IN 1 .. REGEXP_COUNT( str, pattern ) LOOP
    match := REGEXP_SUBSTR( str, pattern, 1, i, NULL, 1 );
    IF LENGTH( match ) = 4 THEN
      DBMS_OUTPUT.PUT_LINE( match );
    END IF;
  END LOOP;
END;
/

or (without using REGEXP_COUNT or the 6th parameter of REGEXP_SUBSTR that was introduced in 11G):

DECLARE
  str              VARCHAR2(200) := 'test test';
  pattern CONSTANT VARCHAR2(3)   := '\w+';
  match            VARCHAR2(200);
  i                NUMBER(4,0)   := 1;
BEGIN
  match := REGEXP_SUBSTR( str, pattern, 1, i );
  WHILE match IS NOT NULL LOOP
    IF LENGTH( match ) = 4 THEN
      DBMS_OUTPUT.PUT_LINE( match );
    END IF;
    i     := i + 1;
    match := REGEXP_SUBSTR( str, pattern, 1, i );
  END LOOP;
END;
/

Output:

test
test

If you want to use this in SQL then you can easily translate it into a pipelined function or a function that returns a collection.

MT0
  • 143,790
  • 11
  • 59
  • 117