3

OK So i am confused (obviously)

I'm trying to return rows (from Oracle) where a text field contains a complete word, not just the substring.

a simple example is the word 'I'.

Show me all rows where the string contains the word 'I', but not simply where 'I' is a substring somewhere as in '%I%'

so I wrote what i thought would be a simple regex:

select REGEXP_INSTR(upper(description), '\bI\b') from mytab;

expecting that I should be detected with word boundaries. I get no results (or rather the result 0 for each row.

what i expect:

  • 'I am the Administrator' -> 1
  • 'I'm the administrator' -> 0
  • 'Am I the administrator' -> 1
  • 'It is the infamous administrator' -> 0
  • 'The adminisrtrator, tis I' -> 1

isn't the /b supposed to find the contained string by word boundary?

tia

Ollie
  • 17,058
  • 7
  • 48
  • 59
Randy
  • 16,480
  • 1
  • 37
  • 55
  • possible duplicate of [Oracle REGEXP\_LIKE and word boundaries](http://stackoverflow.com/questions/7567700/oracle-regexp-like-and-word-boundaries) – Dzyann Mar 02 '15 at 13:35

3 Answers3

8

I believe that \b is not supported by your flavor of regex :

http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_regexp.htm#i1007670

Therefore you could do something like :

(^|\s)word(\s|$)

To at least ensure that your "word" is separated by some whitespace or it's the whole string.

FailedDev
  • 26,680
  • 9
  • 53
  • 73
4

Oracle doesn't support word boundary anchors, but even if it did, you wouldn't get the desired result: \b matches between an alphanumeric character and a non-alphanumeric character. The exact definition of what an alnum is differs between implementations, but in most flavors, it's [A-Za-z0-9_] (.NET also considers Unicode letters/digits).

So there are two boundaries around the I in %I%.

If you define your word boundary as "whitespace before/after the word", then you could use

(^|\s)I(\s|$)

which would also work at the start/end of the string.

Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561
1

Oracle native regex support is limited. \b or < cannot be used as word delimiters. You may want Oracle Text for word search.

Benoit
  • 76,634
  • 23
  • 210
  • 236