1

I'm trying to speed up a query that uses Contains Near with one that uses regexp_like. The initial Contains Near query takes about 45 minutes to run. Clob Column holds large "documents" and is domain indexed.

Initial query:

SELECT column1
FROM TEST
WHERE CONTAINS(column1,'{NEAR(quick,fox, lazy), 3, FALSE}')>0;

Proposed query:

SELECT column1
FROM TEST
WHERE REGEXP_LIKE(column1, '(\b(quick|fox|lazy)(?:\W+\w+){1,6}?\W(quick|fox|lazy)(?:\W+\w+){1,}?\W(quick|fox|lazy)\b)','i')

I got the original regexp syntax from here: https://www.regular-expressions.info/near.html.

Problem: I get the regexp code to work in html https://www.regextester.com, but when I put it in Oracle it doesn't find anything. What is wrong with my syntax? I can't figure it out. Does Oracle handle REGEXP differently?

Alex Poole
  • 183,384
  • 11
  • 179
  • 318
Jennifer Crosby
  • 185
  • 1
  • 1
  • 14
  • 1
    [Oracle doesn't support all regex flavours and extensions](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/Oracle-Regular-Expression-Support.html#GUID-969230D6-FC1A-4C75-BF2A-6B1BE909DED6). I think it's the `?:` that's tripping you up, but not sure. I'm a bit surprised a regular expression would be faster though... – Alex Poole Oct 23 '18 at 16:47
  • To find a single word you need to use `REGEXP_LIKE(column1, '[[:blank:]](quick|fox|lazy)[[:blank:]]','i')`. – Marmite Bomber Oct 23 '18 at 16:54
  • @AlexPoole, I am not sure if it is quicker. I am just tasked to find a quicker way and was going to check this out. They use both CONTAINS and REGEXP_LIKE a lot – Jennifer Crosby Oct 24 '18 at 10:53
  • @MarmiteBomber I want, ideally, – Jennifer Crosby Oct 25 '18 at 11:21
  • @MarmiteBomber I want, ideally: Find the document that has these 3 words and each word has maximum of 1-6 words between each of them. They can be in any order. I can find documents with one of these using regexp, but can't seem to duplicate what I can do with contains near using regexp in Oracle, only outside of Oracle. – Jennifer Crosby Oct 25 '18 at 11:49
  • Can anyone think of a better way to speed up the CONTAINS/NEAR code in ORACLE? I'm at a loss. – Jennifer Crosby Oct 25 '18 at 12:02
  • My point was to use `[[:blank:]]` and **NOT** `\b` that doesn't match. – Marmite Bomber Oct 25 '18 at 14:03
  • Can you describe your setup in a bit detail. Do you have a [library of Alexandria](https://en.wikipedia.org/wiki/Library_of_Alexandria) that it takes 45 minutes to scan? :) – Marmite Bomber Oct 25 '18 at 14:05

1 Answers1

1

Alex, you were exactly right. I don't see how to select your answer as correct though.

My problem was apparently that I was using regexp parameters that Oracle doesn't recognize. So, whereas it worked on https://www.regextester.com, it failed to work in Oracle because most of what I used isn't recognized as usable with regexp in Oracle. I really think Oracle should expand their regexp codes it recognized. This was really frustrating.

Jennifer Crosby
  • 185
  • 1
  • 1
  • 14