5

MySQL manual is not very detailed about what expressions it supports, so I am not sure if the following is possible with MySQL at all.

I am trying to create a query with RLIKE which matches the following.

The task is to get from SQL all the sentences which contains at least any two words from the given sentence.

Let's say, I have some certain words to use in regex:

hello, dog

I have following sentences in the database:

hello from dog
hello hello cat
dog says hello
dog dog goes away
big bad dog

From those all I want to match only

hello from dog
dog says hello

For now I have it like this:

SELECT *
FROM test
WHERE 
test RLIKE '(hello|dog).*(hello|dog)'

The problem is - I get also those unneeded

hello hello cat
dog dog goes away

So I guess, I need a backreference right before the second (hello|dog).

In pseudo code it would look like this:

RLIKE '(hello OR dog) anything can be here (hello OR dog, but not the word which already was in the previous group)'

so it could be like:

'(hello|dog).*(negative backreference to the 1st group goes here)(hello|dog)'

Can such negative backreference be done in MySQL regex? Or maybe there is some better way to write the regex which does the same thing, but also considering that the query will get generated by some C++ code, so it shouldn't be too complex to generate?

JustAMartin
  • 13,165
  • 18
  • 99
  • 183
  • 1
    This sounds like something that it would make much more sense to build an index of word -> sentence mappings for (and then simply query against the index). – Amber Nov 08 '10 at 09:04
  • 1
    Or `(hello.*dog|dog.*hello)` ? – Konerak Nov 08 '10 at 09:05
  • (hello.*dog|dog.*hello) would be great if there is always only two words, but there may also be more, and then I'll have to create all the possible word combinations myself. – JustAMartin Nov 08 '10 at 09:50

1 Answers1

7

MySQL uses a Posix Extended Regular Expression engine (POSIX ERE) and therefore doesn't support backreferences at all. It also doesn't support lookaround which you would need in order to construct a single regex that could handle this.

Therefore you have to spell out all possible combinations:

hello.*dog|dog.*hello

Of course, this will get unwieldy if the number of match candidates increases, so regular expressions are not the right tool for this in MySQL, unless you can install/use LIB_MYSQLUDF_PREG.

Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561