0

I have some URLs without http:// in a database table:

        url
row #1: 10.1.127.4/
row #2: 10.1.127.4/something

Now, the following filter gives me row #2 - fine:

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4/something[[:>:]]'

But the following filter does not give me row #1, but shouldn't it?

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4/[[:>:]]'

I should note that also escaping the forward slash via backslash doesn't return the wanted row #1:

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4\/[[:>:]]'

Anse
  • 1,573
  • 12
  • 27

3 Answers3

1

As per the docs: http://dev.mysql.com/doc/refman/5.7/en/regexp.html

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).

/ is not an alnum member, therefore it's not a word boundary.

Marc B
  • 356,200
  • 43
  • 426
  • 500
  • *`/` is not an alnum member, therefore it's not a word boundary.* - no, there is no word boundary **between** `/` and the end of string. A word boundary is a zero-width assertion. – Wiktor Stribiżew May 31 '16 at 16:02
0
SELECT * FROM mytable WHERE mycolumn REGEXP "[[:<:]][0-9]{1,3}\\.([0-9]{1,3}.?){3}((\\/)?[^ ]*)?[[:>:]]";

[[:<:]][0-9]{1,3}\.([0-9]{1,3}.?){3}((\/)?[^ ]*)?[[:>:]]

Assert position at the beginning of a word (position followed by but not preceded by an ASCII letter, digit, or underscore) «[[:<:]]»
Match a single character in the range between “0” and “9” «[0-9]{1,3}»
   Between one and 3 times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «{1,3}»
Match the character “.” literally «\.»
Match the regex below and capture its match into backreference number 1 «([0-9]{1,3}.?){3}»
   Exactly 3 times «{3}»
      You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «{3}»
   Match a single character in the range between “0” and “9” «[0-9]{1,3}»
      Between one and 3 times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «{1,3}»
   Match any single character that is NOT a line break character (line feed) «.?»
      Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
Match the regex below and capture its match into backreference number 2 «((\/)?[^ ]*)?»
   Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
   Match the regex below and capture its match into backreference number 3 «(\/)?»
      Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
      Match the character “/” literally «\/»
   Match any single character that is NOT present in the list below and that is NOT a line break character (line feed) «[^ ]*»
      Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
      The literal character “ ” « »
Assert position at the end of a word (position preceded by but not followed by an ASCII letter, digit, or underscore) «[[:>:]]»
Pedro Lobito
  • 94,083
  • 31
  • 258
  • 268
0

Found out that [[:>:]] expects a word character to the left, and vice versa for [[:<:]]

Simple tests verify that:

SELECT 'bla,,123' REGEXP '[[:<:]]bla,[[:>:]]' -- no match
SELECT 'bla,,123' REGEXP '[[:<:]]bla[[:>:]]' -- match
SELECT 'bla,,123' REGEXP '[[:<:]]bla,,123[[:>:]]' -- match

I think this way the documentation makes sense, and I misunderstood that for quite a few years:

[...] word boundaries. They match the beginning and end of words, [...]

So, a word boundary expects

  1. a non-word character on the one side
  2. and a word character on the other side
Anse
  • 1,573
  • 12
  • 27