MySQL REGEXP with word boundary after forward slash

Question

I have some URLs without http:// in a database table:

        url
row #1: 10.1.127.4/
row #2: 10.1.127.4/something

Now, the following filter gives me row #2 - fine:

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4/something[[:>:]]'

But the following filter does not give me row #1, but shouldn't it?

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4/[[:>:]]'

I should note that also escaping the forward slash via backslash doesn't return the wanted row #1:

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4\/[[:>:]]'

It makes no sense, there is no word boundary between `/` and the end of string. — Wiktor Stribiżew, May 31 '16 at 15:54
Please specify the exact requirements. If you want to match at the end of string or before a non-alnum char, use `'[[:<:]]10[.]1[.]127[.]4/([[:>:]]|$)'`. The dot should be either escaped or put into a bracket expression. — Wiktor Stribiżew, May 31 '16 at 15:59
If you need to match any numbers (1 or more) use `[0-9]+`, and actually `/([[:>:]]` will match a `/` before an alnum char. If you need the opposite, use `'[[:<:]]10[.]1[.]127[.]4/([^[:alnum:]]|$)'` — Wiktor Stribiżew, May 31 '16 at 16:07

score 1 · Answer 1 · answered May 31 '16 at 15:55

1

As per the docs: http://dev.mysql.com/doc/refman/5.7/en/regexp.html

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).

/ is not an alnum member, therefore it's not a word boundary.

answered May 31 '16 at 15:55

Marc B

356,200
43
426
500

*`/` is not an alnum member, therefore it's not a word boundary.* - no, there is no word boundary **between** `/` and the end of string. A word boundary is a zero-width assertion. – Wiktor Stribiżew May 31 '16 at 16:02

score 0 · Answer 2 · answered May 31 '16 at 16:02

SELECT * FROM mytable WHERE mycolumn REGEXP "[[:<:]][0-9]{1,3}\\.([0-9]{1,3}.?){3}((\\/)?[^ ]*)?[[:>:]]";

[[:<:]][0-9]{1,3}\.([0-9]{1,3}.?){3}((\/)?[^ ]*)?[[:>:]]

Assert position at the beginning of a word (position followed by but not preceded by an ASCII letter, digit, or underscore) «[[:<:]]»
Match a single character in the range between “0” and “9” «[0-9]{1,3}»
   Between one and 3 times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «{1,3}»
Match the character “.” literally «\.»
Match the regex below and capture its match into backreference number 1 «([0-9]{1,3}.?){3}»
   Exactly 3 times «{3}»
      You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «{3}»
   Match a single character in the range between “0” and “9” «[0-9]{1,3}»
      Between one and 3 times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «{1,3}»
   Match any single character that is NOT a line break character (line feed) «.?»
      Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
Match the regex below and capture its match into backreference number 2 «((\/)?[^ ]*)?»
   Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
   Match the regex below and capture its match into backreference number 3 «(\/)?»
      Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
      Match the character “/” literally «\/»
   Match any single character that is NOT present in the list below and that is NOT a line break character (line feed) «[^ ]*»
      Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
      The literal character “ ” « »
Assert position at the end of a word (position preceded by but not followed by an ASCII letter, digit, or underscore) «[[:>:]]»

score 0 · Accepted Answer · answered Jun 01 '16 at 06:59

Found out that [[:>:]] expects a word character to the left, and vice versa for [[:<:]]

Simple tests verify that:

SELECT 'bla,,123' REGEXP '[[:<:]]bla,[[:>:]]' -- no match
SELECT 'bla,,123' REGEXP '[[:<:]]bla[[:>:]]' -- match
SELECT 'bla,,123' REGEXP '[[:<:]]bla,,123[[:>:]]' -- match

I think this way the documentation makes sense, and I misunderstood that for quite a few years:

[...] word boundaries. They match the beginning and end of words, [...]

So, a word boundary expects

a non-word character on the one side
and a word character on the other side

MySQL REGEXP with word boundary after forward slash

3 Answers3