^(?![_\.\'\-])(?:[\p{L} ]+)$

If I understand correctly, there is:

  • (?![_\.\'\-]) is a negative lookahead: the string cannot start with an underscore, period, apostrophe, or hyphen.
  • (?:[\p{L} ]+) allows one or more characters that are either spaces or letters in the Ll, Lm, Lo, Lt, and Lu categories.

First question: something like "1Bob" should not fail because of the lookahead, so why does it fail?

Second question: where can I find a list or explanation of the characters in Ll, Lm, Lo, Lt, and Lu?

gremo

2 Answers


The digit "1" is not matched by \p{L}, which matches only letters. To allow digits as well, add the class \p{N} (any numeric character; use \p{Nd} instead to allow decimal digits only):

$text = "1Bob";

if (preg_match("/^(?![_\.\'\-])(?:[\p{N}\p{L} ]+)$/u", $text)) {
  echo "Matched!\n";
} else {
  echo "No match...\n";
}

which will print:

Matched!
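
To see which part of the original pattern rejects "1Bob", the two halves can be tested separately. This is only a minimal sketch; the variable names are illustrative and not part of the question.

```php
<?php
// Test each half of the original pattern on its own.
$text = "1Bob";

// Lookahead only: the first character must not be _ . ' or -
$lookaheadOk = preg_match("/^(?![_.'-])/u", $text) === 1;

// Character class only: the whole string must be letters and spaces
$lettersOk = preg_match('/^[\p{L} ]+$/u', $text) === 1;

var_dump($lookaheadOk); // bool(true): a leading digit passes the lookahead
var_dump($lettersOk);   // bool(false): "1" is not in \p{L}
```

The lookahead passes, so the failure comes from the character class.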

Also, there are small differences between Ruby's regex engine and that of PHP. Since your target language seems to be PHP, I recommend testing it with PHP, not with Rubular (Ruby).

Note that inside a character class, most regex metacharacters lose their special meaning and need not be escaped, so the pattern can be written as: preg_match("/^(?![_.'-])(?:[\p{N}\p{L} ]+)$/u", $text)

An overview of many Unicode Character Properties/Classes can be found here: http://www.regular-expressions.info/unicode.html
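
As a rough illustration of the five letter subcategories, the sketch below checks one sample character per category against both its subcategory and \p{L}. The sample characters are my own picks, and the `\u{...}` string escapes assume PHP 7 or later.

```php
<?php
// One illustrative character per Unicode letter subcategory.
$samples = [
    'Lu' => "A",          // uppercase letter
    'Ll' => "\u{00E9}",   // lowercase letter (e with acute)
    'Lt' => "\u{01C5}",   // titlecase letter (D with small z with caron)
    'Lm' => "\u{02B0}",   // modifier letter (modifier small h)
    'Lo' => "\u{3042}",   // other letter (Hiragana A)
];

$results = [];
foreach ($samples as $category => $char) {
    $results[$category] = [
        'subcategory' => preg_match('/^\p{' . $category . '}$/u', $char) === 1,
        'letter'      => preg_match('/^\p{L}$/u', $char) === 1,
    ];
    echo $category, ': ', var_export($results[$category], true), "\n";
}

// A digit belongs to \p{Nd} (and therefore \p{N}), not to \p{L}.
$digitIsLetter = preg_match('/^\p{L}$/u', "1") === 1;
$digitIsNumber = preg_match('/^\p{Nd}$/u', "1") === 1;
```

Every sample should match both its own subcategory and \p{L}, while the digit matches only the numeric class.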

Bart Kiers
  • So, "1Bob" fails because of the second group, not because of the lookahead? – gremo Sep 22 '12 at 09:42
  • @Gremo, correct, verify it by simply doing: `if (preg_match("/^(?![_\.\'\-])/", $text)) { ... }` – Bart Kiers Sep 22 '12 at 09:46
  • Also, you can drop all those backslashes in the lookahead character class. You probably also need to use the `/u` modifier so the Unicode prüperties work properly, I think. – Tim Pietzcker Sep 23 '12 at 06:40
  • @BartKiers: Does it also work with non-ASCII letters? Sorry for the typo...`s/prüperties/properties` :) – Tim Pietzcker Sep 23 '12 at 07:20
  • @TimPietzcker, he he, I was messing up some regex Q&A's: I thought this one had a non-ascii in it... :) – Bart Kiers Sep 23 '12 at 07:30
(?![_\.\'\-])

is the same as

(?![_.'-])

Most metacharacters within bracketed character classes do not require escaping. The hyphen would require escaping only if it sat between two characters, where it would be read as a range. Because the hyphen here is at the end of the bracketed character class, it does not require escaping either.
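
A small sketch of why the hyphen's position matters; the patterns below are made up for illustration and are not taken from the question. Placed between "." and "_", an unescaped hyphen forms a range from 0x2E to 0x5F, which happens to include digits and uppercase letters but not the hyphen itself (0x2D).

```php
<?php
// Hyphen at the end of the class: a literal hyphen.
$literal = '/^[._-]$/';
// Hyphen between two characters: the range "." (0x2E) through "_" (0x5F).
$range = '/^[.-_]$/';
// Escaped hyphen in the middle: a literal hyphen again.
$escaped = '/^[.\-_]$/';

var_dump(preg_match($literal, 'A')); // int(0)
var_dump(preg_match($range, 'A'));   // int(1): "A" (0x41) falls inside the range
var_dump(preg_match($range, '-'));   // int(0): "-" (0x2D) lies just below the range
var_dump(preg_match($escaped, '-')); // int(1)
var_dump(preg_match($escaped, 'A')); // int(0)
```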

protist
  • While this is true, it's not an answer to the OP's question: such remarks are better posted as a comment below the question than as an answer. – Bart Kiers Sep 23 '12 at 06:04
  • Bart Kiers - Thank you, I am new here. Isn't the formatting bad for comments (does not allow newlines)? – protist Sep 23 '12 at 07:23
  • @protist good to know. Is there a quick reference of the characters that need to be escaped? – gremo Sep 23 '12 at 07:25
  • Gremo - perldoc perlrecharclass has a section "Bracketed Character Classes". You can also see the same information here http://perldoc.perl.org/perlrecharclass.html. – protist Sep 23 '12 at 07:29
  • @protist, true, comments are not well suited for long chunks of text. But on SO, an "answer" must (or should) be a true answer to the question. It's not like a classic forum where comments are added to a post, here it must (or should) be questions and answers. You don't have to remove this comment/answer of yours, but just so you know how SO is supposed to work. And welcome! :) – Bart Kiers Sep 23 '12 at 07:37