3

I cannot validate text in some languages with this regular expression:

/^[\p{L}\p{Nd}-_.]{1,20}$/u

Languages that do not work:

Bengali, Gujarati, Hindi, Marathi, Thai, Tamil, Telugu, Vietnamese

when used with PHP's preg_match.

What am I missing?

Shervin
user2068995

2 Answers

4

You're using the dash incorrectly. If you want it to match a literal dash character, you need to either escape it (\-) or put it at the end of the character class.
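As a rough illustration of why the dash position matters, here is a minimal sketch using a simpler class than the one in the question. The patterns, the `isMatch` helper, and the test strings are made up for this example: between two literal characters the dash forms a range, while at the end of the class or escaped it is a literal.

```php
<?php
// Hypothetical helper for this example: true when the pattern matches.
function isMatch(string $pattern, string $text): bool
{
    return preg_match($pattern, $text) === 1;
}

// Between '+' (U+002B) and '_' (U+005F) the dash defines a range,
// which silently includes digits and uppercase ASCII letters.
$asRange = '/^[+-_]$/';

// Placed last in the class, the dash is a literal character.
$asLiteral = '/^[+_-]$/';

// Escaped, the dash is also a literal character.
$escaped = '/^[+\-_]$/';

$dashResults = [
    'range matches A'   => isMatch($asRange, 'A'),
    'literal matches A' => isMatch($asLiteral, 'A'),
    'literal matches -' => isMatch($asLiteral, '-'),
    'escaped matches A' => isMatch($escaped, 'A'),
    'escaped matches -' => isMatch($escaped, '-'),
];

foreach ($dashResults as $label => $ok) {
    echo $label, ': ', $ok ? 'yes' : 'no', "\n";
}
```

Only the range form accepts `A`; the other two accept just `+`, `_`, and `-`.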

Also, I'm not familiar with those languages, but I guess you might need to account for marks as well:

/^[\p{L}\p{Nd}\p{M}_.-]{1,20}$/u
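To see the effect of `\p{M}`, here is a small sketch that compares the same character class with and without it. The sample strings are assumptions chosen for illustration (they are not from the question) and are written as code point escapes, which requires PHP 7.0 or later. Each non-ASCII sample contains at least one combining mark.

```php
<?php
// Same class with and without \p{M}; the dash is last, so it is literal.
$withoutMarks = '/^[\p{L}\p{Nd}_.-]{1,20}$/u';
$withMarks    = '/^[\p{L}\p{Nd}\p{M}_.-]{1,20}$/u';

// Illustrative samples (assumed for this example), given as code points.
$samples = [
    'ascii'      => 'user_name-1',
    'hindi'      => "\u{0939}\u{093F}\u{0928}\u{094D}\u{0926}\u{0940}", // Hindi word with vowel signs and a virama
    'thai'       => "\u{0E19}\u{0E49}\u{0E33}",                         // Thai word with a tone mark
    'decomposed' => "cafe\u{0301}",                                     // 'e' followed by a combining acute accent
];

$results = [];
foreach ($samples as $name => $text) {
    $results[$name] = [
        'withoutMarks' => preg_match($withoutMarks, $text) === 1,
        'withMarks'    => preg_match($withMarks, $text) === 1,
    ];
}

foreach ($results as $name => $r) {
    printf(
        "%-10s without marks: %-3s with marks: %s\n",
        $name,
        $r['withoutMarks'] ? 'yes' : 'no',
        $r['withMarks'] ? 'yes' : 'no'
    );
}
```

The ASCII sample passes both patterns, while the samples containing combining marks pass only once `\p{M}` is in the class.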
Tim Pietzcker
  • This is the solution; you can find more details at http://php.net/manual/en/regexp.reference.unicode.php – Huseyin Aug 03 '13 at 21:33
0

The problem doesn't come from your regex (except that the - character must always be at the beginning or at the end of a character class). Note that your pattern can be shortened to:

/^[\w.-]{1,20}$/u

or

/^[\p{Xan}.-]{1,20}$/u

if you want to exclude the underscore.

Casimir et Hippolyte
  • 1
    In PHP, even if you use the `/u` option, `\w` only matches `[A-Za-z0-9_]`. If you want it to match the same as `[\p{L}\p{Nd}_]`, you have to start the regex with `(*UCP)`, forcing it to match by Unicode properties instead of lookup tables: `/(*UCP)^[\w.-]{1,20}$/u`. The same applies to `\d`, `\s`, `\b`, and the opposites: `\W`, `\D`, `\S` and `\B`. – Alan Moore Aug 04 '13 at 03:16