I'm not familiar with how regular expressions treat hexadecimal. Does anyone know?
1 Answer
The following does the trick:
$str = "some മനുഷ്യന്റെ";
echo preg_replace('/[\x{00ff}-\x{ffff}]/u', '*', $str);
// some **********
echo preg_replace('/[^\x{00ff}-\x{ffff}]/u', '*', $str);
// *****മനുഷ്യന്റെ
The important thing is the u modifier (see here):
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
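To make the effect of the modifier concrete, here is a minimal sketch. The sample string `日本語` is an assumption chosen for illustration (3 code points, 9 bytes in UTF-8), and the script is assumed to be saved as UTF-8. The `@` only hides the warning that PHP emits when a pattern fails to compile.

```php
<?php
// Hypothetical sample string, not from the question: 3 code points, 9 bytes in UTF-8.
$word = "日本語";

// Without /u the subject is treated as bytes, so "." matches one byte at a time.
$byteMatches = preg_match_all('/./', $word);

// With /u the subject is treated as UTF-8, so "." matches one code point at a time.
$codePointMatches = preg_match_all('/./u', $word);

// Without /u, \x{...} values above 0xff cannot be compiled,
// so preg_replace() returns null instead of a string.
$withoutU = @preg_replace('/[\x{00ff}-\x{ffff}]/', '*', $word);

// With /u the same character class compiles and matches each code point.
$withU = preg_replace('/[\x{00ff}-\x{ffff}]/u', '*', $word);

var_dump($byteMatches);      // int(9)
var_dump($codePointMatches); // int(3)
var_dump($withoutU);         // NULL
var_dump($withU);            // string(3) "***"
```

So a range such as `\x{00ff}-\x{ffff}` is only usable when the pattern carries the u modifier.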
And here is a short description of why \uFFFF does not work in PHP:
Perl and PCRE do not support the \uFFFF syntax. They use \x{FFFF} instead. You can omit leading zeros in the hexadecimal number between the curly braces. Since \x by itself is not a valid regex token, \x{1234} can never be confused to match \x 1234 times. It always matches the Unicode code point U+1234. \x{1234}{5678} will try to match code point U+1234 exactly 5678 times.
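The points in that quote can be checked with a short sketch. It reuses `$str` from the example above. The string `ééé` is an assumption added for illustration, and the `null` result for the `\u` pattern is what I would expect from the PCRE builds PHP normally ships with, so treat it as an observation rather than a guarantee.

```php
<?php
$str = "some മനുഷ്യന്റെ";

// Leading zeros inside the braces are optional: both patterns behave the same.
$padded   = preg_replace('/[\x{00ff}-\x{ffff}]/u', '*', $str);
$unpadded = preg_replace('/[\x{ff}-\x{ffff}]/u', '*', $str);
var_dump($padded === $unpadded); // bool(true)

// A quantifier after \x{...} repeats that one code point:
// \x{e9}{3} matches U+00E9 (é) exactly three times.
$repeat = preg_match('/^\x{e9}{3}$/u', 'ééé');
var_dump($repeat); // int(1)

// \uFFFF is not PCRE syntax, even with /u. The pattern fails to compile,
// PHP raises a warning (hidden here by @) and preg_replace() returns null.
$withBackslashU = @preg_replace('/[\u00FF-\uFFFF]/u', '*', $str);
var_dump($withBackslashU); // NULL
```

Note that the patterns are in single quotes, so PHP passes `\x{...}` and `\u....` to PCRE untouched.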

- Does the `\uXXXX` syntax work with the `u` modifier? Can you check if this works? `/[\u00FF-\uFFFF]/u` – Amarghosh Apr 28 '10 at 09:53
- @openid: You have one `[` too many. – Felix Kling Apr 28 '10 at 09:56