How to replace all non-alphabetic characters with UTF-8 support in PHP

Question

I want to remove all non-alphabetic characters from a string. The problem is that I don't know the letter range in advance, because it is a UTF-8 string.

The text can be in any script, for example ENGLISH, ՀԱՅԵՐԵՆ, ქართული, УКРАЇНСЬКИЙ, РУССКИЙ.

I usually do something like this:

$str = preg_replace('/[^a-zA-Z]/', '', $str);

or

$str = preg_replace('/[^\w]/u', '', $str);

but both of them also strip the non-English letters.

Any ideas?

score 10 · Answer 1 · edited Aug 16 '12 at 16:25


$str = preg_replace('/\P{L}+/u', '', $str);


Paul T. Rawkeen

answered Aug 16 '12 at 14:42

Jocelyn


As a side note, it's worth mentioning the syntax for specifying a Unicode character class when the u flag is used. Curly brackets are needed around the code points. For example, `[\x{0400}-\x{04FF}]` matches any characters in the regular Cyrillic range. – cleong Aug 16 '12 at 15:06
How do you have to change the Regex to also keep numbers (next to the alphabetic ones) and not remove them? – Avatar Mar 03 '22 at 12:13
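
To illustrate the two comments above, here is a minimal sketch, assuming PHP 7 or later. The function names are hypothetical and exist only for this example. Adding `\p{N}` (the Unicode "number" property) next to `\p{L}` keeps digits as well as letters, and the second function uses the curly-bracket code point range from cleong's comment to keep only the basic Cyrillic block.

```php
<?php
// Hypothetical helper names, used here only for illustration.
function keep_letters_and_numbers(string $str): string
{
    // \p{L} = any Unicode letter, \p{N} = any Unicode number character.
    // preg_replace() returns null on failure, so fall back to ''.
    return preg_replace('/[^\p{L}\p{N}]+/u', '', $str) ?? '';
}

function keep_cyrillic_block(string $str): string
{
    // Under the u flag, code points are written with curly brackets.
    return preg_replace('/[^\x{0400}-\x{04FF}]+/u', '', $str) ?? '';
}

echo keep_letters_and_numbers('Київ 2022, ქართული-42!'), "\n"; // Київ2022ქართული42
echo keep_cyrillic_block('abc РУССКИЙ 123'), "\n";             // РУССКИЙ
```

If only the ASCII digits should survive, `0-9` can probably be used in place of `\p{N}`, since `\p{N}` also matches digits from other scripts.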

Paul T. Rawkeen · Accepted Answer · 2012-09-05T14:53:44.923

8

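UPDATE: For Unicode input, the regular expression looks like this: [^\p{L}\s]+ (used with the u modifier). Whitespace is excluded from the match, so spaces are not replaced.

Replacing its matches with an empty string removes all non-alphabetic characters from a UTF-8 string, whatever the script.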
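
A minimal sketch of how the pattern from this answer could be used, assuming PHP 7 or later and valid UTF-8 input. The function name `keep_letters_and_spaces` is made up for the example.

```php
<?php
// Hypothetical helper name; the pattern itself is the one given above.
function keep_letters_and_spaces(string $str): string
{
    // \p{L} = any Unicode letter, \s = whitespace. The u flag makes PCRE
    // treat both the pattern and the subject as UTF-8.
    $result = preg_replace('/[^\p{L}\s]+/u', '', $str);

    // preg_replace() returns null on failure (for example, invalid UTF-8).
    return $result ?? '';
}

echo keep_letters_and_spaces('РУССКИЙ-123 ქართული!'), "\n"; // РУССКИЙ ქართული
```

Note that punctuation sitting between two spaces is removed while both spaces stay, so the result may contain double spaces.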

Here are some reference docs that can be helpful:

edited Sep 05 '12 at 14:53

answered Aug 16 '12 at 14:37

Paul T. Rawkeen

"Alphabetic" doesn't mean only characters used in English. – cleong Aug 16 '12 at 14:46
@cleong, Sorry, my fault, missed that point. I've corrected my answer. – Paul T. Rawkeen Aug 16 '12 at 14:53
I think both answers are super, but I think this got more info – Mirko Akov Aug 20 '12 at 14:21

score 1 · Answer 3 · answered Aug 16 '12 at 14:43


The Unicode property for a letter is \pL; for a non-letter it is \PL:

$str = preg_replace('/\PL+/u', '', $str);
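
As a quick check, the short form `\PL` should behave the same as the braced form `\P{L}` used in the other answers. This is only a sketch with an arbitrary sample string:

```php
<?php
$str = 'ENGLISH-123 УКРАЇНСЬКИЙ!';

// Short form: \PL matches any character that is not a letter.
$short = preg_replace('/\PL+/u', '', $str);

// Braced form of the same property.
$braced = preg_replace('/\P{L}+/u', '', $str);

echo $short, "\n";            // ENGLISHУКРАЇНСЬКИЙ
var_dump($short === $braced); // bool(true)
```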


Toto
