
I need a solution that removes all special characters, keeping only alphanumeric and accented characters. I tried this solution without success:

preg_replace('/[^a-zA-ZáéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ0-9_ \.&-]/s', '', $string);

Furthermore, the regex (or another solution) needs to allow Chinese and Arabic characters.

Any help is really appreciated!

Vincenzo Lo Palo
  • http://www.php.net/manual/en/regexp.reference.unicode.php – CBroe Nov 28 '13 at 14:08
  • So what exactly *are* "special characters"? *"Remove everything except a few selected ones and this whole giant block of other characters which make up the majority of Unicode"* is a bit vague. *Why* do you need to remove those characters? – deceze Nov 28 '13 at 14:11
  • I'm building a search text field, so I need to strip "/%$@* and similar characters from the keywords. – Vincenzo Lo Palo Nov 28 '13 at 14:13
  • What's so special about "/%$@*, why can't people search for those characters? Why only "alphanumeric, accents, Chinese and Arabian", what about, say, Korean, Japanese and Sanskrit? – deceze Nov 28 '13 at 14:14
  • Because there are no results for those characters. – Vincenzo Lo Palo Nov 28 '13 at 14:16
  • Yes, I need to allow all languages. – Vincenzo Lo Palo Nov 28 '13 at 14:16
  • So what exactly don't you allow then? Look at the link above by CBroe. Sounds like `\pL` may be what you want, but that's honestly hard to tell. – deceze Nov 28 '13 at 14:18
  • Will there be any results for "aslkgjalkgjaljgaslkjg"? No? Then display 0 results. Will there be any results for Chinese characters? No? Then display 0 results. – maček Nov 28 '13 at 15:26

2 Answers

$string = preg_replace('/\PL/u', '', $string);
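  • L is the Unicode character property meaning "letter"
  • \P (uppercase) matches any character that does not have the given property; lowercase \p matches characters that do
  • /u is the Unicode modifier; you need it if you want to handle Unicode characters
  • make sure $string is encoded in UTF-8, since /u treats both the pattern and the subject as UTF-8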

So this matches all non-letters and removes them. I can only guess that this matches what you want. See http://www.php.net/manual/en/regexp.reference.unicode.php for more attributes you could match by, e.g. /[^\pL\pS]/u would match everything except letters and "symbols".
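For the search-field case described in the comments, stripping every non-letter would also drop digits and the spaces between keywords. The following is only a sketch under assumptions: the function name `clean_keywords` is hypothetical, and the choice to keep letters, combining marks, numbers, and whitespace is a guess at what the asker wants, not something stated in the question.

```php
<?php
// Hypothetical helper (not from the original answer): keep letters from any
// script, combining marks, digits, and whitespace; remove everything else.
function clean_keywords(string $input): string
{
    // \p{L} = letters, \p{M} = combining marks (accents stored as separate
    // code points), \p{N} = numbers. The /u modifier requires valid UTF-8.
    $cleaned = preg_replace('/[^\p{L}\p{M}\p{N}\s]+/u', '', $input);

    if ($cleaned === null) {
        // preg_replace() returns null on error, e.g. malformed UTF-8 input.
        return '';
    }

    // Collapse runs of whitespace left behind by the removed characters.
    return trim(preg_replace('/\s+/u', ' ', $cleaned));
}

echo clean_keywords('café "/%$@* 中文 عربي 123'), "\n";
```

Whether to keep `\p{N}` or add other classes such as `\p{S}` depends on what the search backend can actually match, so the character class is the part to adjust.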

deceze
echo preg_replace('/[^أ-يA-Za-z0-9 ]/ui', '', $string);
Moeed Farooqui