I heard PHP has very poor Unicode support. So what does it take to make an application built on PHP 5 support Unicode under PHP 5.3+? Would mbstring be the only option here? How have Facebook or Yahoo gotten around this PHP limitation?
Follow all instructions posted here. I suggest you download Notepad++ and make sure to save the file as UTF-8. Many editors have a bad habit of saving as ANSI, which is crap – OptimusCrime Dec 20 '11 at 10:31
@OptimusCrime I think it's the programmer's bad habit not to change that in the settings, which is possible in nearly any editor (at least those with syntax highlighting). – feeela Dec 20 '11 at 10:37
4 Answers
PHP has no low-level support for any encoding. But all that actually means is that it doesn't care on a language level. Strings in PHP are raw byte sequences, which can be in any encoding you like. When handling multi-byte strings, you need to take care to use the right string manipulation functions instead of operating on the byte stream directly and possibly corrupting it. So the only "non-support" of Unicode is that the core language itself has no concept of encodings, but you can still work with any encoding perfectly fine by manipulating strings with the appropriate string functions.
Actually, if you just take a little care to keep everything in UTF-8 all the time, you will rarely have to worry about anything regarding encodings. PHP works just fine with Unicode.
For extensive coverage of this topic, please see What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.
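The byte-versus-character distinction described above can be seen in a short sketch. It assumes the mbstring extension is loaded and that the string is UTF-8; the variable names are made up for illustration.

```php
<?php
// "café" written with explicit bytes: the "é" is two bytes (C3 A9) in UTF-8.
$word = "caf\xC3\xA9";

// Byte-oriented functions see 5 bytes; mbstring functions see 4 characters.
$byteCount = strlen($word);
$charCount = mb_strlen($word, 'UTF-8');

// Cutting after 4 bytes splits the "é" and leaves an invalid sequence.
$byteCut = substr($word, 0, 4);
// Cutting after 4 characters keeps the whole word intact.
$charCut = mb_substr($word, 0, 4, 'UTF-8');

echo $byteCount, "\n";                           // 5
echo $charCount, "\n";                           // 4
var_dump(mb_check_encoding($byteCut, 'UTF-8'));  // bool(false)
var_dump($charCut === $word);                    // bool(true)
```

Passing the encoding explicitly to each `mb_*` call avoids depending on whatever internal encoding the server happens to be configured with.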

PHP has poor Unicode support, but working with Unicode is not impossible. You just have to be careful about which functions you use and whether they support Unicode. This page has a good summary of Unicode support for the different functions and extensions: http://www.phpwact.org/php/i18n/utf-8

The linked article is a bit hysterical. The article was written on 2009-10-21 and obviously refers to an outdated PHP version, which additionally was compiled without the mbstring extension (most current, pre-compiled Linux packages [e.g. those for Debian or Ubuntu] include the mbstring extension). Just use that extension. – feeela Dec 20 '11 at 10:45
If the data comes from tables that use UTF-8, you just need to send the correct HTTP header and meta tag and you should be fine (no need to re-encode anything):
<?php
header ('Content-type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
</body>
</html>

This wouldn't change anything about string handling in PHP, e.g. in functions like `substr()`. See also the link in chaft's answer… – feeela Dec 20 '11 at 10:34
The following mbstring settings should be set via php.ini or the vhost configuration (httpd.conf; doesn't work per directory [via .htaccess]):
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.func_overload = 7
Then just leave the code as it was, make sure your editor/IDE only saves files as UTF-8, and deliver everything as UTF-8 (via HTTP header or META tag).
See also: PHP Manual – Multibyte String – Function Overloading Feature
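To confirm that the settings above actually took effect, a small runtime check can help. This is only a sketch: `mbstring_settings_report()` is a hypothetical helper name, not part of PHP. Note that `mbstring.func_overload` was deprecated in PHP 7.2 and removed in PHP 8.0, so on newer versions `ini_get()` returns `false` for it and the `mb_*` functions have to be called explicitly.

```php
<?php
// Hypothetical helper: collects the mbstring-related settings named in the
// answer so they can be inspected at runtime.
function mbstring_settings_report()
{
    $loaded = extension_loaded('mbstring');

    return array(
        'extension_loaded'  => $loaded,
        // ini_get() returns false if the directive does not exist
        // in the running PHP version.
        'language'          => ini_get('mbstring.language'),
        'func_overload'     => ini_get('mbstring.func_overload'),
        // With no argument, mb_internal_encoding() returns the current value.
        'internal_encoding' => $loaded ? mb_internal_encoding() : null,
    );
}

foreach (mbstring_settings_report() as $name => $value) {
    echo $name, ' = ', var_export($value, true), "\n";
}
```

On a PHP 5.3 server configured as above, the expectation would be an internal encoding of UTF-8 and a `func_overload` value of 7; anything else suggests the php.ini or vhost change was not picked up.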
