mbstring extension provides enhanced support for Simplified Chinese, Traditional Chinese, Korean, and Russian in addition to Japanese.

I tried displaying a Japanese character (which I copied from www.google.co.jp) on my PHP page and it displayed fine. Do I need to use mbstring when I'm displaying UTF-32 characters?

EDIT:

<?php
    echo "भ";                  
    $s = strlen("भ");
    echo $s;
?>

How do I make the second line of code to work?

PS: I have changed PHP default charset to UTF-8.

dda
John Eipe
  • All your questions would have an "It depends" answer. Stack Overflow also works best if you ask one question at a time. Start with [searching for your first one: "What is default encoding for PHP?"](http://stackoverflow.com/search?q=What+is+default+encoding+for+PHP%3F) - Some Q&A about that already exists. – hakre Nov 09 '12 at 09:04
  • You need to use mbstring if you intend to *process* any Unicode string. You don't need it if you just fetch and print blobs of bytes that just so happen to represent Unicode strings. – Jon Nov 09 '12 at 09:05
  • @Jon could you elaborate? I understand that Unicode is not supported by PHP. But how does it render the Japanese when it (PHP) assumes that 1 byte = 1 character? – John Eipe Nov 09 '12 at 09:44
  • @John: It does not. PHP passes on the bytes to the browser without realizing what they really mean, and it's the browser that knows how to render the text. – Jon Nov 09 '12 at 09:59
  • @Jon thanks, one more question. See EDIT. Since PHP doesn't support Unicode, how do I do this? – John Eipe Nov 09 '12 at 10:15
  • `How do I make the second line of code to work?` Using `mb_strlen`, that's exactly mbstring's purpose. – Pekka Nov 09 '12 at 10:16
  • great. But `strlen("भ");` prints 3 and `mb_strlen("भ");` also prints 3? It should have recognized it as 1 character now, right? – John Eipe Nov 09 '12 at 10:37
  • FWIW, भ is not Japanese. It's Devanagari 'bha' (used in various languages in India). – dda Nov 10 '12 at 04:43

1 Answer

  • You need the mb_ string functions in place of the regular string functions, e.g. mb_substr instead of substr. If you don't use the regular string functions, there's no use for the mb_ functions either.
  • If you're just passing text through and PHP isn't doing anything with that text, there's no need for the mb_ functions.
  • To make the mb_ functions work correctly, you'll have to tell them what encoding your text is in. They support many different encodings, and unless you tell them which one you're using, their results will be incorrect. You can pass that encoding to each mb_ function call, e.g. mb_strlen($str, 'UTF-8'), or you can set it once for all mb_ functions using mb_internal_encoding('UTF-8').
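
As a minimal sketch of the points above, the following compares the byte-oriented functions with their mb_ counterparts. It assumes the source file itself is saved as UTF-8 and that the mbstring extension is loaded; the sample strings are only illustrations.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$s = "भ"; // Devanagari 'bha', U+092D

// strlen() counts bytes. In UTF-8, U+092D is encoded as the three bytes E0 A4 AD.
echo strlen($s), "\n";              // 3
echo bin2hex($s), "\n";             // e0a4ad

// mb_strlen() counts characters once it is told the encoding explicitly.
echo mb_strlen($s, 'UTF-8'), "\n";  // 1

// Alternatively, set the encoding once for all later mb_ calls.
mb_internal_encoding('UTF-8');
echo mb_strlen($s), "\n";           // 1

// mb_substr() cuts on character boundaries instead of byte offsets.
echo mb_substr("भारत", 0, 2), "\n"; // भा
```

If mb_strlen() still reports 3 for a single character, the likely cause is that the encoding the mb_ functions assume is not the one the string is actually stored in.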

See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text for a comprehensive introduction.

deceze
  • I thought that since I have set the default charset to UTF-8 in php.ini, I needn't provide 'UTF-8' in the mb_ functions. – John Eipe Nov 09 '12 at 11:07
  • What option exactly did you set to UTF-8 in php.ini? – deceze Nov 09 '12 at 11:12
  • Have a look at what the option is for: http://php.net/manual/en/ini.core.php#ini.default-charset It is independent of and separate from the settings for the `mb_` functions. – deceze Nov 09 '12 at 11:25
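
To see the two settings side by side, a short sketch like the following prints both values and then sets the mbstring encoding explicitly. It assumes mbstring is loaded and the file is saved as UTF-8. As far as I know, newer PHP versions (5.6 and later) let mbstring fall back to default_charset when its own setting is unset, so the two values may already match on a current installation; that note is an addition here, not something stated in the thread.

```php
<?php
// default_charset: the charset PHP advertises in the Content-Type header it sends.
echo ini_get('default_charset'), "\n";

// The encoding the mb_ functions assume when none is passed per call.
echo mb_internal_encoding(), "\n";

// Setting it explicitly removes any dependence on php.ini.
mb_internal_encoding('UTF-8');
echo mb_internal_encoding(), "\n"; // UTF-8
echo mb_strlen("भ"), "\n";         // 1
```

Setting the encoding explicitly at the top of a script keeps the mb_ results the same regardless of how a given server's php.ini is configured.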