So I've posted several questions about updating existing software written in PHP to support Unicode / UTF-8. One of the suggested solutions is to override PHP's default string functions with PHP's mbstring (mb_*) functions. However, I see a lot of people talking about negative consequences, yet no one really elaborates on them. Can someone please explain what these negative consequences are?

Why is it "bad" to override PHP's default string functions with its mbstring functions? After all, it's much simpler than manually replacing all those calls with their corresponding mb_ functions. So what am I missing? What are these negative consequences?

J Johnson
  • Duplicate of http://stackoverflow.com/questions/12045940/php-string-functions-vs-mbstring-functions, but the answers there aren't particularly good. – Danack Mar 24 '13 at 11:09

3 Answers

It's bad to override them because when another developer comes to work on this code, the standard functions may do something they weren't expecting. It's always good to use the default functions as they were intended.

Ryan Knopp

I think the mb_* family of functions is heavier, since they do their Unicode handling even on plain ASCII strings. So at scale they will slow your application down. (Maybe not by much, but by some amount.)

kuldeep.kamboj

I'll try to elaborate.

Overloading the standard string functions with mb_* will have dire consequences for anything reading and dealing with binary files, or binary data in general. If you overload the standard function, then suddenly strlen($binData) is bound to return the wrong length at some point.

Why?

Imagine the binary data contains a byte with a value in one of the ranges 0xC0-0xDF, 0xE0-0xEF or 0xF0-0xF7. In UTF-8 those are the lead bytes of 2-, 3- and 4-byte sequences, so the overloaded strlen will count that byte and the bytes following it as 1 character, rather than the 2, 3 or 4 bytes they actually are.
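To make the difference concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and that mbstring.func_overload is NOT enabled, so strlen still counts bytes and mb_strlen shows what an overloaded strlen would return. The $binData payload is made up for illustration.

```php
<?php
// Hypothetical binary payload: 5 raw bytes. The middle three (E2 82 AC)
// happen to form a valid UTF-8 sequence (the euro sign), although here
// they are meant as arbitrary binary data.
$binData = "\x01\xE2\x82\xAC\x02";

// Plain strlen: the number of bytes.
$byteLength = strlen($binData);

// UTF-8 aware length: E2 82 AC collapses into a single character.
// This is what strlen would report with the overload active and UTF-8
// as the internal encoding.
$charLength = mb_strlen($binData, 'UTF-8');

printf("bytes: %d, characters: %d\n", $byteLength, $charLength);
// prints "bytes: 5, characters: 3"
```

Any code that uses the result as a byte count (for a Content-Length header, a substr offset into a binary record, and so on) would be off by two here.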

And the main problem is that mbstring.func_overload is global. It doesn't just affect your own script, but all scripts, and any frameworks or libraries they may use.

When asked "Should I enable mbstring.func_overload?", the answer is, and always SHOULD be, a resounding NO.

You are royally screwed if you use it, and you will spend countless hours hunting bugs. Bugs that may very well be unfixable.

Well, you CAN call mb_strlen($string, 'latin1') to get it to behave, but it still carries overhead. strlen relies on the fact that PHP strings, like Java strings, know their own length. mb_strlen has to parse the string to count the characters.
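A short sketch of that workaround, again assuming mbstring is loaded and the overload is off. The '8bit' encoding name is an addition of mine, not part of the advice above; it is mbstring's name for raw bytes and behaves the same way for length purposes. $binData is a made-up example.

```php
<?php
// Hypothetical 5-byte binary payload containing a UTF-8 look-alike sequence.
$binData = "\x01\xE2\x82\xAC\x02";

// Interpreted as UTF-8, the three middle bytes count as one character.
$asUtf8 = mb_strlen($binData, 'UTF-8');

// Forcing a single-byte encoding makes mb_strlen count one unit per byte.
$asLatin1 = mb_strlen($binData, 'latin1');
$as8bit   = mb_strlen($binData, '8bit');

printf("UTF-8: %d, latin1: %d, 8bit: %d\n", $asUtf8, $asLatin1, $as8bit);
// prints "UTF-8: 3, latin1: 5, 8bit: 5"
```

The catch is that every call site handling binary data has to pass the encoding explicitly, including call sites inside third-party libraries you do not control.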

A.Grandt