PHP's mbstring extension has an mbstring.strict_detection setting, which is documented in the PHP manual. Unfortunately, the manual entry is no help; it only says that this option "enables the strict encoding detection".

I did a few tests and could not find how any of the mbstring functions are affected by this. mb_check_encoding() and mb_detect_encoding() give exactly the same result for both valid and invalid UTF-8 input.

(edit:) The mbstring.strict_detection option was added in PHP 5.1.2.

Shi
Zilk

1 Answer

When the strict parameter is not set, encoding detection is faster but less accurate. For example, suppose you have a UTF-8 string that ends with a partial UTF-8 sequence, like this:

$s = "H\xC3\xA9ll\xC3";
$encoding = mb_detect_encoding($s, mb_detect_order(), false);

The result of the mb_detect_encoding call would still be "UTF-8" even though it's not valid UTF-8 (the last character is incomplete).

But if you set the strict parameter to true...

$s = "H\xC3\xA9ll\xC3";
$encoding = mb_detect_encoding($s, mb_detect_order(), true);

It would perform a more thorough check, and the result of that call would be FALSE.

James Holderness
  • That's right, but the setting of `mbstring.strict_detection` doesn't affect that behavior (not even the default value of the $strict parameter). – Zilk Jul 30 '13 at 22:40
  • It works for me. With `mbstring.strict_detection = On` the default value for the *strict* parameter is true. Note that this is only available since PHP 5.1.2. – James Holderness Jul 30 '13 at 22:55
  • Ah, you're right, it *does* affect `mb_detect_encoding()` if the third parameter is missing. I had an error in my tests; only `mb_check_encoding()` and `mb_convert_encoding()` are unaffected. Thank you. – Zilk Jul 30 '13 at 23:18
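
To tie the comment thread back to the original question, here is a minimal sketch of how the effect of the ini setting could be observed. It assumes that `mbstring.strict_detection` can be changed at runtime with `ini_set()`, and it passes an explicit `['ASCII', 'UTF-8']` candidate list instead of relying on `mb_detect_order()`. The helper name `detect_with_ini` is hypothetical, and the non-strict result in particular may differ between PHP versions.

```php
<?php
// Hypothetical helper: sets mbstring.strict_detection, then calls
// mb_detect_encoding() WITHOUT a third argument, so the ini value
// supplies the default for the strict parameter.
function detect_with_ini(string $s, bool $strictIni)
{
    $previous = ini_get('mbstring.strict_detection');
    // Assumption: the setting is changeable at runtime via ini_set().
    ini_set('mbstring.strict_detection', $strictIni ? '1' : '0');
    try {
        // Assumption: explicit candidate list, chosen for this illustration.
        return mb_detect_encoding($s, ['ASCII', 'UTF-8']);
    } finally {
        // Restore whatever value was active before the call.
        ini_set('mbstring.strict_detection', (string) $previous);
    }
}

// Same truncated string as in the answer: the trailing 0xC3 byte
// starts a two-byte sequence that is never completed.
$truncated = "H\xC3\xA9ll\xC3";

var_dump(detect_with_ini($truncated, false)); // ini off, third argument omitted
var_dump(detect_with_ini($truncated, true));  // ini on, third argument omitted

// An explicit third argument takes precedence over the ini setting.
var_dump(mb_detect_encoding($truncated, ['ASCII', 'UTF-8'], false));
```

The same default can be set globally in php.ini with `mbstring.strict_detection = On`. As noted in the comments, this only matters for `mb_detect_encoding()` when the third argument is omitted; `mb_check_encoding()` and `mb_convert_encoding()` are unaffected.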