As the manual page for mb_detect_encoding says:
Automatic detection of the intended character encoding can never be entirely reliable; without some additional information, it is similar to decoding an encrypted string without the key. It is always preferable to use an indication of character encoding stored or transmitted with the data, such as a "Content-Type" HTTP header.
Part of the way the function combats this is to require you to provide a list of candidate encodings. If none is provided directly to the function, they are taken from a global configuration state (see mb_detect_order).
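As a rough sketch of the two ways to supply candidates, the snippet below passes the list directly and then relies on the global order instead. The sample value assigned to $sjis_str is my own stand-in (a short Japanese string converted to Shift_JIS), not the string from the question, and passing null as the second argument assumes PHP 8.

```php
<?php
// Stand-in input: Shift_JIS bytes built from a UTF-8 literal (an assumption,
// not the string from the question).
$sjis_str = mb_convert_encoding("\u{65E5}\u{672C}\u{8A9E}", 'SJIS', 'UTF-8');

// Option 1: pass the candidates directly (an array or a comma-separated string).
$explicit = mb_detect_encoding($sjis_str, ['EUC-JP', 'UTF-8', 'SJIS'], true);

// Option 2: set the global detection order, then omit the candidate list.
$previous = mb_detect_order();                 // current order, as an array
mb_detect_order(['EUC-JP', 'UTF-8', 'SJIS']);
$implicit = mb_detect_encoding($sjis_str, null, true);
mb_detect_order($previous);                    // restore the global state

var_dump($explicit, $implicit);
```

Passing the list explicitly is usually easier to reason about, since the result no longer depends on configuration set somewhere else in the application.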
For instance, taking the string you've provided and calling $encode = mb_detect_encoding($sjis_str, 'EUC-JP,UTF-8,SJIS'); returns 'SJIS', and the conversion appears to proceed correctly, as demonstrated here: https://3v4l.org/KKf2h
The shorter the input, and the more candidate encodings you list, the more likely it is that mb_detect_encoding will guess wrong, because the string may be equally valid in multiple encodings.
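To illustrate the ambiguity, here is a small sketch that uses mb_check_encoding to list every candidate that accepts the same bytes. The two-byte input and the candidate list are my own choices for illustration; the bytes spell "é" in UTF-8 but are also well-formed in ISO-8859-1, and quite possibly in the Japanese encodings too.

```php
<?php
// Two bytes that spell "é" in UTF-8 but are also well-formed elsewhere.
$bytes = "\xC3\xA9";

// Illustrative candidate list (an assumption, pick whatever your data may use).
$candidates = ['UTF-8', 'ISO-8859-1', 'SJIS', 'EUC-JP'];

// Collect every candidate in which the bytes are a valid sequence.
$valid_in = array_values(array_filter(
    $candidates,
    fn (string $enc): bool => mb_check_encoding($bytes, $enc)
));

// More than one entry means detection has to guess between them.
print_r($valid_in);
```

With an input this short there is nothing to break the tie, which is why a longer sample or a shorter candidate list tends to give better results.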
It's also worth noting that if the string is not valid in any of the encodings you list, mb_detect_encoding will return false, so if you are using it to process unknown strings, you should check if ( $encode === false ) and add some appropriate error handling.
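One way that check might look in practice is sketched below. The helper name to_utf8_or_fail and the choice of exception are hypothetical, just for illustration. It passes true as the third (strict) argument because, as far as I understand the current manual, newer PHP versions may otherwise return the closest match instead of false.

```php
<?php
/**
 * Hypothetical helper: detect the encoding among $candidates and convert
 * the input to UTF-8, throwing if none of the candidates fits.
 */
function to_utf8_or_fail(string $input, array $candidates): string
{
    // Third argument true = strict mode, so invalid input yields false.
    $encode = mb_detect_encoding($input, $candidates, true);

    if ($encode === false) {
        throw new InvalidArgumentException(
            'Input is not valid in any of: ' . implode(', ', $candidates)
        );
    }

    return mb_convert_encoding($input, 'UTF-8', $encode);
}

try {
    // A lone 0xFF byte is neither valid UTF-8 nor 7-bit ASCII.
    echo to_utf8_or_fail("\xFF", ['UTF-8', 'ASCII']), PHP_EOL;
} catch (InvalidArgumentException $e) {
    echo 'Could not detect encoding: ', $e->getMessage(), PHP_EOL;
}
```

Whether to throw, log, or fall back to a default encoding depends on where the data comes from; the point is only that false should not be passed on to mb_convert_encoding unchecked.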