Is it possible, before converting a string from one charset to another, to know whether the conversion will be lossless?
If I try to convert a UTF-8 string to Latin-1, for example, the characters that can't be converted are replaced by `?`. Checking for `?` in the result string to find out whether the conversion was lossless is obviously not an option, since the original string may legitimately contain `?` itself.
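For the specific UTF-8 to Latin-1 case, a check that doesn't depend on the replacement character is possible, because ISO-8859-1 contains exactly the Unicode code points U+0000 to U+00FF. A minimal sketch, assuming the input is UTF-8; the helper name `fitsInLatin1` is hypothetical:

```php
<?php
// Hypothetical helper: true if every code point of the UTF-8 string $utf8
// lies in U+0000..U+00FF, i.e. the repertoire of ISO-8859-1 (Latin-1).
// With the /u modifier, preg_match() returns false on invalid UTF-8,
// so invalid input is also reported as "not safely convertible".
function fitsInLatin1(string $utf8): bool
{
    // 0 means "no code point outside U+0000..U+00FF was found".
    return preg_match('/[^\x{0000}-\x{00FF}]/u', $utf8) === 0;
}

var_dump(fitsInLatin1("What?"));        // bool(true): the '?' is genuine
var_dump(fitsInLatin1("caf\u{E9}"));    // bool(true): U+00E9 is in Latin-1
var_dump(fitsInLatin1("\u{20AC}5"));    // bool(false): U+20AC is not
```

This only covers ISO-8859-1 itself; a charset such as Windows-1252 maps some bytes to other code points (for example 0x80 to U+20AC), so it would need a different range test.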
The only solution I can see right now is to convert the result back to the original charset and compare it with the original string:
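```php
// Round-trip check: convert to the target charset and back, then compare.
// Any character replaced by the substitute character (usually '?') makes
// the decoded string differ from the original.
function canBeSafelyConverted(string $string, string $fromEncoding, string $toEncoding): bool
{
    $encoded = mb_convert_encoding($string, $toEncoding, $fromEncoding);
    $decoded = mb_convert_encoding($encoded, $fromEncoding, $toEncoding);
    // Strict comparison: == compares numeric strings as numbers ("1e3" == "1000").
    return $decoded === $string;
}
```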
This is just a quick-and-dirty solution, though, and it may behave unexpectedly at times. I suspect there is a cleaner way to do this with mbstring, iconv, or some other library.
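One iconv-based possibility, sketched under an assumption worth verifying on your setup: when called without the `//TRANSLIT` or `//IGNORE` suffixes, `iconv()` fails on a character it cannot represent and returns `false` (raising a notice) instead of substituting it. This behaviour depends on the iconv implementation PHP was built against (glibc, GNU libiconv, musl, ...). The helper name `canBeConvertedWithIconv` is hypothetical:

```php
<?php
// Hypothetical helper: convert without //TRANSLIT or //IGNORE so that an
// unrepresentable (or invalid) character makes iconv() return false rather
// than silently replacing it. The notice iconv() raises is silenced with @.
// Assumption: the underlying iconv implementation reports such characters
// as errors (true for glibc; other implementations may differ).
function canBeConvertedWithIconv(string $string, string $fromEncoding, string $toEncoding): bool
{
    return @iconv($fromEncoding, $toEncoding, $string) !== false;
}

var_dump(canBeConvertedWithIconv("caf\u{E9}", 'UTF-8', 'ISO-8859-1'));
var_dump(canBeConvertedWithIconv("\u{20AC}5", 'UTF-8', 'ISO-8859-1'));
```

Unlike the round-trip check, this needs only one conversion, and it also rejects input that is not valid in `$fromEncoding` to begin with.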