I need to be able to detect a string's encoding but mb_detect_encoding isn't working.
I obtain the string from a file (file_get_contents) and I know the file that was giving me trouble was in UTF-16 LE. However, from what I understand of the docs, detecting this encoding is not possible (mb_detect_order: "For UTF-16, UTF-32, UCS2 and UCS4, encoding detection will fail always.").
How can I reliably determine a string's encoding in PHP, ideally for any possible encoding?
I have spent several hours trying to solve this but found no good resource. I would like to automate this so that if the file's encoding changes, my program can still handle it (I am obtaining the file from another website).
I've tried this with no success; it tells me UTF-8:
mb_detect_encoding($proper_string, 'UTF-16LE,UCS-2,UTF-8,ASCII', true)
I've also tried this:
echo 'mb_check_encoding($fileContents, \'UTF-8\'): ' . mb_check_encoding($fileContents, 'UTF-8') . "\n";
//true
echo 'mb_check_encoding($fileContents, \'UTF-16\'): ' . mb_check_encoding($fileContents, 'UTF-16') . "\n";
//true
echo 'mb_check_encoding($fileContents, \'UTF-16LE\'): ' . mb_check_encoding($fileContents, 'UTF-16LE') . "\n";
//true
echo 'mb_check_encoding($fileContents, \'UCS-2\'): ' . mb_check_encoding($fileContents, 'UCS-2') . "\n";
//true
echo 'mb_check_encoding($fileContents, \'ISO-8859-1\'): ' . mb_check_encoding($fileContents, 'ISO-8859-1') . "\n";
//true
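For reference, here is a minimal sketch that reproduces the problem without my file. It assumes (my guess, not verified against the real download) that the content is mostly ASCII text stored as UTF-16LE without a BOM; $sample is a made-up stand-in for $fileContents.

```php
<?php
// Hypothetical stand-in for $fileContents: the text "Hi" as UTF-16LE bytes, no BOM.
$sample = "H\x00i\x00";

foreach (['UTF-8', 'UTF-16LE', 'UCS-2', 'ISO-8859-1'] as $encoding) {
    // var_export() prints true/false explicitly; echoing a bool prints "1" or "".
    echo $encoding, ': ', var_export(mb_check_encoding($sample, $encoding), true), "\n";
}

// Decoding with the encoding I already know gives the readable text back.
echo mb_convert_encoding($sample, 'UTF-8', 'UTF-16LE'), "\n";
```

The UTF-8 check passes for this sample because every byte in it, including the NUL bytes, is a valid single-byte UTF-8 sequence, which may be why none of the checks above rules anything out.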