Your file claims to be encoded as UTF-8, as evidenced by its first 3 bytes, EF BB BF, which are the UTF-8 BOM.
In Delphi 2009+, String is a UTF-16 encoded Unicode string, so LoadFromFile() will see the BOM and try to decode the file bytes from UTF-8 to Unicode, then encode that Unicode data to UTF-16 in memory.
However, after the BOM, the next 3 bytes, 5A 65 20, are proper UTF-8, but the rest of your file after that is NOT proper UTF-8. That is why you are getting the exception.
The correct byte sequence for the characters you have shown should look like the following:
EF BB BF 5A 65 20 F0 9F 87 AB F0 9F 87 AE
But your file contains these bytes instead:
EF BB BF 5A 65 20 ED A0 BC ED B7 AB ED A0 BC ED B7 AE
As you can see, the byte sequence F0 9F 87 AB F0 9F 87 AE in the correct file has been mis-encoded as ED A0 BC ED B7 AB ED A0 BC ED B7 AE in your bad file.
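The difference can be checked with any strict UTF-8 decoder. Here is a minimal sketch in Python (not Delphi, used only because it makes the byte-level check short); the helper name try_decode is made up for illustration, and the two byte strings are the ones shown above:

```python
# Byte sequences copied from the two dumps above.
good = bytes.fromhex("EF BB BF 5A 65 20 F0 9F 87 AB F0 9F 87 AE")
bad = bytes.fromhex("EF BB BF 5A 65 20 ED A0 BC ED B7 AB ED A0 BC ED B7 AE")


def try_decode(data: bytes):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        # "utf-8-sig" is strict UTF-8 that also skips a leading BOM.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return None


good_text = try_decode(good)
bad_text = try_decode(bad)

print("good file decodes:", good_text is not None)
print("bad file decodes: ", bad_text is not None)
```

A strict decoder rejects the bad file at the first ED A0 byte pair, which is the same class of failure that LoadFromFile() reports as an exception.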
When processed as UTF-8, the good file decodes to the following Unicode codepoint sequence:
U+005A LATIN CAPITAL LETTER Z
U+0065 LATIN SMALL LETTER E
U+0020 SPACE
U+1F1EB REGIONAL INDICATOR SYMBOL LETTER F
U+1F1EE REGIONAL INDICATOR SYMBOL LETTER I
Whereas your bad file decodes to the following sequence instead:
U+005A LATIN CAPITAL LETTER Z
U+0065 LATIN SMALL LETTER E
U+0020 SPACE
U+D83C HIGH SURROGATE - invalid!
U+DDEB LOW SURROGATE - invalid!
U+D83C HIGH SURROGATE - invalid!
U+DDEE LOW SURROGATE - invalid!
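To reproduce the two codepoint listings yourself, a lenient decode is needed for the bad file. The following is a rough Python sketch (again not Delphi); the function name codepoints is hypothetical, and it relies on Python's "surrogatepass" error handler to let the invalid surrogate codepoints through instead of raising:

```python
import codecs

good = bytes.fromhex("EF BB BF 5A 65 20 F0 9F 87 AB F0 9F 87 AE")
bad = bytes.fromhex("EF BB BF 5A 65 20 ED A0 BC ED B7 AB ED A0 BC ED B7 AE")


def codepoints(data: bytes, errors: str = "strict") -> list:
    """Decode UTF-8 bytes (skipping a BOM if present) and list the codepoints."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    text = data.decode("utf-8", errors)
    return [f"U+{ord(ch):04X}" for ch in text]


good_cps = codepoints(good)
# "surrogatepass" accepts surrogates encoded as 3-byte UTF-8 sequences,
# which a strict decoder would reject.
bad_cps = codepoints(bad, "surrogatepass")

print(good_cps)
print(bad_cps)
```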
Now, it happens that D83C DDEB D83C DDEE is the proper UTF-16 encoded form of Unicode codepoints U+1F1EB U+1F1EE. This means that your original Unicode text was encoded to UTF-16 first, then the individual UTF-16 code units were incorrectly treated as-is as Unicode codepoints (which they are not) and were then encoded accordingly to UTF-8, thus producing your bad file.
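That diagnosis can be verified by deliberately repeating the mistake. The sketch below (Python, for illustration only; it is not the code that produced your file, which I have not seen) encodes the text to UTF-16, then wrongly encodes each 16-bit code unit to UTF-8 as if it were a codepoint:

```python
import struct

text = "Ze \U0001F1EB\U0001F1EE"  # Z, e, space, U+1F1EB, U+1F1EE

# Step 1: encode to UTF-16 and split into 16-bit code units.
utf16 = text.encode("utf-16-be")
units = struct.unpack(f">{len(utf16) // 2}H", utf16)

# Step 2 (the bug): treat every code unit as a standalone codepoint and
# encode it to UTF-8. "surrogatepass" is needed because Python otherwise
# refuses to encode surrogates.
mis_encoded = b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)

print(" ".join(f"{u:04X}" for u in units))
print(mis_encoded.hex(" ").upper())
```

The second line printed matches the bytes that follow the BOM in your bad file.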
If this is the only file affected, then you can simply replace its bytes with the bytes shown above. But if this is part of a larger encoding process that is producing badly encoded UTF-8 files that you can't load afterwards, then you need to figure out where that incorrect UTF-16 handling is occurring and fix that issue.
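If you have several such files and cannot fix the producer right away, the damage is mechanical enough to undo. Here is one possible repair, sketched in Python under the assumption that every surrogate in the file is part of a valid high/low pair; the function name repair is made up for this example:

```python
import codecs


def repair(data: bytes) -> bytes:
    """Re-pair surrogates that were individually encoded as UTF-8."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data

    # Leniently decode, keeping the lone surrogate codepoints.
    text = body.decode("utf-8", "surrogatepass")
    # Round-trip through UTF-16 so each high/low pair is recombined into
    # one codepoint. The final decode raises UnicodeDecodeError if a
    # surrogate is unpaired.
    text = text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")

    fixed = text.encode("utf-8")
    return (codecs.BOM_UTF8 + fixed) if has_bom else fixed


bad = bytes.fromhex("EF BB BF 5A 65 20 ED A0 BC ED B7 AB ED A0 BC ED B7 AE")
good = bytes.fromhex("EF BB BF 5A 65 20 F0 9F 87 AB F0 9F 87 AE")

repaired = repair(bad)
print(repaired.hex(" ").upper())
```

A file that is already valid UTF-8 passes through unchanged, so running this over a mixed set of files should be harmless, but treat it as a workaround rather than a substitute for fixing the encoder.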