I want to convert Windows-1252 text to UTF-8. The Windows-1252 text can contain invalid bytes (e.g. 0x90). I need to replace invalid bytes with a replacement character ('?').
Example: "a[0x90]b" (Windows-1252) -> "a?b" (UTF-8)
I tried 'UTF-8//TRANSLIT', but iconv() stops with an error ('Invalid or incomplete multibyte or wide character'). 'UTF-8//IGNORE' does work, but removing invalid characters is not what I want.
I use iconv from the standard C library in C++.
Maybe someone can give me a hint.
Code used:
//char* res=std::setlocale(LC_ALL, "de_DE"); //no effect
const iconv_t iconv_handle = iconv_open("UTF-8//TRANSLIT", "WINDOWS-1252");
assert(iconv_handle != (iconv_t)-1);
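const char input_text[] = "a\x90" "b"; //'a', 'invalid byte', 'b' (literal is split: "\x90\b" would end in a backspace, and "\x90b" would be read as one hex escape)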
std::array<char, _countof(input_text) * 4> utf8_result_buffer{};
char* in = (char*)input_text;
char* out = (char*)utf8_result_buffer.data();
size_t srclen = strlen(input_text);
size_t outbytesleft = utf8_result_buffer.size();
const size_t iconv_res = iconv(iconv_handle, &in, &srclen, &out, &outbytesleft);
if (iconv_res == (size_t)-1)
{
    perror("iconv");
}
//result: utf8_result_buffer: "a" and 'Invalid or incomplete multibyte or wide character' error
iconv_close(iconv_handle);
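
One possible workaround, as a minimal sketch rather than a definitive solution: open the converter without any suffix, call iconv() in a loop, and whenever it fails with EILSEQ (or EINVAL), write the replacement character yourself and skip one input byte. The helper name cp1252_to_utf8_lossy is hypothetical (it is not part of iconv), and the sketch assumes an iconv implementation that rejects unmapped Windows-1252 bytes such as 0x90 with EILSEQ, as described in the question. Other implementations may map those bytes differently.

```cpp
#include <cassert>
#include <cerrno>
#include <iconv.h>
#include <string>

// Hypothetical helper, not an iconv API: converts Windows-1252 to UTF-8 and
// substitutes `replacement` for every byte that iconv() rejects.
std::string cp1252_to_utf8_lossy(const std::string& input, char replacement = '?')
{
    // Plain "UTF-8": no //TRANSLIT or //IGNORE, so invalid bytes surface as errors.
    const iconv_t cd = iconv_open("UTF-8", "WINDOWS-1252");
    assert(cd != (iconv_t)-1);

    // One Windows-1252 byte becomes at most 3 UTF-8 bytes (0x80 -> U+20AC),
    // so this buffer can never overflow, even with replacements.
    std::string output(input.size() * 3 + 1, '\0');

    char* in = const_cast<char*>(input.data()); // iconv() does not modify the input
    size_t in_left = input.size();
    char* out = &output[0];
    size_t out_left = output.size();

    while (in_left > 0)
    {
        const size_t res = iconv(cd, &in, &in_left, &out, &out_left);
        if (res != (size_t)-1)
            break; // the remaining input was converted completely

        if (errno != EILSEQ && errno != EINVAL)
            break; // unexpected error (e.g. E2BIG): return what we have so far

        // `in` now points at the offending byte: emit the replacement
        // character and step over exactly one input byte.
        *out++ = replacement;
        --out_left;
        ++in;
        --in_left;
    }

    iconv_close(cd);
    output.resize(output.size() - out_left); // trim the unused tail
    return output;
}
```

Called as cp1252_to_utf8_lossy(std::string("a\x90" "b")), this should yield "a?b" on an implementation that treats 0x90 as invalid. Skipping a single byte is safe here because Windows-1252 is a single-byte encoding; for multibyte source encodings the resynchronisation step would need more care.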