Is there a way to get the next full character in a multibyte string? For example, "z\u00df\u6c34\U0001d10b" (that is, "zß水" followed by U+1D10B) is 4 characters, excluding the null terminator, in a wide string (assuming a 32-bit wchar_t), but 10 bytes in a UTF-8 multibyte string.
I have been using the code below to convert to and from std::string, since I use std::wstring internally, but there seem to be subtle issues if the proper length is not given to __wideToString, even when the length is larger than it needs to be.
I have also realized that I can probably skip the whole conversion to and from std::wstring and use only std::string, if I can find out how many bytes of the multibyte string make up the next full character. For example, the string u8"\u6c34\U0001d10b" is stored in 7 bytes, and I would only want the first 3, which are "水". Can anyone guide me in solving this issue?
I have been having this kind of Unicode issue for a while now, and there doesn't seem to be much information on how it is handled in C++, apart from third-party solutions, which I am trying to avoid.
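#include <clocale>
#include <cstdlib>
#include <stdexcept>
#include <string>

static
std::string __wideToString(const std::wstring & ws){
    if(ws.empty()){throw std::invalid_argument("Wide string must have length >= 1");}
    std::setlocale(LC_ALL, ""); // Use the environment's locale for the conversion.
    // Worst case: every wide character expands to MB_CUR_MAX bytes in this locale.
    size_t length = MB_CUR_MAX*ws.length();
    std::string str(length,' ');
    if((length=std::wcstombs(&str[0], ws.c_str(), length))==size_t(-1)){ // returns size_t(-1) on invalid conversion
        throw std::length_error("Conversion Error Invalid Wide Character");
    }
    str.resize(length); // Shrink to the number of bytes actually written.
    return str;
}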
static
std::wstring __stringToWide(const std::string & str){
    if(str.empty()){throw std::invalid_argument("String must have length >= 1");}
    std::setlocale(LC_ALL, ""); // Use the environment's locale for the conversion.
    size_t length = str.length();
    std::wstring ws(length, L' '); // Overestimate: each wide character consumes at least one byte.
    if((length=std::mbstowcs(&ws[0], str.c_str(), length))==size_t(-1)){ // returns size_t(-1) on invalid conversion
        throw std::length_error("Conversion Error Invalid Multibyte Character");
    }
    ws.resize(length); // Shrink to the number of wide characters actually written.
    return ws;
}
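To make the goal concrete, here is a minimal sketch of the kind of helper I am after. It assumes the std::string is known to hold UTF-8, in which case the lead byte of each character encodes how many bytes the character occupies. The names `utf8CharLength` and `utf8NextChar` are hypothetical, not part of any library, and the sketch does not reject overlong or otherwise non-canonical sequences.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: number of bytes in the UTF-8 sequence starting at s[pos].
// Assumption: s holds UTF-8. The length comes from the lead byte; continuation
// bytes are only checked for the 10xxxxxx pattern.
static std::size_t utf8CharLength(const std::string & s, std::size_t pos){
    if(pos >= s.size()){throw std::out_of_range("Position past end of string");}
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    std::size_t length;
    if(lead < 0x80){length = 1;}                 // 0xxxxxxx: ASCII
    else if((lead & 0xE0) == 0xC0){length = 2;}  // 110xxxxx
    else if((lead & 0xF0) == 0xE0){length = 3;}  // 1110xxxx
    else if((lead & 0xF8) == 0xF0){length = 4;}  // 11110xxx
    else{throw std::invalid_argument("Invalid UTF-8 lead byte");}
    if(pos + length > s.size()){throw std::length_error("Truncated UTF-8 sequence");}
    for(std::size_t i = 1; i < length; ++i){
        if((static_cast<unsigned char>(s[pos + i]) & 0xC0) != 0x80){
            throw std::invalid_argument("Invalid UTF-8 continuation byte");
        }
    }
    return length;
}

// Hypothetical helper: the full character (as a byte sequence) starting at pos.
static std::string utf8NextChar(const std::string & s, std::size_t pos){
    return s.substr(pos, utf8CharLength(s, pos));
}
```

Walking a string is then a loop of the form `for(std::size_t i = 0; i < s.size(); i += utf8CharLength(s, i))`. This only applies when the encoding is known to be UTF-8; `std::mblen` from `<cstdlib>` is the standard counterpart for whatever multibyte encoding the current locale uses.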