I managed to write a working example. When c1 is '\xce' and c2 is '\xb8', the result is θ.
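It turns out that I have to call setlocale before using mbstowcs. mbstowcs interprets bytes according to the LC_CTYPE category of the current locale, and every C program starts in the "C" locale, whose multibyte encoding is typically not UTF-8, so the conversion fails or gives the wrong result until a UTF-8 locale is selected.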
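#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main(void)
{
    /* mbstowcs decodes bytes according to LC_CTYPE, so select a UTF-8
       locale first. The name "en_US.utf8" must be installed on the system. */
    char *localeInfo = setlocale(LC_ALL, "en_US.utf8");
    if (localeInfo == NULL) {
        fprintf(stderr, "Could not set locale\n");
        return 1;
    }
    printf("Locale information set to %s\n", localeInfo);

    /* The two bytes of the UTF-8 encoding of θ (U+03B8). */
    const char c1 = '\xce';
    const char c2 = '\xb8';
    size_t byteCount = 2;
    char *mbS = malloc(byteCount + 1);
    if (mbS == NULL) return 1;
    mbS[0] = c1;
    mbS[1] = c2;
    mbS[byteCount] = '\0'; /* null terminator */
    printf("Directly using printf: %s\n", mbS);

    /* With a NULL destination, mbstowcs returns the number of wide
       characters needed, not counting the null terminator. */
    size_t requiredSize = mbstowcs(NULL, mbS, 0);
    if (requiredSize == (size_t)-1) {
        printf("Failed conversion!\n");
        free(mbS);
        return 1;
    }
    printf("Output size including null terminator is %zu\n\n", requiredSize + 1);

    wchar_t *wideOutput = malloc((requiredSize + 1) * sizeof(wchar_t));
    if (wideOutput == NULL) {
        free(mbS);
        return 1;
    }
    /* On an invalid sequence mbstowcs returns (size_t)-1, not a negative int. */
    size_t len = mbstowcs(wideOutput, mbS, requiredSize + 1);
    if (len == (size_t)-1) {
        printf("Failed conversion!\n");
    } else {
        printf("Converted %zu character(s). Result: %ls\n", len, wideOutput);
    }
    free(wideOutput);
    free(mbS);
    return 0;
}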
Output:
Locale information set to en_US.utf8
Directly using printf: θ
Output size including null terminator is 2
Converted 1 character(s). Result: θ
The same approach works for 3- and 4-byte UTF-8 characters: put all of the character's bytes into the buffer before the null terminator.