I'm trying to read some multibyte chars on Mac, but their code page is unknown. Is there a way to read them and convert them to UTF-8? The locale and region can be used; is there a way to connect them to the corresponding code page info? For example, I want to translate '\xbf\xa7', which represents the Chinese character "咖".

Now I'm using iconv to convert the characters, but it requires the encoding info. My code is as below:

#include <iconv.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    char src[] = "\xbf\xa7";          /* the GBK bytes I want to convert */
    char dst[100] = {0};
    size_t srclen = strlen(src);      /* 2 bytes of input */
    size_t dstlen = sizeof(dst) - 1;  /* leave room for the terminating NUL */
    char *pIn = src;
    char *pOut = dst;

    iconv_t conv = iconv_open("UTF-8", "GBK");
    iconv(conv, &pIn, &srclen, &pOut, &dstlen);
    iconv_close(conv);
    fprintf(stderr, "out: %s\n", dst);
    return 0;
}

Thank you!

Update: Is there a way to determine the encoding of the system, so I can use it as the fromcode argument for iconv_open?
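
Something like the sketch below is what I have in mind, assuming setlocale() plus nl_langinfo(CODESET) is the right way to ask the C library for the locale's encoding, and that the returned name can be passed straight to iconv_open as the fromcode (that last part is only an assumption on my part):

#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

int main(void) {
    /* Pick up the user's locale from the environment (LANG/LC_ALL). */
    setlocale(LC_ALL, "");

    /* nl_langinfo(CODESET) names the locale's character encoding,
       e.g. "UTF-8"; I am assuming this name is acceptable to iconv_open. */
    const char *codeset = nl_langinfo(CODESET);
    printf("locale codeset: %s\n", codeset);
    return 0;
}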

xiaohei
  • With multibyte characters, how is any software supposed to know what character you mean by a certain combination of bytes unless you specify the codepage? There is no way around specifying a codepage. There might be some heuristic to determine the codepage (e.g. depending on the frequency with which certain values appear), but that can never be 100% accurate – codeling Aug 21 '13 at 09:30
  • It's impossible to do for certain in general. – R. Martinho Fernandes Aug 21 '13 at 11:54
  • I agree. Sorry for not being specific: on Windows, the function MultiByteToWideChar can take CP_ACP as its code page and use the correct one. Is there a similar method on Mac? Thank you! – xiaohei Aug 22 '13 at 01:34
  • I believe the default encoding on OS X is UTF-8 regardless of locale. – bobince Aug 22 '13 at 14:00

0 Answers