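#include <iconv.h>
#include <iostream>
using namespace std;

int changeCharEncoding(const char *from_charset, const char *to_charset, const char *input, char *output, int out_size);

int main()
{
    char str[200] = {0};
    char out[500] = {0};

    // "Rest" as 16-bit characters, high-order byte first
    str[0]=0x00; str[1]=0x52; str[2]=0x00; str[3]=0x65; str[4]=0x00; str[5]=0x73; str[6]=0x00; str[7]=0x74;

    for(int i=0;i<sizeof(str);i++)
        cout<<"-"<<str[i];
    changeCharEncoding("UCS-2","ISO8859-1",str,out,sizeof(out));
    cout<<"\noutput : "<<out;
    for(int i=0;i<sizeof(out);i++)
        cout<<":"<<out[i];
}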

//encoding function
int changeCharEncoding(const char *from_charset, const char *to_charset, const char *input, char *output, int out_size)
{
    size_t input_len = 8;   // hard-coded: byte length of the 4 two-byte characters in str
    size_t output_len = out_size;
    iconv_t l_cd;
    if ((l_cd = iconv_open (to_charset, from_charset)) == (iconv_t) -1)
    {
            return -1;
    }
    int rc = iconv(l_cd, (char **)&input, &input_len, (char **)&output, &output_len);
    if (rc == -1)
    {
            iconv_close(l_cd);
            return -2;
    }
    else
    {
            iconv_close(l_cd);
    }
}

Please suggest a method to convert 16-bit data to 8-bit. I have tried using iconv. Please also suggest any other way to do the same.

  • By its very definition this is going to be difficult. What are you going to do with your overflows? If you're converting character sets you'll need to be very sure of your source and target encodings, and you'll need a mechanism for handling out-of-set conversions. –  Oct 11 '13 at 19:45
  • What do you mean by "data"? And what do you even mean by "convert"? Are you trying to do ASCII representations of Unicode text? – Nikos C. Oct 11 '13 at 19:46
  • Also, C **XOR** C++? Pick exactly one. –  Oct 11 '13 at 19:48
  • Basically, I have a string encoded in UCS-2 and I need to convert it to UTF-8, so I wrote this sample program and found that it is not working – user2843171 Oct 11 '13 at 19:48
  • The program above should print "Rest" as output – user2843171 Oct 11 '13 at 19:49
  • So, what *does* it output if it doesn't output "Rest"? And you go to so much trouble to return error codes from changeCharEncoding, but then you ignore them completely...??? – Roddy Oct 11 '13 at 20:23
  • Are you sure your 16-bit chars should be encoded hi-order byte first? Try str[0] = 0x52, str[1] = 0, etc... – Roddy Oct 11 '13 at 20:33

3 Answers


It looks like you are trying to convert between the UTF-16 and UTF-8 encodings. Try changing your call to changeCharEncoding() to:

changeCharEncoding("UTF-16","UTF-8",str,out,sizeof(out));

The resulting UTF-8 output should be

刀攀猀琀

On a side note: there are several things in your code that you should consider improving. For example, both changeCharEncoding and main are declared to return an int, but your implementations do not return a value.

Pankrates
  • I think he wants it to be "Rest". Sounds like a BOM issue! – Roddy Oct 11 '13 at 20:28
  • No, the `UTF-16` input is `Rest`, which when converted to `UTF-8` yields the result I showed – Pankrates Oct 11 '13 at 20:29
  • Well, that's wrong, then. "Rest" in UTF-16 (or UCS-2) should be "Rest" in UTF-8 (or, ISO8859-1 for that matter) – Roddy Oct 11 '13 at 20:31
  • OK, you are probably right; that means his `changeCharEncoding` function is flawed, as I have not checked its accuracy. – Pankrates Oct 11 '13 at 20:37
  • @Roddy: Pankrates correctly changes the input, and that input is *not* "Rest" -- it's "\u5200\u6500\u7300\u7400". The first character of Pankrates' output confirms this: [codepoint=5200](http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=5200) – Jongware Oct 11 '13 at 20:43
  • @Jongware - You are assuming "little-endianness" for the string; the OP is assuming "big-endianness". UCS-2 is by definition always big-endian (but in practice that's not always the case: http://en.wikipedia.org/wiki/UCS-2 ). I expect that by changing "UCS-2" to "UCS-2BE" he'll get the expected answer. – Roddy Oct 11 '13 at 20:54

Generally speaking, you cannot convert arbitrary 16-bit data into 8-bit data without losing some of it.

If you're trying to convert between encodings, the same rule applies: some symbols cannot be represented in an 8-bit encoding, so they will be lost. Different platforms provide different functions for this:

Windows: WideCharToMultiByte

*nix: iconv

Iłya Bursov

I suspect you have an endianness problem. Try changing this

changeCharEncoding("UCS-2","ISO8859-1",str,out,sizeof(out));

to this

changeCharEncoding("UCS-2BE","ISO8859-1",str,out,sizeof(out));
Roddy