Code

/*char* to wchar_t* */
wchar_t*strtowstr(char*str){
    iconv_t cd=iconv_open("wchar_t","UTF-8");
    if(cd==(iconv_t)-1){
        return NULL;
    }
    size_t len1=strlen(str),len2=1024;
    wchar_t*wstr=(wchar_t*)malloc((len1+1)*sizeof(wchar_t));
    char*ptr1=str;
    wchar_t*ptr2=wstr;
    if((int)iconv(cd,&ptr1,&len1,(char**)&ptr2,&len2)<0){
        free(wstr);
        iconv_close(cd);
        return NULL;
    }
    *ptr2=L'\0';
    iconv_close(cd);
    return wstr;
}

I use `strerror(errno)` to get the error message, and it says "Arg list too long".
How can I solve it? Thanks to the comments, I have changed the code above.
I only use the function to read a text file. I think it reports the error because the file is too large, so I want to know how to use iconv on a long string.

Pudron
  • *`E2BIG`: There is not sufficient room at `*outbuf`.* – ikegami Feb 25 '20 at 03:41
  • I think `len2` should be a number of *bytes* – ikegami Feb 25 '20 at 03:43
  • OT: for ease of readability and understanding: 1) please follow the axiom: *only one statement per line and (at most) one variable declaration per statement.* 2) insert a reasonable space: inside parens, inside brackets, inside braces, after commas, after semicolons, around C operators – user3629249 Feb 25 '20 at 03:47
  • There are a LOT of undefined items in the posted code, like: `MAX_STRING`. Please post a [mcve] so we can reproduce the problem and help you debug it. – user3629249 Feb 25 '20 at 03:48
  • I don't see: `strerror(errno)` anywhere in the posted code. Please post the code you actually used. – user3629249 Feb 25 '20 at 03:49
  • @user3629249, In the caller, when the function returns NULL. – ikegami Feb 25 '20 at 03:53
  • OT: regarding: `wchar_t*wstr=(wchar_t*)malloc((len1+1)*sizeof(wchar_t));` 1) in C, the returned type is `void*` which can be assigned to any pointer. Casting just clutters the code and is error prone, 2) always check (!=NULL) the returned value to assure the operation was successful. If not successful, call `perror( "malloc failed" )` to output to `stderr` the error message and the text reason the system thinks the error occurred. – user3629249 Feb 25 '20 at 03:54
  • Regarding `iconv_t cd=iconv_open("wchar_t","UTF-8"); if(cd==(void*)((size_t)-1)){ return NULL; }`: 1) this fails to tell the user of the code why it failed. Suggest, before the `return` statement, something similar to: `perror( "iconv_open failed" );` 2) the comparison to `cd` seems incorrect; suggest: `if(cd==(iconv_t)-1){` – user3629249 Feb 25 '20 at 04:08
  • @ikegami, there are LOTS of C library functions (and the iconv functions) that modify the value in `errno`. Therefore, any usage of that value must occur immediately after the function that set that value. – user3629249 Feb 25 '20 at 04:18
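
To illustrate the two points raised in the comments (`E2BIG` means the output buffer ran out of room, and `errno` should be read right after the failing call), here is a minimal sketch. The helper name `iconv_errno_for` and the 256-byte scratch buffer are assumptions made for the demo and are not part of the question's code.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical demo helper: convert `src` (UTF-8) into a scratch buffer
 * that pretends to be only `out_size` bytes long. Returns 0 on success,
 * otherwise the errno value captured immediately after the failing call. */
int iconv_errno_for(const char *src, size_t out_size)
{
    char out_buf[256];                      /* scratch output, assumed size */
    if (out_size > sizeof out_buf)
        out_size = sizeof out_buf;

    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return errno;

    char *in = (char *)src;                 /* iconv() wants char **; the input is not modified */
    size_t in_left = strlen(src);           /* bytes of input */
    char *out = out_buf;
    size_t out_left = out_size;             /* bytes of output space, not characters */

    int saved = 0;
    if (iconv(cd, &in, &in_left, &out, &out_left) == (size_t)-1)
        saved = errno;                      /* save before any other library call */

    iconv_close(cd);                        /* may itself change errno */
    return saved;
}
```

Passing the saved value to `strerror()` in the caller gives the message for the call that actually failed, even though `iconv_close()` ran in between.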

1 Answer

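According to the man page, `iconv` fails with `E2BIG` when there's insufficient room at `*outbuf`.

The fifth argument (`outbytesleft`) should be the number of *bytes* available in the output buffer. The code in the question always passes 1024 there, no matter how large the allocated buffer is, so any input that converts to more than 1024 bytes fails.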

#include <iconv.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

wchar_t *utf8_to_wstr(const char *src) {
    iconv_t cd = iconv_open("wchar_t", "UTF-8");
    if (cd == (iconv_t)-1)
        goto Error1;

    size_t src_len = strlen(src);                 // In bytes, excludes NUL
    size_t dst_len = sizeof(wchar_t) * src_len;   // In bytes, excludes NUL
    size_t dst_size = dst_len + sizeof(wchar_t);  // In bytes, including NUL
    char *buf = malloc(dst_size);
    if (!buf)
        goto Error2;

    char *dst = buf;
    if (iconv(cd, (char**)&src, &src_len, &dst, &dst_len) == (size_t)-1)
        goto Error3;

    *(wchar_t*)dst = L'\0';
    iconv_close(cd);
    return (wchar_t*)buf;

Error3:
    free(buf);
Error2:
    iconv_close(cd);
Error1:
    return NULL;
}
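
For the "long string" part of the question, an alternative to sizing the buffer up front is to start small and enlarge the buffer each time `iconv` reports `E2BIG`. The following is only a sketch under assumptions: the name `utf8_to_wstr_grow`, the initial capacity, and the doubling policy are all illustrative choices, not something taken from the man page.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert NUL-terminated UTF-8 to a newly allocated
 * wide string, growing the output buffer whenever iconv() reports E2BIG.
 * Returns NULL on failure; the caller frees the result. */
wchar_t *utf8_to_wstr_grow(const char *src)
{
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;

    size_t cap = 16 * sizeof(wchar_t);          /* payload bytes; small on purpose */
    char *buf = malloc(cap + sizeof(wchar_t));  /* extra room for the terminator */
    if (!buf) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *)src;                     /* input bytes are not modified */
    size_t in_left = strlen(src);               /* bytes still to convert */
    size_t used = 0;                            /* output bytes written so far */
    int failed = 0;

    while (in_left > 0) {
        char *out = buf + used;
        size_t out_left = cap - used;           /* in bytes */
        size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        used = cap - out_left;                  /* keep the partial progress */
        if (rc != (size_t)-1)
            break;                              /* all input consumed */
        if (errno != E2BIG) {                   /* e.g. EILSEQ or EINVAL */
            failed = 1;
            break;
        }
        char *bigger = realloc(buf, 2 * cap + sizeof(wchar_t));
        if (!bigger) {
            failed = 1;
            break;
        }
        buf = bigger;
        cap *= 2;
    }

    iconv_close(cd);
    if (failed) {
        free(buf);
        return NULL;
    }

    wchar_t nul = L'\0';
    memcpy(buf + used, &nul, sizeof nul);       /* terminate the wide string */
    return (wchar_t *)buf;
}
```

This works because `iconv` updates the input and output pointers and counters as it goes, so after `E2BIG` the loop can resume from where the conversion stopped. The same pattern should also suit reading a file in chunks, although that is not shown here.
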
ikegami
  • According to the documentation for `libiconv.so`, the functions, when they fail, set `errno`. Therefore when those functions fail, should call `perror( " failed" );` so the error message and the text reason the system thinks the error occurred are output to `stderr`. This especially helps the user to know that 1) something went wrong and 2) what the system thinks the cause of the problem was. – user3629249 Feb 25 '20 at 04:12
  • Thank you very much.Does the `input_len` refer to `src_len`? – Pudron Feb 25 '20 at 04:13
  • OT: using `goto` when a simple correction to the program logic is very poor programming practice. – user3629249 Feb 25 '20 at 04:13
  • this answer fails to allow room for the NUL character at the end of the w_char string – user3629249 Feb 25 '20 at 04:15
  • @user3629249, Re "*Therefore when those functions fail, should call perror*", No, the caller should call `perror`+`exit` or whatever is appropriate for it. – ikegami Feb 25 '20 at 04:16
  • @Pudron, Re "*Does the `input_len` refer to `src_len`?*", I just noticed and fixed that. – ikegami Feb 25 '20 at 04:17
  • @user3629249, Re "*this answer fails to allow room for the NUL character at the end of the w_char string*", That's not true. – ikegami Feb 25 '20 at 04:18
  • @ikegami, the function `strlen()` does not return a value that includes the NUL byte at the end of the string. That is why you will often see `+1` added when allocating memory, etc – user3629249 Feb 25 '20 at 04:20
  • @user3629249, Re "*using `goto` when a simple correction to the program logic is very poor programming practice.*", I disagree. `iconv_close(cd)` would then appear four times. More importantly, a nice clean approach used *consistently* across all code is best. There's no point making an *exception* here because it wouldn't add much to deviate. Finally, it keeps the code clean because it lets the reader focus on the "real" logic, while code interspersed with lots of error handling detracts from that. – ikegami Feb 25 '20 at 04:20
  • @user3629249, Re "*the function strlen() does not return a value that includes the NUL byte at the end of the string.*", I am aware of that, which is why the code already adds `sizeof(wchar_t)` bytes for the NUL – ikegami Feb 25 '20 at 04:21
  • @ikegami, the use of `goto` results in 'spaghetti' code. Therefore, don't use it. If you don't want to repeat a statement or two of code, then call the sub routine that contains those lines of code – user3629249 Feb 25 '20 at 06:59
  • @user3629249, Re "*the use of `goto` results in 'spaghetti' code.*" It *can*, but it's wrong to claim it always does. Here, it's used as `throw`/`catch`, a perfectly acceptable construct. /// Re "*If you don't want to repeat a statement or two of code*", Strawman. I clearly said the main reason it was being used was to improve readability. – ikegami Feb 25 '20 at 07:20
  • Re "*then call the sub routine that contains those lines of code*", You clearly didn't think that one through. Each error handling block is different, so that would not remove any code duplication at all. It also wouldn't address the other issues `goto` is addressing (described above). Having all these tiny functions would hinder readability. In short, that would make everything *worse*. Try your own advice and see for yourself – ikegami Feb 25 '20 at 07:25
  • OT: `throw` `catch` is NOT a C programming concept. – user3629249 Feb 25 '20 at 18:39
  • If you write the flowchart of the code you will be able to easily see the 'spaghetti' – user3629249 Feb 25 '20 at 18:40
  • @user3629249 Re "*`throw` `catch` is NOT a C programming concept.*", Exactly. That's why `goto` is needed to implement that flow – ikegami Feb 25 '20 at 19:00
  • `goto` is NEVER needed – user3629249 Feb 25 '20 at 19:02
  • @user3629249 The flow chart looks awesome. It looks identical to nested ifs, without the excessive indenting. You keep proving my points :) – ikegami Feb 25 '20 at 19:02
  • @user3629249 Of course it isn't needed. It just makes things better sometimes, like you just proved. Same goes for `break`, `continue` non-terminal `return` and `exit`. (And `throw` in languages that have it.) All of those actually break the flowchart flow unlike what I used. – ikegami Feb 25 '20 at 19:05
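
For readers who want to compare the two styles discussed in these comments, here is a sketch of what a nested-`if` version of the answer's function might look like. The name `utf8_to_wstr_nested` is made up for this comparison, and the body is an assumed rewrite, not code posted by either commenter.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical nested-if variant of the answer's utf8_to_wstr().
 * Same steps and same cleanup order, expressed without goto. */
wchar_t *utf8_to_wstr_nested(const char *src)
{
    wchar_t *result = NULL;

    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd != (iconv_t)-1) {
        size_t src_len = strlen(src);                  /* bytes, excludes NUL */
        size_t dst_len = sizeof(wchar_t) * src_len;    /* bytes, excludes NUL */
        char *buf = malloc(dst_len + sizeof(wchar_t)); /* bytes, includes NUL */
        if (buf) {
            char *in = (char *)src;                    /* input is not modified */
            char *dst = buf;
            if (iconv(cd, &in, &src_len, &dst, &dst_len) != (size_t)-1) {
                wchar_t nul = L'\0';
                memcpy(dst, &nul, sizeof nul);         /* terminate the result */
                result = (wchar_t *)buf;
            } else {
                free(buf);                             /* conversion failed */
            }
        }
        iconv_close(cd);
    }
    return result;
}
```

Each resource is released in the same order as in the `goto` version; the difference is that the success path sits three levels deep instead of running straight down the function.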