I have a function that expects a wchar_t array as a parameter. I don't know of a standard library function that converts from char to wchar_t, so I wrote a quick and dirty function, but I want a reliable solution free from bugs and undefined behavior. Does the standard library have a function that makes this conversion?

My code:

wchar_t *ctow(const char *buf, wchar_t *output)
{
    const char ANSI_arr[]    =  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789`~!@#$%^&*()-_=+[]{}\\|;:'\",<.>/? \t\n\r\f";
    const wchar_t WIDE_arr[] = L"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789`~!@#$%^&*()-_=+[]{}\\|;:'\",<.>/? \t\n\r\f";

    size_t n = 0, len = strlen(ANSI_arr);

    while (*buf) {
        for (size_t x = 0; x < len; x++) {
            if (*buf == ANSI_arr[x]) {
                output[n++] = WIDE_arr[x];
                break;
            }
        }
        buf++;
    }
    output[n] = L'\0';
    return output;
}
Thomas Dickey

3 Answers

Well, conversion functions are declared in stdlib.h (*). But you should know that for any character in the Latin-1 (ISO-8859-1) charset, the conversion to a wide character is a mere assignment, because the Unicode code points below 256 are exactly the Latin-1 characters (this assumes wchar_t holds Unicode code points, as it does on common platforms).

So if your initial charset is ISO-8859-1, the conversion is simply:

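wchar_t *ctow(const char *buf, wchar_t *output) {
    wchar_t *cr = output;   /* remember the start of the output buffer */
    while (*buf) {
        /* go through unsigned char so bytes above 0x7F are not sign-extended */
        *output++ = (unsigned char)*buf++;
    }
    *output = 0;
    return cr;
}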

provided the caller passes a pointer to an array large enough to hold all the converted characters plus the terminating null.
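To make the buffer-size requirement concrete, here is a minimal sketch that allocates the output itself instead of trusting the caller. The name latin1_to_wide is hypothetical, and the code assumes the input really is ISO-8859-1 and that wchar_t holds Unicode code points:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts an ISO-8859-1 string to a newly allocated
 * wide string. Assumes wchar_t values are Unicode code points.
 * Returns NULL if the allocation fails; the caller must free() the result. */
wchar_t *latin1_to_wide(const char *src)
{
    assert(src != NULL);

    size_t len = strlen(src);
    /* One wide character per input byte, plus the terminating null. */
    wchar_t *dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        /* Cast through unsigned char so bytes above 0x7F stay positive. */
        dst[i] = (wchar_t)(unsigned char)src[i];
    }
    dst[len] = L'\0';
    return dst;
}
```

Since each Latin-1 byte maps to exactly one wide character, strlen(src) + 1 elements are always enough.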

If you are using any other charset, you will have to use a well-known library like ICU, or build a converter by hand, which is simple for single-byte charsets (the ISO-8859-x series) and trickier for multibyte ones like UTF-8.

But without knowing the charsets you want to be able to process, I cannot say more...

BTW, plain ASCII is a subset of the ISO-8859-1 charset.

(*) From cplusplus.com

int mbtowc (wchar_t* pwc, const char* pmb, size_t max);

Convert multibyte sequence to wide character. The multibyte character pointed by pmb is converted to a value of type wchar_t and stored at the location pointed by pwc. The function returns the length in bytes of the multibyte character.

mbtowc has its own internal shift state, which is altered as necessary only by calls to this function. A call to the function with a null pointer as pmb resets the state (and returns whether multibyte characters are state-dependent).

The behavior of this function depends on the LC_CTYPE category of the selected C locale.
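For whole strings, the same header also declares mbstowcs, which applies this conversion to every character in turn. A minimal sketch, with the helper name mb_to_wide being hypothetical; the result depends on the current LC_CTYPE locale, as described above:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts a multibyte string, encoded according to the
 * current LC_CTYPE locale, to a newly allocated wide string.
 * Returns NULL on an invalid multibyte sequence or allocation failure;
 * the caller must free() the result. */
wchar_t *mb_to_wide(const char *src)
{
    assert(src != NULL);

    /* Every wide character consumes at least one byte of input, so
     * strlen(src) + 1 elements are always enough, terminator included. */
    size_t cap = strlen(src) + 1;
    wchar_t *dst = malloc(cap * sizeof *dst);
    if (dst == NULL)
        return NULL;

    /* mbstowcs returns (size_t)-1 when it meets an invalid sequence. */
    if (mbstowcs(dst, src, cap) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;
}
```

A program starts in the "C" locale, so call setlocale(LC_CTYPE, "") from locale.h beforehand if the input is in the user's environment encoding (for example UTF-8).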

Serge Ballesta

That isn't a conversion from char to wchar_t. It's a function for destroying data outside of ISO-646. No method in the C library will make that conversion for you. You can look at the ICU4C library. If you are only on Windows, you can look at the relevant functions in the Win32 API (MultiByteToWideChar, WideCharToMultiByte, etc.).

bmargulies

It does, in the header wchar.h. The function is called btowc:

The btowc function returns WEOF if c has the value EOF or if (unsigned char)c does not constitute a valid single-byte character in the initial shift state. Otherwise, it returns the wide character representation of that character.
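As a rough sketch of how the question's ctow could be built on btowc (the name ctow_btowc and the error convention are assumptions, not part of the standard library):

```c
#include <assert.h>
#include <wchar.h>   /* btowc, WEOF, wint_t */

/* Hypothetical helper: converts a string of single-byte characters with
 * btowc. output must have room for strlen(buf) + 1 wide characters.
 * Returns 0 on success, or -1 if some byte is not a valid single-byte
 * character in the current locale (output is then left unterminated). */
int ctow_btowc(const char *buf, wchar_t *output)
{
    assert(buf != NULL && output != NULL);

    for (; *buf != '\0'; buf++) {
        /* btowc expects an unsigned char value (or EOF) passed as int. */
        wint_t wc = btowc((unsigned char)*buf);
        if (wc == WEOF)
            return -1;
        *output++ = (wchar_t)wc;
    }
    *output = L'\0';
    return 0;
}
```

This only handles single-byte characters; in a multibyte locale such as UTF-8, bytes that are part of a longer sequence make btowc return WEOF.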

2501