-1

Which C function can convert À, É to lower à, è?

I tried tolower() and towlower(), but neither works.

smac89
L. Feng

2 Answers

2

You can use the `towlower` function, as long as you call `setlocale` first:

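/* towlower example */
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

int main(void) {
    /* Adopt the locale named by the environment. Without this call the
       program stays in the "C" locale, where non-ASCII characters are
       typically not converted or printed correctly. */
    setlocale(LC_CTYPE, "");

    wchar_t str[] = L"À TÉst String.\n";

    /* Lowercase and print one wide character at a time. */
    for (size_t i = 0; str[i] != L'\0'; i++) {
        putwchar(towlower(str[i]));
    }
    return 0;
}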

Output is:

à tést string.

> A C program inherits its locale environment variables when it starts
> up. This happens automatically. However, these variables do not
> automatically control the locale used by the library functions,
> because ANSI C says that all programs start by default in the standard
> `C' locale. To use the locales specified by the environment, you must
> call setlocale. Call it as follows:
>
> setlocale (LC_ALL, "");

The empty name `""` tells `setlocale` to select a locale based on environment variables.
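One thing worth checking: `setlocale` returns NULL when it cannot honor the request (for example, when the environment names a locale that is not installed), and the program then stays in whatever locale it was in before. A minimal sketch of that check follows; the helper name `use_environment_ctype` is made up for illustration and is not part of any library.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: try to adopt the environment's LC_CTYPE locale.
   Returns the name of the locale that is active afterwards. */
const char *use_environment_ctype(void)
{
    const char *name = setlocale(LC_CTYPE, "");
    if (name == NULL) {
        /* Request failed: the previous locale (normally "C") is still active. */
        fprintf(stderr, "warning: could not set locale from environment\n");
        name = setlocale(LC_CTYPE, NULL);   /* NULL only queries the current locale */
    }
    assert(name != NULL);                   /* a query reports the current locale */
    return name;
}
```

If the returned name is plain "C" or "POSIX", `towlower` will most likely still leave À and É alone, so printing the name once at startup is a cheap way to see what the program is really running under.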

smac89
  • Notice the `w` in the name of this function! This is one of the "alternative functions" that I was speaking of in my Answer. – Mike Robinson Jul 18 '16 at 03:57
  • The important part of the answer is setlocale. Having adjusted the locale, even tolower might work well, provided the strings are in the correct encoding (for French probably ISO-8859-15, on Windows possibly CP-1252). Unfortunately, we do not know where the strings to be changed come from (file, console, source code - the latter: joining Ed's comment...). With utf8, we are out anyway... – Aconcagua Jul 18 '16 at 04:52
  • For utf-8 encoded strings, I would have a look at the [ICU](http://site.icu-project.org/) library. – Aconcagua Jul 18 '16 at 04:56
  • Hi khredos, thanks, it works after adding 'setlocale()'. – L. Feng Jul 19 '16 at 04:13
1

The actual problem that you are facing here (despite the preceding "Answers") is that you have a Unicode string (or, at the very least, some kind of DBCS, a "Double-Byte Character Set").

The standard functions of the "C" language were devised in a much earlier, much simpler time, in which the only character set that needed to be considered was ASCII, which assigned every character that needed to be represented to one of 128 possible values (0 through 127). Nowhere in this picture were any diacritical marks such as these. In those simple times, 1 byte = 1 character.

In order to represent real human(!) language characters, it was necessary to adopt a far more flexible encoding which might assign anywhere from 1 to 4 bytes to a single character, as UTF-8 does. (And, mind you, a consensus about "exactly how to do this" did not happen overnight!) In any case, the original byte-oriented library routines such as `tolower()` are not Unicode-aware: they look at one `char` at a time, so they cannot handle a character that spans several bytes. (They never were designed to be, and they cannot now be retrofitted ...) Instead, alternate functions must be used, such as the wide-character `towlower()` from `<wctype.h>`.
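As a rough sketch of what using the "alternate functions" can look like when the data is a multibyte string (UTF-8, say) rather than a `wchar_t` literal: convert to wide characters with `mbstowcs`, lowercase each one with `towlower`, and convert back with `wcstombs`. The helper name `lower_multibyte` is made up for illustration. The sketch assumes `setlocale(LC_CTYPE, "")` has already selected a locale whose encoding matches the data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: return a newly allocated lowercase copy of a
   multibyte string, interpreted in the current LC_CTYPE locale.
   Returns NULL on allocation failure or if src is not valid in that
   locale's encoding. The caller frees the result. */
char *lower_multibyte(const char *src)
{
    assert(src != NULL);

    /* A string of len bytes decodes to at most len wide characters. */
    size_t len = strlen(src);
    wchar_t *wide = malloc((len + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;

    size_t n = mbstowcs(wide, src, len + 1);
    if (n == (size_t)-1) {              /* invalid byte sequence */
        free(wide);
        return NULL;
    }

    for (size_t i = 0; i < n; i++)
        wide[i] = (wchar_t)towlower((wint_t)wide[i]);

    /* Each wide character encodes to at most MB_CUR_MAX bytes. */
    size_t cap = n * MB_CUR_MAX + 1;
    char *out = malloc(cap);
    if (out != NULL && wcstombs(out, wide, cap) == (size_t)-1) {
        free(out);
        out = NULL;
    }
    free(wide);
    return out;
}
```

With a UTF-8 locale active, passing "À TÉst" should give back "à tést". In the default "C" locale only the ASCII letters are reliably handled, which is the behavior the question ran into. For anything beyond simple one-to-one case mapping, a dedicated library such as ICU is the safer route.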

Here's an external web page that gives a good summary of the various issues that need to be considered when using Unicode in C and C++:

http://www.cprogramming.com/tutorial/unicode.html

--- Edit: When I said "a consensus about exactly how to do this did not happen overnight," my comment was intended to have potentially far-reaching(!) implications. Why is it necessary, even today, to say "encoding=UTF-8"? This is why. A single answer to "how to interpret a multi-national sequence of bytes" never did develop, and the "C" language, especially, took it on the chin. There is more than one complete set of library functions in today's "C" runtime that you might need to use in order to handle your data correctly.

Mike Robinson