I am wondering how the toupper() function in C works. I am trying it out in the code below but I'm definitely doing something wrong. The code compiles, but the arguments passed into toupper() are not being capitalized...

char **copyArgs(int argc, char **argv) {
    char **a = malloc(sizeof(char *) * (argc));

    int i;
    for(i = 0; i < argc; i++) {
        int size = strlen(argv[i]);
        a[i] = malloc(sizeof(char) * (size + 1));
        strcpy(a[i], argv[i]);
        a[i] = toupper(a[i]);
    }
    return a;
}

If I test this with "one two" it results in "one two", not "ONE TWO". Any advice is appreciated.

dtb
user1889966
  • toupper works on char, not char* – John3136 Feb 24 '13 at 23:23
  • `toupper` converts a single character, not a string. I wonder why this doesn't actually screw up your pointers. – Niklas B. Feb 24 '13 at 23:24
  • The reason why the pointers survive this is that out of pure luck, the addresses don't evaluate to ASCII symbols for lowercase letters and are therefore not altered... – starturtle Sep 21 '16 at 13:49

1 Answer

toupper converts a single character to uppercase. In your case you are passing it a pointer instead of a char, which compiles only because C is lenient about implicit conversions, so it cannot work as intended. You are probably getting a warning about an implicit pointer-to-integer conversion without a cast: that is a strong sign that you are doing something very wrong.

The whole thing doesn't blow up only because on your platform int is as big as a pointer (or at least big enough for the pointers you are using); toupper tries to interpret that int as a character, finds that it is non-alphabetic and returns it unmodified. That's sheer luck: on other platforms your program would probably crash, because the pointer-to-int conversion can truncate the pointer, and because the behavior of toupper on integers outside the unsigned char range (plus EOF) is undefined.

To convert a whole string to uppercase, you have to iterate over all its chars and call toupper on each of them. You can easily write a function that does this:

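#include <ctype.h>

void strtoupper(char *str)
{
    /* Overwrite each character with its uppercase form, stopping at the '\0'. */
    for (; *str; ++str)
        *str = toupper((unsigned char)*str);
}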
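
Applied to the function from the question, a minimal sketch might look like the following. The names upper_in_place and copy_args_upper are made up for illustration, and the allocation checks are my addition rather than something the original code had:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: uppercase a NUL-terminated string in place. */
static void upper_in_place(char *str)
{
    for (; *str != '\0'; ++str)
        *str = (char)toupper((unsigned char)*str);
}

/* Hypothetical variant of copyArgs: returns a newly allocated array of
 * uppercased copies of argv[0..argc-1], or NULL if an allocation fails. */
char **copy_args_upper(int argc, char **argv)
{
    char **a = malloc(sizeof *a * (size_t)argc);
    if (a == NULL)
        return NULL;

    for (int i = 0; i < argc; i++) {
        a[i] = malloc(strlen(argv[i]) + 1);
        if (a[i] == NULL) {
            /* Release the copies made so far before giving up. */
            while (i-- > 0)
                free(a[i]);
            free(a);
            return NULL;
        }
        strcpy(a[i], argv[i]);
        upper_in_place(a[i]);   /* convert the copy, not the pointer */
    }
    return a;
}
```

The key difference from the question's code is that the string's contents are modified through the pointer, while the pointer stored in a[i] is never reassigned. The caller is responsible for freeing each a[i] and then a.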

Notice the unsigned char cast: all C functions dealing with character categorization and conversion require an int that is either EOF (which is left intact) or the value of an unsigned char. The reason is sad and complex, and I already detailed it in another answer.
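
To make the cast concrete: on platforms where plain char is signed, a byte such as 0xE9 stored in a char is negative, and passing that negative value straight to toupper is undefined behavior. Casting to unsigned char first maps it to 233, which is a valid argument. A small wrapper (upper_char is a hypothetical name, not a standard function) could look like this:

```c
#include <ctype.h>

/* Hypothetical wrapper: uppercase one char without ever handing
 * toupper a negative value (other than EOF, which cannot occur here). */
char upper_char(char c)
{
    /* (unsigned char)c is always in 0..UCHAR_MAX, so the call is well defined. */
    return (char)toupper((unsigned char)c);
}
```

In the default "C" locale this turns 'a' into 'A' and leaves digits, punctuation and bytes above 127 unchanged.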

Still, it's worth noting that toupper by design cannot work reliably with multibyte character encodings such as UTF-8, so it has no real place in modern text processing. The same goes for most of the C locale facilities, which were (badly) designed in another era.
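
As a rough illustration of the multibyte problem, here is a sketch that runs toupper over every byte of a string and counts how many bytes it changed (upper_bytes is a hypothetical name). It assumes the program runs in the default "C" locale and that the input is UTF-8:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: apply toupper to each byte in place and
 * return the number of bytes that were actually modified. */
size_t upper_bytes(char *s)
{
    size_t changed = 0;
    for (; *s != '\0'; ++s) {
        unsigned char before = (unsigned char)*s;
        unsigned char after = (unsigned char)toupper(before);
        if (after != before) {
            *s = (char)after;
            ++changed;
        }
    }
    return changed;
}
```

Given the UTF-8 bytes of "héllo" (where é is the two bytes 0xC3 0xA9), only h, l, l and o are converted; the two bytes of é are not ASCII letters, so they pass through untouched and the result is "HéLLO" rather than "HÉLLO". toupper sees one byte at a time and has no way to know that two bytes form a single letter.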

Matteo Italia