1

I am trying to convert a string of characters to Upper case by passing a pointer into a toUpper method that I have created. The logic seems to be fine but I get a weird output like ìëïà. Any ideas where I have gone wrong here?

#include <iostream>
#include <string.h> 

using namespace std;

void toUpper(char *);

int main()
{
    char name[80];
    char *namePtr = name;

    cout << "Enter a name :";
    cin >> name;

    toUpper(namePtr);
    cout << "The string in Upper Case is: " << name << endl;


}

void toUpper(char *p)
{

    int asciiValue; 

    // Loop through each char in the string
    for(int i = 0 ; i < strlen(p); i++)
    {
        
        asciiValue = (int) p[i];

        if(asciiValue >= 97 && asciiValue <= 122)
        {
            asciiValue = asciiValue + 32;
            p[i] = asciiValue;
        }
    }

}



user4581301
  • 33,082
  • 7
  • 33
  • 54
Clancinio
  • 734
  • 6
  • 21
  • 40
  • 1
    Note that you can use literals like `'a'` and `'z'` instead of `97` and `122` to avoid magic numbers. Also note that this algorithm doesn't always work. C++ does not guarantee that the character encoding will be ASCII or even that the letters of the alphabet will be encoded with sequential values. – François Andrieux Sep 17 '20 at 18:23
  • 2
    Which "uninialized memory", exactly, are you referring to, @tarick? And what do you mean by "int size block of memory"? There is nothing of that sort here, the bug is something else. – Sam Varshavchik Sep 17 '20 at 18:23
  • Substract 32, not add. – Manuel Sep 17 '20 at 18:25
  • 1
    This is a fairly simple typo, but where you "have gone wrong here" is not using a debugger, which allows you to run your program one line at a time and see exactly what's happening. Have you ran this code in your debugger and observe how this algorithm turns out to be, one character at a time, yet? If not, why not? – Sam Varshavchik Sep 17 '20 at 18:25
  • 2
    1) Use `std::string`. 2) Look up `std::tolower`. 3) Use `std::transform(s.begin(), s.end(), s.begin(), tolower)` – Thomas Matthews Sep 17 '20 at 18:31
  • @ThomasMatthews should be an answer, no? I don't think the obvious should be excluded just because it appears obviosu that OP is carrying out an exercise. OP has gone wrong already when they did not use `std::string` – 463035818_is_not_an_ai Sep 17 '20 at 18:34
  • Since the code in question is entirely C, here’s a (almost entirely) C solution: `for ( ; *p; ++p) *p = std::toupper(*p);` – Pete Becker Sep 17 '20 at 18:46

5 Answers5

3

Your problem boils down to bad magic numbers, which makes it nearly impossible to tell even from a close look because they're magic numbers!

Instead, I'd use character literals to make things obvious:

if(asciiValue >= 'a' && asciiValue <= 'z')
{
    asciiValue = asciiValue + ('a' - 'A');
    p[i] = asciiValue;
}

Now it should be pretty apparent that you're adding the wrong value! It should instead be:

asciiValue = asciiValue + ('A' - 'a');
scohe001
  • 15,110
  • 2
  • 31
  • 51
  • Thanks. I feel stupid. I've been looking at his for hours. I thought I made a mistake with the pointer. – Clancinio Sep 17 '20 at 18:54
  • @Clan which is exactly why I harped on your magic numbers. Throwing them around is almost always a recipe for pull-your-hair-out bugs.You know for next time! :) – scohe001 Sep 17 '20 at 18:56
  • @scohe001 nice one. FUll in the spirit of K&R. But are you aware that this doesn't work for and an ISO 8859-3 encoding, with multiple ranges of lowercase letters, and different offsets between upper and lower case? The difference between 'Ŭ' and 'ŭ' is for example -32 but between 'Ż' and 'ż' it's -16 . ANd I don not even mention unicode ;-) – Christophe Sep 17 '20 at 19:07
  • @Chris every time I post a character arithmetic answer I get one of these comments. I agree with you entirely and this wouldn't get past a code review with production code. That being said, I've tried to show OP their error without getting lost in the weeds (and their variable name is `asciiValue`, so we're probably fine here). I could add a part on the "right" way to do this, but at that point I'd just be giving a lesser version of the already great answers below. – scohe001 Sep 17 '20 at 19:55
  • @scohe001 I'm sorry if I did upset you. This was not the intent. But I'm using daily a couple of languages that use some weird characters that give different results with different encodings. Sometimes these characters are not transformed a they should. Sometimes they are even rejected because not recognised as printable chars. I guess that all the people who suffer daily or even get their name mutilated this way, are exactly the people who write you such comments ;-) So it's really not theoretic and therefore the isxxxx() toxxx() really deserve to get better known – Christophe Sep 17 '20 at 21:19
3

Your code is non portable

This is non portable code. It is sure to work only with ASCII encoding. Nevertheless, here the corresponding non-portable solution:

       asciiValue = asciiValue - 32;   // - just move in the other direction

How to do better?

Here some problems for your current code:

  • The difference between upper and lowercase letters is not always 32. In EBCDIC, for example, the difference between upper and lowercase ordinary letters is +64 instead of -32.
  • The boundaries of lowercase might be different as well.
  • With foreign languages using non ASCII locale, you might have special characters that are in a different range than the normal letters (see for example ISO-8859-1) where you also have lowercase in the range 224 and 25' but with one exception.
  • In some encodings, you have even different rules for different sets of lowercases. Take ISO 8859-3. The difference between 'Ŭ' and 'ŭ' is -32 but between 'Ż' and 'ż' it's -16 .
  • Finally, chars are not guaranteed to be unsigned. Your whole comparison logic might completely fail, if you use an ISO-8859-1 encoding compined with a compiler that manages chars as signed chars.

A much safer approach is therefore to to use: isupper() and toupper() which takes into account the locale.

As a side effect, this will even facilitate migration to a full unicode compliant code, using the templated version of these functions or the wide version.

And why don't you use real strings?

Your code is at risk of a buffer overflow, if someone types a name of 80 or more characters. You'd need to make sure that cin does not take more characters than allowed. But instead of telling you how to do this, I suggest to use the safer std::string instead:

void toUpper(string &s)
{
    for(auto &p:s)              // Loop through each char in the string
        if (islower(p))         
            p =toupper(p);
}

int main()
{
    string name;
    cout << "Enter a name :";
    cin >> name;
    toUpper(name);
    cout << "The string in Upper Case is: " << name << endl;
}

online demo

Christophe
  • 68,716
  • 7
  • 72
  • 138
  • 1
    Even on EBCDIC systems, the difference between upper and lowercase letters is constant for every pair of letters. Using `+ ('A' - 'a')` would also provide a simple portable solution. – scohe001 Sep 17 '20 at 18:30
  • @scohe001 Maybe, but as soon as you're no longer in ASCII and use some ISO based locale, this will not work anymode. Take 'é' and 'É' for example; – Christophe Sep 17 '20 at 18:32
1

asciiValue = asciiValue - 32;

Minus instead of plus Example: ASCII value of "a" is 97

97 - 32 is 65 which is the ASCII value of uppercase A

Juned Khan
  • 124
  • 1
  • 9
1

It seems in this if statement

    if(asciiValue >= 97 && asciiValue <= 122)
    {
        asciiValue = asciiValue + 32;
        p[i] = asciiValue;
    }

you are checking that the current symbol is a lower-case ASCII symbol.

But lower-case ASCII symbols have higher codes than upper-case ASCII symbols.

So instead of adding the magic number 32

        asciiValue = asciiValue + 32

you have to subtract it

        asciiValue = asciiValue - 32

For example the lower case ASCII symbol 'a' has the code 97 while the upper case.symbol 'A' has the code 65.

But in any case your approach with magic numbers is bad because for example it will not work with EBCDIC symbol representations.

Also calling the function strlen in this case is inefficient.

And it is much better when the function returns pointer to the converted string.

The function can be declared and implemented the following way

#include <cctype>

//...

char * toUpper( char *s )
{
    for ( char *p = s; *p; ++p )
    {
        if ( std::islower( static_cast<unsigned char>( *p ) ) )
        {
            *p = std::toupper( static_cast<unsigned char>( *p ) );
        }
    }

    return s;
}
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • If you're going to use `cctype`'s `touper()`, is the `if` statement even necessary? Won't `toupper()` just return the unmodified character if it can't uppercase it? – scohe001 Sep 17 '20 at 18:53
  • @scohe001 I think there is no difference because if remove the if statement then you need always do assignments. – Vlad from Moscow Sep 17 '20 at 18:56
0

A more portable C++ solution is to use std::transform to transform the string to lower case:

std::string shouting = "AM I SHOUTING";
std::transform(shouting.begin(), shouting.end(), shouting.begin(), tolower);
std::cout << shouting << "\n";

This solution does not rely on ASCII encoding and will work with code sets where std::tolower is valid.

Thomas Matthews
  • 56,849
  • 17
  • 98
  • 154
  • Side note: `tolower` can sometimes screw up if `char` is signed. [See the notes linked here](https://en.cppreference.com/w/cpp/string/byte/tolower#Notes). – user4581301 Sep 17 '20 at 19:07
  • This one is very elegant! What albout replacing `tolower` with `[](auto c){return tolower(c,std::locale());}` ? – Christophe Sep 17 '20 at 21:48
  • I haven't tried with a lambda. Usually `std::lower` goes through the locale system. – Thomas Matthews Sep 17 '20 at 21:54