lowercase to uppercase in C++

Question

Note that I am not asking what methods exist to convert lowercase letters to uppercase letters in C++. Instead, I want to know which of the two functions in the code below (Upper1 and Upper2) is better than the other, and why, from a programming standpoint.

#include <string>
#include <iostream>
#include <locale> // Upper2 requires this header

using namespace std;

void Upper1(string &inputStr);
void Upper2(string &inputStr);

int main(){

    string test1 = "ABcdefgHIjklmno3434dfsdf3434PQRStuvwxyz";
    string test2 = "ABcdefgHIjklmnoPQRStuvwxyz";

    Upper1(test1);
    cout << endl << endl << "test1 (Upper1): ";
    for (int i = 0; i < test1.length(); i++){
        cout << test1[i] << " ";
    }


    Upper2(test2);
    cout << endl << endl << "test2 (Upper2): ";
    for (int i = 0; i < test2.length(); i++){
        cout << test2[i] << " ";
    }

    return 0;
}

void Upper1(string &test1){

    for (int i = 0; i < 27; i++){ 
        if (test1[i] > 96 && test1[i] <123){ //convert only those of lowercase letters
            test1[i] = (char)(test1[i]-(char)32);
        }

    }
}

void Upper2(string &test2){

    locale loc;

    for (size_t i=0; i<test2.length(); ++i)
        test2[i] = toupper(test2[i],loc);
}

score 3 · Accepted Answer · answered Dec 31 '12 at 13:37

The main difference between the two proposed solutions is that Upper2 sort of works, regardless of the platform; Upper1 makes assumptions concerning the encoding, and doesn't work on any modern platform that I know of. (It assumes ASCII, and ASCII is, for all intents and purposes, dead.)

Of course, neither really works, for two simple reasons. The first is that most modern machines use a multibyte encoding (UTF-8), so you cannot convert a string from lower to upper case one byte at a time. The second is that there is not, generally speaking, a one-to-one mapping from lower to upper case: the classical example is 'ß', whose upper case equivalent is the two-character string "SS". Still, for a somewhat simplistic definition of the function, and a single-byte encoding like ISO 8859-1 (probably the most widely used in the recent past), Upper2 will do a reasonably good job (provided there is no 'ß' in the input), adequate for many uses, whereas Upper1 will fail lamentably.
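To illustrate both points, here is a minimal sketch, not a recommended implementation. It assumes the input is UTF-8 (where 'ß' is the two bytes 0xC3 0x9F) and pins the conversion to the classic "C" locale so the behaviour is predictable. The helper name UpperBytewise is hypothetical; it does the same per-char conversion as Upper2.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: uppercases one char at a time, like Upper2,
// but always with the classic "C" locale (an assumption of this sketch).
std::string UpperBytewise(std::string s) {
    const std::locale& loc = std::locale::classic();
    for (char& c : s) {
        // Each byte is converted in isolation; a multibyte UTF-8
        // sequence is never seen as a single character.
        c = std::toupper(c, loc);
    }
    return s;
}
```

Called on "stra\xC3\x9F" "e" (the UTF-8 bytes of "straße"), this should give back "STRA", then the two bytes of 'ß' unchanged, then "E". The ASCII letters are converted, but a one-char-in, one-char-out loop has no way to produce "STRASSE".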

you pointed out some good points, thank you! – Raf, Dec 31 '12 at 16:01

score 2 · Answer 2 · answered Dec 31 '12 at 13:39

Using toupper won't make sense if you have letters from languages other than the English A-Z alphabet, e.g. the German ä, ö or ü, and the various accented letters in French or Spanish, and of course if the input isn't a "germano-latin" language at all, for example Russian. [As pointed out by James, that may require Unicode parsing, which is a whole new ball game altogether, though.]

Obviously, the first function is also hard-coded to convert only the first 27 characters of the input, which is bad coding because the function shouldn't rely on a fixed string size, particularly since "std::string" already knows its own length!
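As a rough sketch of that point, the loop in Upper1 could be bounded by the string's own length and written with character literals instead of the magic numbers 96, 123 and 32. The name UpperAscii is hypothetical, and the function still assumes an encoding in which 'a' to 'z' are contiguous (true for ASCII-compatible encodings, not for EBCDIC), so the encoding objections raised in the other answer still apply.

```cpp
#include <cstddef>
#include <string>

// Hypothetical rewrite of Upper1: same idea, but the loop runs over
// the actual length of the string instead of a hard-coded 27.
void UpperAscii(std::string& s) {
    for (std::size_t i = 0; i < s.length(); ++i) {
        // Convert only lowercase letters; assumes 'a'..'z' are contiguous.
        if (s[i] >= 'a' && s[i] <= 'z') {
            s[i] = static_cast<char>(s[i] - 'a' + 'A');
        }
    }
}
```

With this version, the question's 39-character test1 string is converted in full, rather than stopping after the first 27 characters.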

thanks for the explanations :) I really appreciate it. – Raf, Dec 31 '12 at 16:00

score 0 · Answer 3 · answered Dec 31 '12 at 13:23

1. toupper() can handle non-ASCII characters
2. Syntax-wise, Upper2() is less error-prone
3. Not too sure about this, but I think toupper() is slower

answered by SekaiCode

Why do you think 3? (It is obviously implementation dependent, and I've not done actual measurements in over 20 years, but when I did measure, `toupper` was significantly faster.) – James Kanze Dec 31 '12 at 13:38
