C++ tolower/toupper char pointer

Question

Do you guys know why the following code crash during the runtime?

char* word;
word = new char[20];
word = "HeLlo"; 
for (auto it = word; it != NULL; it++){        
    *it = (char) tolower(*it);

I'm trying to lowercase a char* (string). I'm using visual studio.

Thanks

you are using auto in the same time using char*.. search for std::string and you gonna have less time debugging and more time to enjoy — Humam Helfawi, Nov 23 '15 at 21:05
Did you overwrite the `word = new char[20];` allocation with `word = "HeLlo";`? If so, you cannot `delete word` later. — Weather Vane, Nov 23 '15 at 21:06
@HumamHelfawi Thanks! just wanted to use char* instead of string. Even when I use `char* it` still get the runtime error. @Weather yes. — mask, Nov 23 '15 at 21:08
You also forgot to free the memory you allocated, which is a very common and serious mistake. — Marian Spanik, Nov 23 '15 at 21:10
You are (re)assigning a pointer to memory to point to a constant literal. The expression `word = "Hello";` **assigns a pointer and does not copy contents.** Use `std::string` instead. — Thomas Matthews, Nov 23 '15 at 21:10
If you want to use a pointer (why?) instead of the `string` class, you need to understand the semantic of pointers. — curiousguy, Nov 23 '15 at 21:22

score 6 · Accepted Answer · edited Nov 24 '15 at 15:59

You cannot compare it to NULL. Instead you should be comparing *it to '\0'. Or better yet, use std::string and never worry about it :-)

In summary, when looping over a C-style string. You should be looping until the character you see is a '\0'. The iterator itself will never be NULL, since it is simply pointing a place in the string. The fact that the iterator has a type which can be compared to NULL is an implementation detail that you shouldn't touch directly.

Additionally, you are trying to write to a string literal. Which is a no-no :-).

EDIT: As noted by @Cheers and hth. - Alf, tolower can break if given negative values. So sadly, we need to add a cast to make sure this won't break if you feed it Latin-1 encoded data or similar.

This should work:

char word[] = "HeLlo";
for (auto it = word; *it != '\0'; ++it) {
    *it = tolower(static_cast<unsigned char>(*it));
}

Better fix the call to `tolower`, since presumably this example is intended to show off generally correct code — Cheers and hth. - Alf, Nov 23 '15 at 21:26
@Cheersandhth.-Alf, fair enough. I'll add something for that :-) — Evan Teran, Nov 23 '15 at 21:30

Barmar · Answer 2 · 2015-11-23T21:08:58.677

4

You're setting word to point to the string literal, but literals are read-only, so this results in undefined behavior when you assign to *it. You need to make a copy of it in the dynamically-allocated memory.

char *word = new char[20];
strcpy(word, "HeLlo");

Also in your loop you should compare *it != '\0'. The end of a string is indicated by the character being the null byte, not the pointer being null.

edited Nov 23 '15 at 21:08

answered Nov 23 '15 at 21:08

Barmar

741,623
53
500
612

Cheers and hth. - Alf · Answer 3 · 2015-11-24T15:57:38.453

Given code (as I'm writing this):

char* word;
word = new char[20];
word = "HeLlo"; 
for (auto it = word; it != NULL; it++){        
    *it = (char) tolower(*it);

This code has Undefined Behavior in 2 distinct ways, and would have UB also in a third way if only the text data was slightly different:

Buffer overrun.
The continuation condition it != NULL will not be false until the pointer it has wrapped around at the end of the address range, if it does.
Modifying read only memory.
The pointer word is set to point to the first char of a string literal, and then the loop iterates over that string and assigns to each char.
Passing possible negative value to tolower.
The char classification functions require a non-negative argument, or else the special value EOF. This works fine with the string "HeLlo" under an assumption of ASCII or unsigned char type. But in general, e.g. with the string "Blåbærsyltetøy", directly passing each char value to tolower will result in negative values being passed; a correct invocation with ch of type char is (char) tolower( (unsigned char)ch ).

Additionally the code has a memory leak, by allocating some memory with new and then just forgetting about it.

A correct way to code the apparent intent:

using Byte = unsigned char;

auto to_lower( char const c )
    -> char
{ return Byte( tolower( Byte( c ) ) ); }

// ...
string word = "Hello";
for( char& ch : word ) { ch = to_lower( ch ); }

score 1 · Answer 4 · answered Nov 23 '15 at 21:19

1

There are already two nice answers on how to solve your issues using null terminated c-strings and poitners. For the sake of completeness, I propose you an approach using c++ strings:

string word;           // instead of char* 
//word = new char[20]; // no longuer needed: strings take care for themseves
word = "HeLlo";        //  no worry about deallocating previous values: strings take care for themselves
for (auto &it : word)  // use of range for, to iterate through all the string elements      
    it = (char) tolower(it);

answered Nov 23 '15 at 21:19

Christophe

68,716
7
72
138

Better fix the call to `tolower`, since presumably this example is intended to show off generally correct code. Also, I would fix the naming. `it` is not an iterator. – Cheers and hth. - Alf Nov 23 '15 at 21:21
1

@Cheersandhth.-Alf well, for the more general case, I'd go for `it = (char)tolower(it, loc);` where loc is a locale – Christophe Nov 23 '15 at 21:51
I wouldn't recommend that. It just adds verbosity, inefficiency and complexity (for making the locale available). In contrast, the C function is simple, reasonably efficient, and locale-aware. – Cheers and hth. - Alf Nov 23 '15 at 22:33
If you're writing non localised english software, your're certainly right. But for european charsets, the mapping upper<->lower isn't quite the same for ascii (not defined) iso8859 (ex: 0xE4 ä 0xD4 Ä) and legacy iso646 (0x84 ä 0x8E Ä). Here the locale overhead is in fact a real benefit. – Christophe Nov 23 '15 at 22:50
The C function **is** locale-aware. It uses the C level locale set with `setlocale`. I was not talking about a general locale overhead, I was talking about the overhead of involving C++ locales. Anyway note that it's not meaningful to draw a difference between Latin-1 (ISO 8859) and Unicode (ISO 646) in this context. Latin-1 is a strict subset of Unicode, it's the first 256 code points. – Cheers and hth. - Alf Nov 24 '15 at 15:49

score 1 · Answer 5 · answered Nov 23 '15 at 22:20

1

Its crashing because you are modifying a string literal.

answered Nov 23 '15 at 22:20

SEB

21
1

score 0 · Answer 6 · answered Nov 03 '17 at 09:29

there is a dedicated functions for this use strupr for making string uppercase and strlwr for making the string lower case.

here is an usage example:

char str[ ] = "make me upper";
printf("%s\n",strupr(str));


char str[ ] = "make me lower";
printf("%s\n",strlwr (str));

C++ tolower/toupper char pointer

6 Answers6