The very first thing to notice is that this code relies entirely on ASCII (or ASCII-compatible) input, for two reasons:
- It assumes that all upper case letters as well as all lower case letters succeed one another.
- It assumes a distance of 32 between the corresponding upper and lower case letters.
A counter-example is the (in-?)famous EBCDIC coding system – admittedly not of much relevance nowadays any more...
ASCII has the two aforementioned characteristics: if you peek into an ASCII table you'll notice that e.g. A is encoded by value 65 while a is encoded by 97, a difference of 32. So by subtracting 32 from the value for a you reach the value for A, and likewise for the other letters.
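As a minimal sketch of that arithmetic (ascii_to_upper is just a name I picked for illustration, and the whole thing is only valid on ASCII-compatible systems), the offset can be spelled 'a' - 'A' instead of the magic number 32:

```c
/* Hypothetical helper, ASCII-compatible encodings only:
   relies on 'a'..'z' being contiguous and on 'a' - 'A' == 32. */
char ascii_to_upper(char c)
{
    if (c >= 'a' && c <= 'z')
    {
        c -= 'a' - 'A'; /* 97 - 65 == 32 in ASCII */
    }
    return c;
}
```

Any character outside a..z is returned unchanged.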
The concrete loop in question now checks whether a letter is outside the range [97;122] (mathematical notation: not 97 ≤ letter ≤ 122) and if so just increments the index, i.e. it skips all characters that are not lower case letters.
Still note that this programme shows undefined behaviour if the first letter is a lower case one!
It does indeed check if the index is 0 – but too late! By the time this test is reached, str[-1] has already been accessed, which is an out-of-bounds array access and thus UB. You first need to test if index is 0; only then can you check whether the preceding character matches one of the separators.
Additionally you have a problem at the very end of the string if it doesn't terminate in a lower case letter: the inner while loop then continues iterating beyond the string until it finds a value that accidentally falls into that range, and modifies it, even though that value lies somewhere else entirely in memory, possibly doing something harmful!
A safe variant requires just a minor modification:
while(str[index])
{
    if(str[index] >= 'a' && str[index] <= 'z')
    {
        if(index == 0 || /* all the other tests */)
        {
            str[index] -= 32;
        }
    }
    ++index;
}
Though I'd prefer a for loop instead:
for(size_t index = 0; str[index] != 0; ++index)
{
    if(...) {...}
}
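Put together as a complete function, the for loop variant might look like the sketch below. Note that is_separator and its list of characters (space, tab, newline) are pure assumptions on my side, standing in for "all the other tests", and capitalise_ascii is just an illustrative name:

```c
#include <stddef.h>

/* Hypothetical separator test; replace with the tests actually needed. */
static int is_separator(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* ASCII-only: capitalises lower case letters at the start of the
   string or directly after a separator. */
void capitalise_ascii(char *str)
{
    for (size_t index = 0; str[index] != 0; ++index)
    {
        if (str[index] >= 'a' && str[index] <= 'z')
        {
            /* index == 0 is tested first; thanks to short-circuit
               evaluation str[index - 1] is never read for index 0 */
            if (index == 0 || is_separator(str[index - 1]))
            {
                str[index] -= 32;
            }
        }
    }
}
```

As the loop condition tests the current character on every iteration, the loop stops exactly at the terminating null character, no matter what the string ends with.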
A more generic solution (not relying on ASCII) uses the islower and toupper functions instead, and you might want to test with e.g. the isspace and ispunct functions or alternatively !isalnum to detect if you need to change to upper case; such code could look like this (here using pointer arithmetic for further convenience):
for(char* p = ptr; *p; ++p)
{
    if(islower((unsigned char)*p))
    {
        /* p == ptr is tested first, so p[-1] is never read for the first character */
        if(p == ptr || !isalnum((unsigned char)p[-1]))
        {
            *p = (char)toupper((unsigned char)*p);
        }
    }
}
Note that the casts to unsigned char are necessary: if char is actually signed, characters in the extended ASCII range (> 127) would otherwise be interpreted as negative values, and passing a negative value (other than EOF) to these functions is undefined behaviour.
Note, too, that the above code now capitalises after - and _ as well, which the original didn't do; you might want to exclude these explicitly, if need be...
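One way to exclude them could look like the following sketch (starts_word and capitalise_words are names I made up; which characters to exclude is up to you):

```c
#include <ctype.h>

/* Hypothetical helper: non-zero if *p should be capitalised, i.e. it is
   the first character or follows a non-alphanumeric character other
   than '-' and '_'. */
static int starts_word(const char *start, const char *p)
{
    if (p == start)
    {
        return 1; /* first character, there is no p[-1] to look at */
    }
    unsigned char prev = (unsigned char)p[-1];
    return !isalnum(prev) && prev != '-' && prev != '_';
}

void capitalise_words(char *ptr)
{
    for (char *p = ptr; *p; ++p)
    {
        if (islower((unsigned char)*p) && starts_word(ptr, p))
        {
            *p = (char)toupper((unsigned char)*p);
        }
    }
}
```

With this, a letter after a space or a full stop is still capitalised, while one after - or _ is left alone.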
If you want to retain an explicit list of separators you can still simplify the test to
if(p == ptr || strchr(" \t\n.,;[...]", p[-1])) { ... }
(as strchr only tests for equality, negative values are not a problem, so you don't need the cast to unsigned char here...).
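Wrapped into a function, this might look like the sketch below. Passing the separator list in as a parameter is my own choice and capitalise_after is a hypothetical name; the list you pass is whatever set of separators you actually need:

```c
#include <ctype.h>
#include <string.h>

/* Capitalises lower case letters at the start of the string or directly
   after any character contained in separators. */
void capitalise_after(char *ptr, const char *separators)
{
    for (char *p = ptr; *p; ++p)
    {
        /* p[-1] is only read when p != ptr, and it is never the null
           character, so strchr cannot match the terminator of separators */
        if (islower((unsigned char)*p)
            && (p == ptr || strchr(separators, p[-1])))
        {
            *p = (char)toupper((unsigned char)*p);
        }
    }
}
```

A call such as capitalise_after(str, " \t\n.,;") would then capitalise after whitespace and those four punctuation characters only, leaving letters after - or _ untouched.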