#include <stdio.h>
char *cap_string(char *str);

int main(void) {
    char str[] = "Expect the best. Prepare for the worst. Capitalize on what comes.\nhello world! hello-world 0123456hello world\thello world.hello world\n";
    char *ptr;
    ptr = cap_string(str);
    printf("%s", ptr);
    printf("%s", str);
    return (0);
}

char *cap_string(char *str)
{
    int index = 0;

    while (str[index])
    {
        while (!(str[index] >= 'a' && str[index] <= 'z'))
            index++;

        if (str[index - 1] == ' ' ||
            str[index - 1] == '\t' ||
            str[index - 1] == '\n' ||
            str[index - 1] == ',' ||
            str[index - 1] == ';' ||
            str[index - 1] == '.' ||
            str[index - 1] == '!' ||
            str[index - 1] == '?' ||
            str[index - 1] == '"' ||
            str[index - 1] == '(' ||
            str[index - 1] == ')' ||
            str[index - 1] == '{' ||
            str[index - 1] == '}' ||
            index == 0)
            str[index] -= 32;

        index++;
    }

    return (str);
}

I want to understand what this loop is doing, I just cant follow

while (!(str[index] >= 'a' && str[index] <= 'z')){
        index++;
Vlad from Moscow
  • What exactly do you not understand in `while (!(str[index] >= 'a' && str[index] <= 'z')) index++;` ? Is it the `!`, the `&&`, the `while` or...? It's basic C knowledge that is explained in each and every C beginner's C text book. – Jabberwocky May 05 '23 at 08:48
  • I don't think this does what you think. Use `islower()` to simplify. (Especially '\0' fails `islower` so it's possible to have undefined behaviour going off the string.) – Neil May 05 '23 at 09:02

3 Answers

2

The very first thing to notice is that this code relies entirely on ASCII-compatible input, for two reasons:

  1. It assumes that the upper case letters, and likewise the lower case letters, are encoded contiguously.
  2. It assumes a distance of 32 between the corresponding upper and lower case letters.

A counter-example is the (in)famous EBCDIC encoding, admittedly not of much relevance nowadays.

ASCII has the two aforementioned characteristics: if you peek into an ASCII table you'll notice that, e.g., A is encoded as 65 while a is encoded as 97, a difference of 32. So by subtracting 32 from the value for a you reach the value for A, and likewise for the other letters.
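As a minimal sketch of that arithmetic, assuming an ASCII-compatible character set (the helper name ascii_to_upper is hypothetical and not part of the question's code):

```c
/* Hypothetical helper: upper-cases a lower case letter by subtracting
   the distance between 'a' and 'A', which is 32 in ASCII.
   Assumes an ASCII-compatible encoding; on EBCDIC it would be wrong. */
char ascii_to_upper(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - ('a' - 'A'));
    return c; /* anything else is returned unchanged */
}
```

Writing the offset as 'a' - 'A' rather than the literal 32 documents where the number comes from, but it does not remove the dependency on contiguous letter ranges.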

The loop in question checks whether a character lies outside the range [97;122] (mathematical notation: not 97 ≤ character ≤ 122) and if so just increments the index, i.e. it skips every character that is not a lower case letter.

Still note that this programme shows undefined behaviour if the first character of the string is a lower case letter!

It does indeed check if the index is 0, but too late! By the time this test is reached, str[-1] has already been accessed, which is an out-of-bounds array access and thus UB. You need to test first whether index is 0; only then can you check whether the preceding character matches one of the separators.

Additionally you have a problem at the very end of the string if it doesn't end in a lower case letter: the inner while loop then continues iterating beyond the string until it finds a value that happens to fall into that range and modifies it, even though it lies somewhere else entirely, possibly doing something harmful!

A safe variant requires just a minor modification:

while(str[index])
{
    if(str[index] >= 'a' && str[index] <= 'z')
    {
       if(index == 0 || /* all the other tests */)
       {
           str[index] -= 32;
       }
    }
    ++index;
}

Though I'd rather prefer a for loop instead:

for(size_t index = 0; str[index] != 0; ++index)
{
    if(...) {...}
}
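Put together, the two snippets above could be fleshed out roughly as follows. This is only a sketch: the names cap_string_safe and is_separator are made up for illustration, the separator list is copied from the question's if statement, and the code still assumes ASCII.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if c is one of the separators
   tested in the question's if statement. */
int is_separator(char c)
{
    switch (c)
    {
    case ' ': case '\t': case '\n': case ',': case ';': case '.':
    case '!': case '?': case '"': case '(': case ')': case '{': case '}':
        return 1;
    default:
        return 0;
    }
}

/* Sketch of a bounds-safe variant (ASCII only). */
char *cap_string_safe(char *str)
{
    for (size_t index = 0; str[index] != 0; ++index)
    {
        if (str[index] >= 'a' && str[index] <= 'z')
        {
            /* index == 0 is tested first, so str[index - 1] is
               never evaluated for the first character. */
            if (index == 0 || is_separator(str[index - 1]))
                str[index] -= 32;
        }
    }
    return str;
}
```

The loop condition is the only place the index advances, so the function can neither run past the terminating zero nor read before the start of the string.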

A more generic solution (not relying on ASCII) uses the islower and toupper functions from <ctype.h> instead, and you might want to test with e.g. the isspace and ispunct functions, or alternatively !isalnum, to detect whether you need to change to upper case. Such code could look like this (using pointer arithmetic for convenience):

for(char* p = ptr; *p; ++p)
{
    if(islower((unsigned char)*p))
    {
        if(p == ptr || !isalnum((unsigned char)p[-1]))
        {
            *p = toupper((unsigned char)*p);
        }
    }
}

Note that the casts to unsigned char are necessary: if char is signed, characters in the extended range (> 127) would otherwise be passed as negative values, and passing a negative value other than EOF to these functions is undefined behaviour.

Note, too, that the above code now capitalises after - and _ as well, which the original did not do; you might want to exclude these explicitly if need be.

If you want to retain an explicit list of separators you can still simplify the test to

if(p == ptr || strchr(" \t\n.,;[...]", p[-1])) { ... }

(as strchr only tests for equality, negative values are not a problem and you don't need the cast to unsigned char here).
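For illustration, a complete function built around the strchr test might look like the sketch below. The function name cap_string_strchr is hypothetical, and the separator string is an assumption that simply spells out the characters from the question's if statement.

```c
#include <ctype.h>
#include <string.h>

/* Sketch: capitalise a lower case letter at the start of the string
   or directly after one of the listed separators. */
char *cap_string_strchr(char *str)
{
    /* Assumed separator list, taken from the question's if statement. */
    static const char separators[] = " \t\n,;.!?\"(){}";

    for (char *p = str; *p; ++p)
    {
        if (islower((unsigned char)*p)
            && (p == str || strchr(separators, p[-1]) != NULL))
        {
            *p = (char)toupper((unsigned char)*p);
        }
    }
    return str;
}
```

Within the loop p[-1] is never the terminating zero, so the fact that strchr also matches '\0' does not matter here.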

Aconcagua
0

The code defines a function cap_string which capitalizes the first letter of each word in a given string, where words are defined as sequences of characters separated by spaces, tabs, newlines, commas, semicolons, periods, exclamation marks, question marks, double quotes, parentheses, or curly braces. The main function defines a string str, passes it to cap_string, and then prints the string twice: once through the returned pointer and once through str. Because the function modifies the string in place and returns the same pointer, both calls print the modified string.

The inner while loop skips forward to the next lower case letter. The if statement after it then checks whether the preceding character is one of those separators and, if so, capitalizes the letter.

0

For starters, the function is incorrect: apart from its logical errors, it can invoke undefined behavior.

In this while loop

while (!(str[index] >= 'a' && str[index] <= 'z'))
    index++;

there is no check for the end of the string (that is, for the terminating zero character '\0'). So this while loop can read memory beyond the string.
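A minimal sketch of the same skipping loop with the missing check added (the function name skip_to_lower is hypothetical, and the range test still assumes ASCII):

```c
#include <stddef.h>

/* Sketch: advance index to the next lower case letter, but stop at
   the terminating zero so the loop never leaves the string. */
size_t skip_to_lower(const char *str, size_t index)
{
    while (str[index] != '\0' && !(str[index] >= 'a' && str[index] <= 'z'))
        index++;
    return index;
}
```

The caller then has to check whether str[index] is '\0' before treating it as a letter.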

Another problem is that in the if statement

    if (str[index - 1] == ' ' ||
        str[index - 1] == '\t' ||
        str[index - 1] == '\n' ||
        str[index - 1] == ',' ||
        str[index - 1] == ';' ||
        str[index - 1] == '.' ||
        str[index - 1] == '!' ||
        str[index - 1] == '?' ||
        str[index - 1] == '"' ||
        str[index - 1] == '(' ||
        str[index - 1] == ')' ||
        str[index - 1] == '{' ||
        str[index - 1] == '}' ||
        index == 0)

when index is equal to 0, the subexpressions before index == 0 are evaluated first, and each of them accesses memory before the start of the string through the negative index index - 1. So the condition index == 0 should at least be the first condition in the if statement.

One more problem is that once a lower case letter has been found, and possibly changed to upper case, you need to skip all the following letters until a non-letter is encountered.

And this statement

str[index] -= 32;

will produce an incorrect result if, for example, the EBCDIC character table is used instead of ASCII. It is much better to use the standard C functions declared in the header <ctype.h> than to process the characters in the string manually.

As for your question, this while loop

while (!(str[index] >= 'a' && str[index] <= 'z')){
    index++;

is meant to skip all characters in the string that are not lower case letters in the range ['a', 'z'].

In fact, you need to capitalize a lower case letter when it is either the first character of the string or is preceded by a non-letter character, that is, when it is not preceded by an upper case letter. Taking this into account, the function can look, for example, as shown in the demonstration program below.

#include <stdio.h>
#include <ctype.h>

char * cap_string( char *str )
{

    char *p = str;

    do
    {
        while (*p && !islower( ( unsigned char )*p )) ++p;

        if (*p && ( p == str || !isupper( ( unsigned char )p[-1] ) ) )
        {
            *p = toupper( ( unsigned char )*p );
        }

        while (isalpha( ( unsigned char )*p ) ) ++p;
    } while ( *p );

    return str;
}

int main( void )
{
    char str[] = "Expect the best. Prepare for the worst. "
                 "Capitalize on what comes.\n"
                 "hello world! hello-world 0123456hello world\t"
                 "hello world.hello world\n";

    puts( cap_string( str ) );
}

The program output is

Expect The Best. Prepare For The Worst. Capitalize On What Comes.
Hello World! Hello-World 0123456Hello World     Hello World.Hello World

The function shown also converts a letter to upper case when it follows a digit. If you do not want that, change this if statement

if (*p && ( p == str || !isupper( ( unsigned char )p[-1] ) ))

to this one

if (*p && ( p == str || ( !isupper( ( unsigned char )p[-1] ) && !isdigit( ( unsigned char )p[-1] ) ) ) )
Vlad from Moscow