getchar() to remove a word from a paragraph in C

Question

Let's say I have a paragraph, and in that paragraph I want to remove the word "bus". I use getchar() to get the input. How do I go about doing this?

int main()
{

 while ((character = getchar()) != EOF) {
    if (character != '\n') {

I suggest that you turn off your computer. Get a piece of paper and pencil and describe **in words** what steps you need to solve the problem. — Code-Apprentice, Apr 02 '18 at 20:11
So what you have right is something that just checks if your character is at the end of the file or if its a new line and since you have to use getChar(), I assume this is user entered. So you should use a for loop to read every single letter that was entered, and make a check for 'b'. Then use a similar implementation for 'u' then a final one for 's'. Read up on how getChar() is used, pretty simple it's just syntax and use a couple for loops and you're good to go — chrisHG, Apr 02 '18 at 20:27

stevendesu · Answer 1 · 2018-04-03T16:01:03.393

First Things First

The problem you're trying to solve (removing a substring from a larger string) is a very common problem. It can help to do some research before asking for answers. Usually when a problem is common enough, there are libraries and tools available to do the work for you.

In this case, the problem you're looking to solve is called string replacement. String replacement is the process of:

Taking a string of text
Scanning for a matched substring
Replacing the match with some other substring (in your case, nothing)

Therefore in pseudo-code what you're looking to do is something like:

string paragraph = "This is a paragraph about a bus. I like the bus. My friend doesn't.";
replace(paragraph, "bus", "");

Note that substring deletion is just a subset of substring replacement. You could just as easily replace "bus" with something like "cow", creating the new string "This is a paragraph about a cow. I like the cow. My friend doesn't."

Googling "C++ string replacement" will teach you a little bit about how to do this.

C Strings versus C++ Strings

In C, data lives in what are called primitive types. This basically means every type of data has some direct mapping to RAM. An integer is typically 4 bytes in RAM. An array of 5 integers is then 20 bytes in RAM, with every 4th byte starting a new integer. A character is a single byte, and a string is an array of characters that ends with a zero byte. Simple, yes?

In C++, the concept of objects and classes became a key part of the language. At a low level these use pointers to abstract away what's happening in RAM. I'll avoid the lesson on pointers and instead give you the beginner definition of objects: objects allow you to combine data and behavior into a single concept. This means that instead of just having an array of characters, you have a "string", and "strings" have a collection of functions relevant to strings (like concatenating them, or performing string replacement).

This is a problem that can be solved in any Turing complete programming language, but if you're allowed to use C++ then you'll find the code is a lot easier to read and understand.

The problem with `getchar()`

When using getchar() you only get the opportunity to see a single letter at a time. This means we never see the string in its entirety, and so we can't just use a copy and paste string replacement algorithm. If you don't have to use getchar() then I suggest reading in the string some other way so that the entire paragraph can be stored together. Example:

C:

char buff[1024];
fgets(buff, sizeof buff, stdin); // reads up to one line (at most 1023 characters)

C++:

std::string paragraph;
std::getline(std::cin, paragraph); // cin >> paragraph would stop at the first space

By doing it this way you get a whole line at once instead of one letter at a time. For a paragraph that spans several lines, repeat the call in a loop.

If you must use `getchar()`

This is still possible even when reading one character at a time, but it requires a finite state machine. The idea behind a finite state machine is that you keep track of what has already been seen by tracking your current "state".

For example:

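State 1:

If this character is a "b", go to state 2. Otherwise, stay in this state.

State 2:

If this character is a "u", go to state 3. If it is another "b", stay in state 2 (it may start a new "bus"). Otherwise, go to state 1.

State 3:

If this character is an "s", we've encountered the word bus! Go back to state 1 and look for the next one.
If it is a "b", go to state 2. Otherwise, go to state 1.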

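Here's some code that I believe will accomplish what you want, but I advise you to consider switching to C++ strings instead of doing this if possible:

#include <stdio.h>
#include <string.h>

int main()
{
    int state = 1;
    char buff[1024];
    int i = 0;
    int character; // int rather than char, so EOF can be detected
    memset(buff, 0, 1024);
    // Stop a few bytes early: one step can add up to 3 characters, the flush
    // after the loop up to 2, and the final '\0' must survive
    while (i < 1024 - 6 && (character = getchar()) != EOF)
    {
        switch(state)
        {
            case 1:
                if (character == 'b')
                {
                    state = 2;
                }
                else
                {
                    buff[i] = character;
                    i++;
                }
                break;
            case 2:
                if (character == 'u')
                {
                    state = 3;
                }
                else if (character == 'b')
                {
                    // "bb": release the first "b"; the second one may still
                    // start "bus", so stay in state 2
                    buff[i] = 'b';
                    i++;
                }
                else
                {
                    // We didn't add the "b" earlier, so add it now
                    buff[i] = 'b';
                    buff[i + 1] = character;
                    i += 2;
                    state = 1; // <-- The bug I mention below... This line was missing
                }
                break;
            case 3:
                if (character == 's')
                {
                    // We found (and ignored) one occurrence of the word "bus"
                    // Now let's start looking for the next one
                    state = 1;
                }
                else if (character == 'b')
                {
                    // "bub": release the "bu"; this "b" may start "bus"
                    buff[i] = 'b';
                    buff[i + 1] = 'u';
                    i += 2;
                    state = 2;
                }
                else
                {
                    // We didn't add the "bu" earlier, so add it now
                    buff[i] = 'b';
                    buff[i + 1] = 'u';
                    buff[i + 2] = character;
                    i += 3;
                    state = 1; // <-- The bug I mention below... This line was missing
                }
                break;
        }
    }
    // If the input ended mid-match, release the letters we were holding back
    if (state >= 2)
    {
        buff[i] = 'b';
        i++;
    }
    if (state == 3)
    {
        buff[i] = 'u';
        i++;
    }
    // Let's print our buffer now
    printf("%s", buff);
    return 0;
}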

Edit

I was just re-reading my answer and found a bug in my code. This is a good example for why you should try to use higher-level abstractions rather than working on one character at a time. The lower-level your code, the easier it is to make simple mistakes - because what we're doing in code doesn't map directly to how we think about the problem. At the proper level of abstraction, an English definition of the problem and the code should look almost identical.

Specifically, my bug was that I wasn't setting the state back to 1 if a letter did not match the expected one. This bug was easy to miss because humans don't generally think in terms of finite state machines. We just say "find all instances of the word bus".

As an aside, here's the code if you're allowed to use C++ strings and don't need to use the getchar() function:

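#include <iostream>
#include <string>

int main()
{
    std::string paragraph;
    std::getline(std::cin, paragraph); // reads a whole line; cin >> would stop at the first space
    // std::string has no replace-all, so erase each occurrence of "bus" as we find it
    const std::string word = "bus";
    for (std::size_t pos = paragraph.find(word); pos != std::string::npos; pos = paragraph.find(word, pos))
    {
        paragraph.erase(pos, word.length());
    }
    std::cout << paragraph << std::endl;
    return 0;
}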
